
Knowledge is assumed by cognitive science to consist of concepts that are organised and maintained by complex processes taking place in human minds. These processes are not yet directly accessible. Language, however, is still the primary medium for communicating knowledge, and linguistic objects and structures are presumably expressions of knowledge and of its organisation in the mind. Collecting terms (i.e., creating a specialised vocabulary) and capturing their relationships are thus important mechanisms for distilling knowledge from specialised texts and for formalising it for machines.

The approach taken in this thesis is to analyse the co-hyponymy relationships between terms as an organisational mechanism. Co-hyponyms are sets of lexical units that share a common hypernym; bank and building society, for example, are co-hyponyms of the hypernym financial organisation. Analysing the co-hyponymy relationships between terms is important because it bridges the semantic gap between (a) specialised lexical knowledge, (b) the quantitative interpretation of meanings in specialised discourse, and (c) machine-accessible conceptualisation of knowledge.

This thesis proposes the use of a vector-based distributional representation of terms in order to construct a quantitative conceptual model of kinds-sorts in a given field of knowledge. Among empirical methods for analysing linguistic structures, distributional approaches to semantics encode language data into models that should correspond to the meanings of linguistic entities. The meaning of an entity, such as a word or a phrase, is assumed to be a function of its statistical distribution in contexts. In order to use these methods, we therefore need to define (a) the contexts, that is, which statistical information must be collected, and (b) the functions, that is, how this information must be used to correlate with a meaning.

This thesis is a study of corpus-based distributional methods for characterising co-hyponymy between terms. Terms are represented as vectors to form a so-called term-space model. To obviate the curse of dimensionality and to facilitate the construction of models, novel methods employing sparse random projections are proposed: random Manhattan indexing is used to construct l1-normed spaces, and random indexing is used to construct l2-normed spaces. Following these steps, a memory-based classifier exploits the distances between vectors to identify the presence of targeted co-hyponymy relationships.
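As a rough illustration of how a term-space model can be built with random indexing, the sketch below assumes that a term's contexts are simply the words within a fixed window around it; the window size, vector dimension and number of non-zero entries are illustrative choices, not values taken from this thesis. Each context word receives a sparse random index vector with a few +1/-1 entries, and a term's vector is the sum of the index vectors of its context words. The reduced vectors approximately preserve the l2 (Euclidean) geometry of the full co-occurrence space without ever materialising it. The helper names are hypothetical.

```python
import numpy as np

def make_index_vector(rng, dim, nonzeros):
    """Return a sparse ternary index vector with `nonzeros` entries set to +1 or -1."""
    v = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v

def random_indexing(tokens, dim=1000, nonzeros=8, window=2, seed=0):
    """For each term, accumulate the index vectors of the words in a +/-window context."""
    rng = np.random.default_rng(seed)
    index_vectors = {}  # context word -> fixed random index vector, created on first sight
    term_vectors = {}   # term -> reduced context vector (one row of the term-space model)
    for i, term in enumerate(tokens):
        acc = term_vectors.setdefault(term, np.zeros(dim))
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            ctx = tokens[j]
            if ctx not in index_vectors:
                index_vectors[ctx] = make_index_vector(rng, dim, nonzeros)
            acc += index_vectors[ctx]  # in-place update of the stored array
    return term_vectors

def cosine(u, v):
    """Cosine similarity between two reduced term vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

tokens = "the bank lends money and the building society lends money".split()
vectors = random_indexing(tokens, window=1)
print(round(cosine(vectors["bank"], vectors["society"]), 3))
```

Because index vectors are generated once per context word and reused, the model can be updated incrementally as new text arrives, and its dimension stays fixed regardless of vocabulary size.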

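The l1 side and the memory-based classifier can be sketched in the same hedged spirit. The code below does not reproduce random Manhattan indexing itself; it swaps in the better-known principle behind l1 random projections, namely 1-stable (Cauchy) projections, in a dense rather than sparse form for simplicity, together with a sample-median distance estimator. A plain k-nearest-neighbour majority vote stands in for the memory-based classifier, and the labelled "memory" of category-tagged term vectors is a hypothetical setup, as are all function names.

```python
from collections import Counter

import numpy as np

def cauchy_projection(dim_in, dim_out, seed=0):
    """Dense matrix of i.i.d. standard Cauchy entries (a simplification of sparse projections)."""
    rng = np.random.default_rng(seed)
    return rng.standard_cauchy(size=(dim_in, dim_out))

def estimate_l1(pu, pv):
    """Estimate the l1 distance between two original vectors from their projections.

    Each coordinate of pu - pv is Cauchy(0, d) with d = ||u - v||_1, and the
    median of |Cauchy(0, d)| is exactly d, so the sample median estimates d.
    """
    return float(np.median(np.abs(pu - pv)))

def knn_classify(query, memory, labels, k=3, distance=estimate_l1):
    """Memory-based (k-nearest-neighbour) classification by majority vote."""
    dists = [distance(query, m) for m in memory]
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

rng = np.random.default_rng(42)
u, v = rng.random(50), rng.random(50)
R = cauchy_projection(50, 4000)
print(np.abs(u - v).sum(), estimate_l1(u @ R, v @ R))
```

The median is used instead of an average because the Cauchy distribution has no finite mean, so averaging projected differences would not converge to the l1 distance.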