Word sense disambiguation (WSD) systems have been developed in many ways. One of the simplest approaches is to build a separate classifier for each ambiguous word. This is easy to implement, but it suffers from the scarcity of sense-tagged corpora. Such systems also do not generalize: because each classifier is word-specific, the classifier trained for word1 cannot be used for word2.
The English Lexical Sample task promotes a similar approach. We'll discuss one such system, 'Multi-Component Word Sense Disambiguation', proposed by Ciaramita and Johnson at Senseval-3.
Multi-Component Word Sense Disambiguation:
The idea is to use a multiclass averaged perceptron over a feature space that consists of various linguistic features of the context around the word to be disambiguated [1].
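As a rough illustration of a feature space built from the context of the target word, here is a minimal sketch that only uses the surrounding words in a small window. The function name context_features, the window size and the feature naming scheme are assumptions made for this example; the features used in [1] are more varied than this.

```python
def context_features(tokens, target_index, window=2):
    """Return sparse binary features for the word at target_index.

    Hypothetical feature template: the target word itself plus the
    words at each offset inside the window.
    """
    features = {"bias": 1.0}
    features["word=" + tokens[target_index].lower()] = 1.0
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        position = target_index + offset
        # Skip offsets that fall outside the sentence.
        if 0 <= position < len(tokens):
            name = f"w[{offset:+d}]={tokens[position].lower()}"
            features[name] = 1.0
    return features


tokens = "He sat on the bank of the river".split()
feats = context_features(tokens, tokens.index("bank"))
print(sorted(feats))
```

Each instance becomes a dictionary from feature name to value, which is a convenient sparse representation for a perceptron.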
For a word w, the training data is a set of instances, each represented as a feature vector in the generated space and labelled with a WordNet sense of w. For each noun or verb synset, they generate a fixed number k of other semantically similar synsets (k=100 in the experiments). Thus, for each set of labels Y(w), we have an induced set of pseudo-labels, Y(w)'.
The averaged perceptron (Collins, 2002) is used to avoid overfitting. Two systems are trained: one using only Y(w), and one using Y(w) together with Y(w)'.
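The construction of Y(w)' can be sketched as a top-k selection over a similarity function. Everything concrete below is an assumption for illustration: the sense names, the candidate list, the toy similarity table and the helper pseudo_labels are hypothetical, and the similarity measure actually used in [1] is not reproduced here.

```python
def pseudo_labels(sense, candidates, similarity, k=100):
    """Return the k candidates most similar to `sense` (excluding itself)."""
    others = [c for c in candidates if c != sense]
    ranked = sorted(others, key=lambda c: similarity(sense, c), reverse=True)
    return ranked[:k]


# Hypothetical similarity scores between senses and other synsets.
TOY_SIM = {
    ("bank_river", "shore"): 0.9,
    ("bank_river", "slope"): 0.7,
    ("bank_river", "lender"): 0.1,
    ("bank_money", "credit_union"): 0.9,
    ("bank_money", "lender"): 0.8,
    ("bank_money", "shore"): 0.1,
}


def toy_similarity(a, b):
    """Symmetric lookup in the toy table; unknown pairs score 0."""
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


Y_w = ["bank_river", "bank_money"]  # the senses of the word w
candidates = ["shore", "slope", "lender", "credit_union"]

# Y(w)': for every sense, its k most similar synsets (k=2 in this toy case).
Y_w_prime = {s: pseudo_labels(s, candidates, toy_similarity, k=2) for s in Y_w}
print(Y_w_prime)
```

Because k is fixed, every sense receives the same number of pseudo-labels, regardless of how many candidates exist.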
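A minimal sketch of a multiclass averaged perceptron follows, assuming sparse feature dictionaries as input. The class name, the toy data and the naive way of accumulating weights after every example are choices made for this example and are not taken from [1].

```python
from collections import defaultdict


class AveragedPerceptron:
    def __init__(self, labels):
        self.labels = list(labels)
        # One weight vector per label, plus a running sum for averaging.
        self.weights = {y: defaultdict(float) for y in self.labels}
        self.totals = {y: defaultdict(float) for y in self.labels}
        self.steps = 0

    @staticmethod
    def _score(vector, features):
        return sum(vector.get(f, 0.0) * v for f, v in features.items())

    def predict(self, features, averaged=True):
        # Summed weights give the same argmax as averaged weights,
        # since dividing by a positive step count does not change the order.
        table = self.totals if averaged and self.steps else self.weights
        return max(self.labels, key=lambda y: self._score(table[y], features))

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for features, gold in data:
                guess = self.predict(features, averaged=False)
                if guess != gold:
                    # Standard perceptron update: promote gold, demote guess.
                    for f, v in features.items():
                        self.weights[gold][f] += v
                        self.weights[guess][f] -= v
                # Naive averaging: add the current weights after every example.
                for y in self.labels:
                    for f, w in self.weights[y].items():
                        self.totals[y][f] += w
                self.steps += 1


data = [
    ({"bias": 1.0, "w[-1]=river": 1.0}, "bank_river"),
    ({"bias": 1.0, "w[-1]=savings": 1.0}, "bank_money"),
]
model = AveragedPerceptron(["bank_river", "bank_money"])
model.train(data)
print([model.predict(x) for x, _ in data])
```

Predicting with the accumulated weights instead of the final ones smooths out the effect of late updates, which is why averaging helps against overfitting.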
Multilabel cases:
A synset can belong to the neighbor sets of more than one sense of a word. In that case, the instance is labelled as a multilabel instance. This is handled by generalizing (relaxing) the perceptron algorithm.
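One possible relaxation is sketched below, under an assumption that is not taken from [1]: a prediction counts as correct if the top-scoring label is any of the gold labels, and on an error the promotion is shared equally among the gold labels. The function name multilabel_update and the toy labels are hypothetical, and averaging is left out to keep the update rule visible.

```python
from collections import defaultdict


def multilabel_update(weights, features, gold_labels, all_labels):
    """Apply one relaxed perceptron step; return True if weights changed."""

    def score(label):
        return sum(weights[label].get(f, 0.0) * v for f, v in features.items())

    guess = max(all_labels, key=score)
    if guess in gold_labels:
        return False  # any gold label on top counts as correct
    share = 1.0 / len(gold_labels)
    for f, v in features.items():
        weights[guess][f] -= v  # demote the wrong prediction
        for y in gold_labels:
            weights[y][f] += share * v  # promote every gold label equally
    return True


labels = ["a", "b", "c"]
weights = {y: defaultdict(float) for y in labels}
features = {"x": 1.0}

first = multilabel_update(weights, features, {"b", "c"}, labels)
second = multilabel_update(weights, features, {"b", "c"}, labels)
print(first, second)
```

With a single gold label this reduces to the ordinary multiclass perceptron update.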
Conclusions by [1]:
Selecting the external training data based on the most similar synsets has the advantage, over using supersenses, of generating an equivalent amount of additional data for each word sense. The additional data for each synset is also more homogeneous, thus the model should have less variance.
Conclusions:
Generating semantically similar synsets and using them as additional labels in classification is a good idea, compared with relying only on the description of the sense.
References:
[1] Massimiliano Ciaramita and Mark Johnson. 2004. Multi-Component Word Sense Disambiguation. In SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, July 2004. Association for Computational Linguistics.