Monday, March 26, 2012

WSD using Term to Term Similarity in a Context Space

The paper [1] proposes representing terms as bags of contexts and defining a similarity measure between terms. The idea is similar to the standard term-document matrices used for document similarity. The main challenge lies in representing words, lemmas, and senses in the same context space, for which the authors use a very simple idea.

Training Methodology
The training corpus is represented as a matrix A of size L×E, where the rows are the different lemmas/words encountered (essentially the dictionary) and the columns are the different training examples. Three weighting schemes for the entries w_{i,j} are experimented with: binary presence/absence, frequency, and tf-idf. The weights in the query vector q are set to presence/absence.
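As a minimal sketch, the construction of A can be illustrated with a toy corpus (the contexts, lemmas, and binary weighting below are invented for illustration; frequency or tf-idf weights would be drop-in replacements):

```python
import numpy as np

# Toy training corpus: each training example (context) is a list of lemmas.
contexts = [
    ["bank", "river", "water"],   # context 0
    ["bank", "money", "loan"],    # context 1
    ["water", "loan"],            # context 2
]

# The dictionary: rows of A are the distinct lemmas, columns are the examples.
lemmas = sorted({w for ctx in contexts for w in ctx})
index = {w: i for i, w in enumerate(lemmas)}

# Binary presence/absence weighting for the entries w_{i,j}.
A = np.zeros((len(lemmas), len(contexts)))
for j, ctx in enumerate(contexts):
    for w in ctx:
        A[index[w], j] = 1.0

print(A.shape)  # (L, E) = (5, 3)
```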

Testing Methodology
Representing the sentence in context space : 
A new instance q can be represented as a vector of weights of size 1×L, subsequently transformed into a vector in the context space by the usual inner product q · A, of size 1×E.
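A sketch of this projection, with a small hand-built binary matrix A (the values and lemmas are invented for illustration):

```python
import numpy as np

# Term-context matrix A (L lemmas x E training examples), binary weights.
A = np.array([
    [1, 1, 0],   # "bank"
    [0, 1, 1],   # "loan"
    [0, 1, 0],   # "money"
    [1, 0, 0],   # "river"
    [1, 0, 1],   # "water"
], dtype=float)

# New instance q as a 1 x L presence/absence vector over the dictionary.
q = np.array([1, 0, 0, 1, 0], dtype=float)   # contains "bank" and "river"

# Project q into the context space: inner product q . A, of size 1 x E.
q_ctx = q @ A
print(q_ctx)  # [2. 1. 0.]
```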

Representing senses in context space : 
Let sen_i^k be the representation of the kth candidate sense of the ambiguous lemma lem_i. It is of size 1×E.
sen_i^k[j] = 1 if lemma lem_i is used with sense sen_i^k in training context j, and 0 otherwise.
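For illustration, suppose lem_i = "bank" has two candidate senses and there are three training contexts (the sense names and annotations below are invented):

```python
import numpy as np

E = 3  # number of training contexts

# sen_i^k[j] = 1 iff lemma lem_i is annotated with sense k in training context j.
sen_bank_1 = np.zeros(E)   # hypothetical sense bank#1 (riverside)
sen_bank_1[0] = 1.0        # "bank" tagged with bank#1 in context 0
sen_bank_2 = np.zeros(E)   # hypothetical sense bank#2 (financial institution)
sen_bank_2[1] = 1.0        # "bank" tagged with bank#2 in context 1
```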

Assigning the sense to ambiguous lemma :
For a new context of the ambiguous lemma lem_i, the candidate sense with the highest similarity is selected.


Similarity Measures [sim(sen, q)]
Two similarity measures have been compared. The first (maximum) scores q as a bag of words against the training contexts of sense sen. The second (cosine) is the similarity of sense sen with q in the context space.

  • Maximum : max_{j=1..N} (sen_j · q_j), where N is the number of training contexts
  • Cosine : (sen · q) / (||sen|| ||q||)
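Both measures can be sketched in a few lines of NumPy (the vectors below are invented, and q is assumed to already be projected into the context space):

```python
import numpy as np

def sim_maximum(sen, q_ctx):
    # Maximum: the best single training context of the sense,
    # scored by its overlap with the instance: max_j sen_j * q_j.
    return float(np.max(sen * q_ctx))

def sim_cosine(sen, q_ctx):
    # Cosine between the sense vector and the instance in context space.
    return float(sen @ q_ctx) / (np.linalg.norm(sen) * np.linalg.norm(q_ctx))

# Toy vectors in a 3-context space.
q_ctx = np.array([2.0, 1.0, 0.0])
sen1 = np.array([1.0, 0.0, 0.0])   # hypothetical sense seen in context 0
sen2 = np.array([0.0, 1.0, 0.0])   # hypothetical sense seen in context 1

print(sim_maximum(sen1, q_ctx), sim_maximum(sen2, q_ctx))  # 2.0 1.0
```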

Observations/Suggestions from the above experiments:
  • Almost all results improve when the (cosine) similarity measure is applied in the context space. The exception is the use of co-occurrences to disambiguate nouns.
  • If sense sen_1 has two training contexts with the highest number of co-occurrences and sense sen_2 has only one with the same number of co-occurrences, sen_1 should receive a higher value than sen_2.
Using the above ideas, they propose:
  • Artificially reducing the number of co-occurrences: if c_1 and c_2 are the contexts with the highest and second-highest number of co-occurrences with q, then assign to the first context c_1 the number of co-occurrences of context c_2.
  • Modified similarity: \sum_{j=1}^{N} sen_j · N^{q_j}
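A sketch of both proposals; here N is assumed to be the number of training contexts, and the vectors are invented for illustration:

```python
import numpy as np

def clip_top_cooccurrence(q_ctx):
    # Artificially reduce the number of co-occurrences: replace the highest
    # count c_1 with the second-highest count c_2, so that a single lucky
    # context cannot dominate the score on its own.
    q_ctx = q_ctx.copy()
    top = np.argmax(q_ctx)
    rest = np.delete(q_ctx, top)
    if rest.size:
        q_ctx[top] = rest.max()
    return q_ctx

def sim_modified(sen, q_ctx, N):
    # Modified similarity: sum_j sen_j * N**q_j.  With base N, a context with
    # c co-occurrences contributes N**c, as much as N contexts with c - 1
    # co-occurrences combined, while ties at the top count still add up.
    return float(np.sum(sen * (N ** q_ctx)))

q_ctx = np.array([3.0, 3.0, 3.0, 1.0])   # co-occurrence counts with q
sen1 = np.array([1.0, 1.0, 0.0, 0.0])    # two contexts at the top count
sen2 = np.array([0.0, 0.0, 1.0, 1.0])    # one top context plus a weak one
N = len(q_ctx)
print(sim_modified(sen1, q_ctx, N) > sim_modified(sen2, q_ctx, N))  # True
```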

Conclusions:
  • The idea of using SemCor for an exemplar-based approach via the term-document matrix analogy is interesting and intuitive.
  • The paper ignores the WordNet semantic structure and is entirely dependent on annotated data. The main limitation is that it does not generalize to unseen text/ambiguous words.

References:
[1] Artiles et al., "Word Sense Disambiguation based on Term to Term Similarity in a Context Space".
