[umls-similarity] Details of Lesk

[email protected] [umls-similarity] Mon, 28 Jul 2014 18:06:07 -0700

Hi
 

 The Lesk algorithm computes the similarity of two concepts from the overlap 
between their glosses. According to Banerjee and Pedersen (2002), an overlap is 
the longest sequence of one or more consecutive words that occurs in both glosses.
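 The sketch below shows how such overlaps can be found and scored. It assumes 
whitespace tokenization, no stop-word filtering, and that each n-word overlap 
contributes n squared to the score (the extended gloss overlap scoring). The 
function names are hypothetical, and this is an illustration only, not the 
UMLS::Similarity implementation.

```python
def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest run of
    consecutive tokens shared by token lists a and b; (0, 0, 0) if none."""
    best = (0, 0, 0)
    # dp[i][j] = length of the common run ending at a[i-1] and b[j-1]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best[0]:
                    best = (dp[i][j], i - dp[i][j], j - dp[i][j])
    return best


def gloss_overlap_score(gloss1, gloss2):
    """Repeatedly take the longest shared run, add length**2 to the score,
    and blank it out so later overlaps cannot reuse or span those words."""
    a = gloss1.lower().split()
    b = gloss2.lower().split()
    # Two distinct sentinels: they never equal each other or any word,
    # so a removed region breaks any run that would cross it.
    gap_a, gap_b = object(), object()
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            return score
        score += n * n
        a[i:i + n] = [gap_a] * n
        b[j:j + n] = [gap_b] * n
```

 For example, "heart muscle" shared by two glosses scores 4, while the 
separate overlaps "a b c" and "d" together score 9 + 1 = 10.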


 I ran lesk (UMLS 2013AA) on a set of 58K concept pairs using

 config1
 SABDEF:: SNOMEDCT
 RELDEF:: CUI

 and almost all pairs received a score of -1.
 

 But with

 config2
 SABDEF:: SNOMEDCT
 RELDEF:: TERM

 a number of pairs received positive scores.
 

 I would like to know:
 1. What exactly is the gloss when RELDEF is TERM? Is it the STR column of 
the MRCONSO table for that CUI?
 

 2. As I understand the documentation, CUI in RELDEF means that the concept's 
definition is fetched from the MRDEF table and treated as its gloss. Is this 
correct? If CUI is not specified, what is the gloss? In either case, I do not 
understand why config1 produces almost no output. 
 The concepts I am working with were extracted from a dataset associated 
with a single disease, and when I ran similarity measures on them, many pairs 
yielded positive similarity scores.
 

 3. The published paper describes WordNet, whereas these configuration options 
are specific to the UMLS. It would be great if you could briefly explain what 
constitutes a gloss under the different UMLS options.
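 To make questions 1 and 2 concrete, the sketch below encodes my reading of the 
two options: CUI takes the concept's MRDEF definitions (DEF column) and TERM 
takes its MRCONSO strings (STR column), both already restricted to the SABDEF 
source. The mapping, the function names, and the in-memory tables are all 
hypothetical assumptions, not UMLS::Similarity's code. Under this reading, a 
concept with no SNOMEDCT definitions in MRDEF would get an empty gloss with RELDEF 
CUI, leaving nothing to overlap; whether that is what produces the -1 scores is 
exactly what I am asking.

```python
from typing import Dict, List


def build_gloss(cui: str, reldef: str,
                mrdef_defs: Dict[str, List[str]],
                mrconso_strs: Dict[str, List[str]]) -> str:
    """Assemble a gloss for `cui` under my (hypothetical) reading of RELDEF.

    mrdef_defs:   CUI -> DEF strings from MRDEF, pre-filtered to the SABDEF source
    mrconso_strs: CUI -> STR strings from MRCONSO, pre-filtered the same way
    """
    if reldef == "CUI":
        parts = mrdef_defs.get(cui, [])    # assumed: definitions only
    elif reldef == "TERM":
        parts = mrconso_strs.get(cui, [])  # assumed: the concept's terms
    else:
        raise ValueError(f"RELDEF option not covered by this sketch: {reldef}")
    return " ".join(parts)


def has_usable_glosses(cui1: str, cui2: str, reldef: str,
                       mrdef_defs: Dict[str, List[str]],
                       mrconso_strs: Dict[str, List[str]]) -> bool:
    """True only if both concepts end up with a non-empty gloss."""
    return all(build_gloss(c, reldef, mrdef_defs, mrconso_strs).strip()
               for c in (cui1, cui2))
```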
 

 Thanks,

 Chaitanya.
