Hi,

The Lesk algorithm computes the similarity between the glosses of two concepts. According to the article (Banerjee and Pedersen 2002), an overlap is the longest sequence of one or more consecutive words shared by the two glosses.
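
To check that I am reading the paper correctly, here is a minimal Python sketch of how I understand the overlap computation. It is not the UMLS::Similarity code. The function names are my own, the tokenization is a plain lowercase whitespace split, and I am assuming the scoring described in the paper: repeatedly take the longest shared word sequence, add the square of its length to the score, and remove it from both glosses. The paper's handling of function words is left out.

```python
def longest_overlap(a, b):
    """Return (start_a, start_b, length) of the longest run of consecutive
    words common to token lists a and b; length is 0 if nothing is shared."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # None marks a word already used by an earlier overlap
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def overlap_score(gloss_a, gloss_b):
    """Sum of squared overlap lengths between two gloss strings
    (simplified: no stop-word handling, whitespace tokenization)."""
    a = gloss_a.lower().split()
    b = gloss_b.lower().split()
    score = 0
    while True:
        start_a, start_b, n = longest_overlap(a, b)
        if n == 0:
            return score
        score += n * n
        # blank out the matched words so they cannot be counted again
        a[start_a:start_a + n] = [None] * n
        b[start_b:start_b + n] = [None] * n


if __name__ == "__main__":
    # Made-up glosses, purely for illustration
    print(overlap_score("disorder of the kidney", "chronic disorder of the kidney"))
```

If this reading is right, two glosses that share no words at all would give a score of 0 under this sketch, which is part of why I would like to understand what -1 means for my pairs.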

I ran lesk (UMLS3013AA) on a set of 58K pairs with two configurations:

config1: SABDEF:: SNOMEDCT RELDEF:: CUI
config2: SABDEF:: SNOMEDCT RELDEF:: TERM

With config1, almost all pairs had a score of -1. With config2, a number of pairs had positive scores.

I would like to know:

1. What exactly is the gloss when RELDEF has TERM? Is it the str column of the mrconso table for that CUI?

2. As I understand the documentation, CUI in RELDEF implies that the CUI's definition is fetched from the mrdef table and treated as the gloss. Is this correct? If CUI is not specified, what is the gloss? In either case, I am unable to understand why config1 gets no output. The concepts I am working on were extracted from a dataset associated with the same disease. I also ran the similarity measures on them, and many pairs yielded positive scores.

3. The published paper is for WordNet, whereas the configurations are specific to UMLS. It would be great if you could provide a brief commentary on what constitutes a gloss under the different options in UMLS.

Thanks,
Chaitanya
