Hi Jen,

I looked at those particular CUIs and don't think they are in MSH or
SNOMEDCT - that's why you are getting the -1 even though one would imagine
there is some similarity between them. To find some other examples using
Alzheimer's I used UTS Metathesaurus to look up CUIs in MSH that included
the term Alzheimer's (and 9 were found in MSH).

I took 2 of those and ran them with path and got -1, indicating no path
found. However, when I used lesk or vector I found non-zero values. Lesk
and vector are both based on comparing the definitions of two CUIs and do
not rely on finding paths.

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl C0002395
C0299337 --measure vector --sab MSH
Default Settings:
  --default http://atlas.ahc.umn.edu/
  --rel CUI/PAR/CHD/RB/RN
User Settings:
  --measure vector

0.3131<>Disease, Alzheimer's(C0002395)<>familial Alzheimer's disease
protein 1(C0299337)


tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl C0002395
C0299337 --measure lesk --sab MSH
Default Settings:
  --default http://atlas.ahc.umn.edu/
  --rel CUI/PAR/CHD/RB/RN
User Settings:
  --measure lesk

19<>Disease, Alzheimer's(C0002395)<>familial Alzheimer's disease protein
1(C0299337)
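
Since you are calling the script from Python, here is a minimal sketch of how result lines like the two above could be picked apart. This is not part of UMLS::Similarity; the helper name parse_result is made up, and I am assuming each result arrives as a single line of the form score<>term1(CUI1)<>term2(CUI2) (my mail client wrapped them above) and that CUIs are a C followed by seven digits:

```python
import re

# Hypothetical helper, not part of UMLS::Similarity. Assumes one result per
# line in the form: score<>term1(CUI1)<>term2(CUI2)
RESULT = re.compile(r"^(-?\d+(?:\.\d+)?)<>(.*)\((C\d{7})\)<>(.*)\((C\d{7})\)$")

def parse_result(line):
    """Return (score, cui1, cui2); score is None when the tool printed -1."""
    match = RESULT.match(line.strip())
    if match is None:
        raise ValueError("unrecognized result line: %r" % line)
    score = float(match.group(1))
    if score == -1:
        score = None  # -1 means "no score", not "very dissimilar"
    return score, match.group(3), match.group(5)
```

Turning -1 into None keeps it from being averaged or sorted as if it were a real similarity value when you build your matrix.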

So, the tricky part is sometimes the coverage in different sources - two
CUIs might be intuitively similar but simply not found in the source being
used (or no path between them may exist), so they will show a -1 value.
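
To make that concrete, here is a toy Python sketch of the two ways a path-style measure can end up at -1. It is not the UMLS::Similarity implementation; the little hierarchy and its labels are made up, and I am assuming the usual definition of path as 1 divided by the number of nodes on the shortest path:

```python
from collections import deque

def path_similarity(graph, a, b):
    """1 / (nodes on the shortest path), or -1 if a or b is missing or unconnected."""
    if a not in graph or b not in graph:
        return -1  # concept is not in the source at all
    seen = {a}
    queue = deque([(a, 1)])  # (node, number of nodes on the path so far)
    while queue:
        node, count = queue.popleft()
        if node == b:
            return 1 / count
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, count + 1))
    return -1  # both concepts are known, but nothing links them

# Made-up undirected PAR/CHD links. "X" is in the source but has no links,
# and "Z" (used in the note below) is not in the source at all.
toy = {
    "ROOT": ["A"],
    "A": ["ROOT", "B", "C"],
    "B": ["A"],
    "C": ["A"],
    "X": [],
}
```

In this toy, path_similarity(toy, "B", "C") goes through A, while both path_similarity(toy, "B", "X") and path_similarity(toy, "B", "Z") return -1. Note that "X" compared with itself still gives 1, so a self-comparison only tells you a concept is known, not that it is connected to anything.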

I'm not sure this exactly answers your question, but I will think a little
more and add what I can...

More soon,
Ted

On Mon, Jun 5, 2017 at 5:41 PM, Jennifer Wilson [email protected]
[umls-similarity] <[email protected]> wrote:

>
>
> Hey Ted,
>
> So I haven't quite figured out the MetaMap, but I have a set of diseases
> that I mapped to CUIs another way. I'm still getting negative results with
> diseases that I think should be "similar". For example:
>
> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
> "C1864828" "C3810041"
>
> Default Settings:
>
>   --default http://atlas.ahc.umn.edu/
>
>   --measure path
>
>
> User Settings:
>
>   --rel PAR/CHD
>
>
> ["b'-1", 'ALZHEIMER DISEASE 10(C1864828)', "ALZHEIMER DISEASE
> 18(C3810041)\\n'"]
>
> You can see my results on the last row. Could you advise- Would you expect
> that these two CUIs would not be similar? I wanted to measure path as a
> simple starting point, but could you recommend that another distance might
> be more informative? Thanks again for your help!
>
> On Mon, Jun 5, 2017 at 1:43 PM, Jennifer Wilson <[email protected]>
> wrote:
>
>> Hey Ted,
>>
>> Thanks for all of the help. I found the interactive interface really
>> helpful and had been able to create inputs similar to what you shared. I
>> have an open help ticket now on trying to get the file to download. He gave
>> me some commands to try that I had already tried, so there must be
>> something else to unzipping the code...
>>
>> Thanks again. Hopefully I'm close to a solution!
>>
>> On Mon, Jun 5, 2017 at 11:21 AM, Ted Pedersen [email protected]
>> [umls-similarity] <[email protected]> wrote:
>>
>>>
>>>
>>> Hi Jen,
>>>
>>> Nothing to be embarrassed about at all! If you haven't already used
>>> MetaMap interactively you might want to try that before you attempt a local
>>> install :
>>>
>>> https://ii.nlm.nih.gov/Interactive/UTS_Required/metamap.shtml
>>>
>>> (You would need to be logged into UTS for the link to work I think...)
>>>
>>> Anyway, once at that site on the right side there are some links for
>>> using MetaMap interactively. Below is an example of what that looks like
>>> (where the first line is my input and the rest is the output). I turned on
>>> the option to show CUIs, since I think that is your desired output...
>>>
>>> About the bz2 file, I think you'd need to uncompress that with bunzip2,
>>> although I have not done a local install for a while so I am not 100
>>> percent sure if that is the issue or not. But, I've cc'd the MetaMap help
>>> line on this note, they are usually very good about following up on issues
>>> like this.
>>>
>>> I hope this helps!
>>> Ted
>>>
>>> Processing 00000000.tx.1: I have a really bad headache, and my joints ache.
>>>
>>> Phrase: I
>>> >>>>> Phrase
>>> i
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (1000):
>>>   1000   C0021966:I- (Iodides) [Inorganic Chemical]
>>> Meta Mapping (1000):
>>>   1000   C0221138:I NOS (Blood group antibody I) [Amino Acid, Peptide, or Protein,Immunologic Factor]
>>> <<<<< Mappings
>>>
>>> Phrase: have
>>> >>>>> Phrase
>>> <<<<< Phrase
>>>
>>> Phrase: a really bad headache,
>>> >>>>> Phrase
>>> really bad headache
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (790):
>>>    660   C0205169:Bad [Qualitative Concept]
>>>    827   C0018681:HEADACHE (Headache) [Sign or Symptom]
>>> <<<<< Mappings
>>>
>>> Phrase: and
>>> >>>>> Phrase
>>> <<<<< Phrase
>>>
>>> Phrase: my joints
>>> >>>>> Phrase
>>> joints
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (1000):
>>>   1000   C0022417:Joints [Body Space or Junction]
>>> Meta Mapping (1000):
>>>   1000   C0392905:Joints (Articular system) [Body System]
>>> <<<<< Mappings
>>>
>>> Phrase: ache.
>>> >>>>> Phrase
>>> ache
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (1000):
>>>   1000   C0234238:ACHE (Ache) [Sign or Symptom]
>>> <<<<< Mappings
>>>
>>>
>>>
>>> On Mon, Jun 5, 2017 at 12:25 PM, Jennifer Wilson [email protected]
>>> [umls-similarity] <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> Hey Ted,
>>>>
>>>> I'm (embarrassingly) having some trouble navigating the NLM site. I
>>>> think I have an account and am trying to download some of the MetaMap
>>>> software (I think that the "Lite" version is sufficient). But when I
>>>> download the bz2 file, it won't open because I think I need to authenticate
>>>> it. Do you know how I'm supposed to access this software? Sorry if this is
>>>> out of your realm, I can try someone else at NLM. This has just been a lot
>>>> more difficult and confusing than I thought it should be! Thanks,
>>>>
>>>> On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen [email protected]
>>>> [umls-similarity] <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi Jennifer,
>>>>>
>>>>> Mapping terms to CUIs is its own problem, and there are a few nice
>>>>> tools already available that might be of some use. We've used MetaMap to
>>>>> good effect for this problem, so you might want to consider looking
>>>>> there.
>>>>>
>>>>> https://metamap.nlm.nih.gov/
>>>>>
>>>>> I'd be curious if other users have recommendations as well..
>>>>>
>>>>> Good luck,
>>>>> Ted
>>>>>
>>>>> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson
>>>>> [email protected] [umls-similarity] <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> Thank you again for all of this. I'm sorry I had to put down this
>>>>>> project for a few days and am only now getting back to it.
>>>>>>
>>>>>> I see that ontologies change and reproducing that result might not be
>>>>>> the best sanity check on the scripts that I wrote.
>>>>>>
>>>>>> I'm going to try and figure out how to map to CUI terms and I'll be
>>>>>> in touch if I get stuck again. Thanks,
>>>>>>
>>>>>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen [email protected]
>>>>>> [umls-similarity] <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This is perhaps a bit more than you were looking for, but there are
>>>>>>> quite a few command line tools available with UMLS::Similarity when you
>>>>>>> install locally that can be helpful for digging into situations like
>>>>>>> this. When I look for the path from each of these CUIs to the ROOT (of
>>>>>>> MSH) I find that one of them does not have a path to the root, while the
>>>>>>> other does (see command output below).
>>>>>>>
>>>>>>> The lack of a path to the root is going to cause a lot of measures to
>>>>>>> report a -1 value (since path, for example, relies on finding this path
>>>>>>> as a part of its computation). In fact, not having a path to the root
>>>>>>> makes me question if C0156543 is in MSH at all, so it might even be that
>>>>>>> the CUI is no longer a part of MSH (and not just lacking a path to the
>>>>>>> root). But, regardless, clearly something has changed since 2009 that is
>>>>>>> causing this measure to return a different value. This happens in some
>>>>>>> cases since UMLS continues to evolve and CUIs are added, removed, etc.
>>>>>>> It's important to know what version of the UMLS a previous study has
>>>>>>> used if you are interested in getting a very exact comparison. In the
>>>>>>> case of our AMIA 2009 paper we used 2008AB, so things have no doubt
>>>>>>> changed a bit since then.
>>>>>>>
>>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>>>>>
>>>>>>> UMLS-Interface Configuration Information:
>>>>>>> (Default Information - no config file)
>>>>>>>
>>>>>>>   Sources (SAB):
>>>>>>>      MSH
>>>>>>>   Relations (REL):
>>>>>>>      PAR
>>>>>>>      CHD
>>>>>>>
>>>>>>>   Sources (SABDEF):
>>>>>>>      UMLS_ALL
>>>>>>>   Relations (RELDEF):
>>>>>>>      UMLS_ALL
>>>>>>>
>>>>>>>
>>>>>>> There are no paths from the given C0156543 to the root.
>>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>>>>>
>>>>>>>
>>>>>>> UMLS-Interface Configuration Information:
>>>>>>> (Default Information - no config file)
>>>>>>>
>>>>>>>   Sources (SAB):
>>>>>>>      MSH
>>>>>>>   Relations (REL):
>>>>>>>      PAR
>>>>>>>      CHD
>>>>>>>
>>>>>>>   Sources (SABDEF):
>>>>>>>      UMLS_ALL
>>>>>>>   Relations (RELDEF):
>>>>>>>      UMLS_ALL
>>>>>>>
>>>>>>>
>>>>>>> The paths between abortions, spontaneous (C0000786) and the root:
>>>>>>>   => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739
>>>>>>> (mesh descriptors) C1256741 (topical descriptor) C0012674 (diseases
>>>>>>> (mesh category)) C1720765 (female urogenital dis pregnancy compl)
>>>>>>> C0032962 (compl pregn) C0000786 (abortions, spontaneous)
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Jennifer,
>>>>>>>>
>>>>>>>> Thanks for sharing this question. I think in general if you have a
>>>>>>>> choice between using CUIs or terms with UMLS::Similarity, your best
>>>>>>>> option is to use the CUIs. Terms can map to multiple CUIs, and
>>>>>>>> UMLS::Similarity might pick a CUI associated with a sense of the term
>>>>>>>> you aren't intending. Also, if you misspell a term or don't specify it
>>>>>>>> exactly correctly, then it shows up as not found. One useful resource
>>>>>>>> for replicating similarity measure studies (like the one you cite) is
>>>>>>>> the following page which includes term mappings for several of the
>>>>>>>> datasets we've worked with over the years.
>>>>>>>>
>>>>>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>>>>>
>>>>>>>> I will admit to being a little puzzled about the case of abortion -
>>>>>>>> miscarriage. The paper you cite clearly reports a value based on MSH,
>>>>>>>> but as I try to run that query now I get a value of -1 (even when using
>>>>>>>> the CUIs). However, it appears that each of the CUIs is found in MSH,
>>>>>>>> but that somehow we are not able to compute some of the measures (a
>>>>>>>> path length, for example). This suggests that there is not a path
>>>>>>>> between the two CUIs, which has something to do with the structure of
>>>>>>>> UMLS/MSH.
>>>>>>>>
>>>>>>>> One quick and dirty way to see if a CUI is in MSH is to find the
>>>>>>>> path length between a CUI and itself. If it is present in MSH, that
>>>>>>>> value will be 1. We see that for each of the CUIs used for abortion and
>>>>>>>> miscarriage.
>>>>>>>>
>>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>>>>>> --measure path --sab MSH C0156543 C0156543
>>>>>>>> Default Settings:
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>   --rel PAR/CHD
>>>>>>>> User Settings:
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion
>>>>>>>> NOS(C0156543)
>>>>>>>>
>>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>>>>>> --measure path --sab MSH C0000786 C0000786
>>>>>>>> Default Settings:
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>   --rel PAR/CHD
>>>>>>>> User Settings:
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>>>>>
>>>>>>>> However, when I try to find the path length between the two CUIs, I
>>>>>>>> get -1. This suggests that the CUIs are not joined by PAR/CHD
>>>>>>>> relations... note that below you can see that the terms for the CUIs
>>>>>>>> have been looked up, which shows us that MSH knows about them...
>>>>>>>>
>>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>>>>>> --measure path --sab MSH C0156543 C0000786
>>>>>>>> Default Settings:
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>   --rel PAR/CHD
>>>>>>>> User Settings:
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>>>>>>
>>>>>>>> So, in any case, it would appear that something has changed in the
>>>>>>>> structure of MSH since we reported our results in the 2009 AMIA paper
>>>>>>>> you mention. I'm not sure what that is. But, I think the general
>>>>>>>> message is that if you can use CUIs it will normally be more reliable
>>>>>>>> to do that. Mapping terms to CUIs is of course its own problem, but
>>>>>>>> UMLS::Similarity doesn't do anything terribly fancy with that, and so
>>>>>>>> probably whatever you do will be more extensive and reliable than what
>>>>>>>> UMLS::Similarity would do...
>>>>>>>>
>>>>>>>> I hope this helps somehow, and please do feel free to follow up.
>>>>>>>> Thoughts from other users on this issue would also be most welcome!
>>>>>>>>
>>>>>>>> Cordially,
>>>>>>>> Ted
>>>>>>>>
>>>>>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson
>>>>>>>> [email protected] [umls-similarity] <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm resending this now that I'm subscribed. Any advice would be
>>>>>>>>> much appreciated! Thank you,
>>>>>>>>>
>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>> From: Jennifer Wilson <[email protected]>
>>>>>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>>>>>> Subject: Help with the best approach for using the query-UMLS
>>>>>>>>> interface
>>>>>>>>> To: [email protected]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello UMLS similarity team,
>>>>>>>>>
>>>>>>>>> I am trying to compute the similarity between ~30K
>>>>>>>>> disease/phenotype terms. Ideally, I would have a matrix of similarity
>>>>>>>>> for these terms.
>>>>>>>>>
>>>>>>>>> My first attempt was to write a python script to call the
>>>>>>>>> query-umls-similarity-webinterface.pl script. Though, before
>>>>>>>>> releasing the script on my dataset, I was trying to recreate the
>>>>>>>>> scores from this paper
>>>>>>>>> (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/) in table 1.
>>>>>>>>>
>>>>>>>>> Here's the command I am using:
>>>>>>>>>
>>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>>>>>> "Abortion" "Miscarriage"
>>>>>>>>>
>>>>>>>>> Default Settings:
>>>>>>>>>
>>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>>
>>>>>>>>>   --measure path
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> User Settings:
>>>>>>>>>
>>>>>>>>>   --rel PAR/CHD
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>>>>>
>>>>>>>>> I also have not processed the text in my dataset much. I have
>>>>>>>>> basically pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS,
>>>>>>>>> and the GWAS catalogue. If I'm using data from all of these sources -
>>>>>>>>> do you recommend sending them directly to the query interface? Should
>>>>>>>>> I try and map to CUI terms? (examples below)
>>>>>>>>>
>>>>>>>>> Before I download the database and attempt to query the database
>>>>>>>>> (it's not a language that I use in my current work), I just wanted an
>>>>>>>>> outside perspective to see if there are best practices for using this
>>>>>>>>> data. Thank you in advance for your time!
>>>>>>>>>
>>>>>>>>> (examples)
>>>>>>>>> Here are two more examples showing the disease descriptions in my
>>>>>>>>> dataset. Is the UMLS interface robust to these various formats or do
>>>>>>>>> they need to be an exact match?
>>>>>>>>>
>>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>>>>>> "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>>>>>
>>>>>>>>> Default Settings:
>>>>>>>>>
>>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>>
>>>>>>>>>   --measure path
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> User Settings:
>>>>>>>>>
>>>>>>>>>   --rel PAR/CHD
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local
>>>>>>>>> hypoplastic form')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>>>>>> "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>>>>>
>>>>>>>>> Default Settings:
>>>>>>>>>
>>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>>
>>>>>>>>>   --measure path
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> User Settings:
>>>>>>>>>
>>>>>>>>>   --rel PAR/CHD
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED
>>>>>>>>> AGGRESSIVE')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jennifer L. Wilson
>>>>>>>>> Bioengineering, Stanford University
>>>>>>>>> [email protected] / 703.969.3318 <(703)%20969-3318>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jennifer L. Wilson
>>>>>> Bioengineering, Stanford University
>>>>>> [email protected] / 703.969.3318 <(703)%20969-3318>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jennifer L. Wilson
>>>> Bioengineering, Stanford University
>>>> [email protected] / 703.969.3318 <(703)%20969-3318>
>>>>
>>>>
>>>
>>
>>
>> --
>> Jennifer L. Wilson
>> Bioengineering, Stanford University
>> [email protected] / 703.969.3318 <(703)%20969-3318>
>>
>
>
>
> --
> Jennifer L. Wilson
> Bioengineering, Stanford University
> [email protected] / 703.969.3318 <(703)%20969-3318>
>
> 
>
