Hey Ted,

I'm (embarrassingly) having some trouble navigating the NLM site. I think I
have an account and am trying to download some of the MetaMap software (I
think that the "Lite" version is sufficient). But when I download the bz2
file, it won't open because I think I need to authenticate it. Do you know
how I'm supposed to access this software? Sorry if this is out of your
realm, I can try someone else at NLM. This has just been a lot more
difficult and confusing than I thought it should be! Thanks,
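
One possible explanation, which is an assumption and not something confirmed in this thread, is that the site returned a sign-in or license page saved under the .bz2 name instead of the archive itself. A minimal Python sketch of how one might check this, using only the standard library (the function names are hypothetical):

```python
import bz2

BZ2_MAGIC = b"BZh"  # every bzip2 stream starts with these three bytes


def looks_like_bz2(first_bytes: bytes) -> bool:
    """Return True if the data begins with the bzip2 signature."""
    return first_bytes.startswith(BZ2_MAGIC)


def describe_download(first_bytes: bytes) -> str:
    """Give a rough guess at what a downloaded file really contains."""
    if looks_like_bz2(first_bytes):
        return "bzip2 archive"
    head = first_bytes.lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML page (possibly a login or license page)"
    return "unknown format"


if __name__ == "__main__":
    # Demonstrate with in-memory data rather than a real download.
    real_archive = bz2.compress(b"example payload")
    fake_archive = b"<!DOCTYPE html><html><body>Please sign in</body></html>"
    print(describe_download(real_archive[:64]))
    print(describe_download(fake_archive[:64]))
```

To check an actual download, one could pass in the first bytes of the file, for example `open(path, "rb").read(64)`, where `path` is wherever the file was saved.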

On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen [email protected]
[umls-similarity] <[email protected]> wrote:

>
>
> Hi Jennifer,
>
> Mapping terms to CUIs is its own problem, and there are a few nice tools
> already available that might be of some use. We've used MetaMap to good
> effect for this problem, so you might want to consider looking there.
>
> https://metamap.nlm.nih.gov/
>
> I'd be curious if other users have recommendations as well.
>
> Good luck,
> Ted
>
> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson [email protected]
> [umls-similarity] <[email protected]> wrote:
>
>>
>>
>> Hi Ted,
>>
>> Thank you again for all of this. I'm sorry I had to put down this project
>> for a few days and am only now getting back to it.
>>
>> I see that ontologies change and reproducing that result might not be the
>> best sanity check on the scripts that I wrote.
>>
>> I'm going to try and figure out how to map to CUI terms and I'll be in
>> touch if I get stuck again. Thanks,
>>
>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen [email protected]
>> [umls-similarity] <[email protected]> wrote:
>>
>>>
>>>
>>> This is perhaps a bit more than you were looking for, but there are
>>> quite a few command line tools available with UMLS::Similarity when you
>>> install locally that can be helpful for digging into situations like this.
>>> When I look for the path from each of these CUIs to the ROOT (of MSH) I
>>> find that one of them does not have a path to the root, while the other
>>> does (see command output below).
>>>
>>> The lack of a path to the root is going to cause a lot of measures to
>>> report a -1 value (since path, for example, relies on finding this path as
>>> a part of its computation). In fact, not having a path to the root makes me
>>> question if C0156543 is in MSH at all, so it might even be that the CUI is
>>> no longer a part of MSH (and not just lacking a path to the root). But,
>>> regardless, clearly something has changed since 2009 that is causing this
>>> measure to return a different value. This happens in some cases since UMLS
>>> continues to evolve and CUIs are added, removed, etc. It's important to
>>> know what version of the UMLS a previous study has used if you are
>>> interested in getting a very exact comparison. In the case of our AMIA 2009
>>> paper we used 2008AB, so things have no doubt changed a bit since then.
>>>
>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>
>>> UMLS-Interface Configuration Information:
>>> (Default Information - no config file)
>>>
>>>   Sources (SAB):
>>>      MSH
>>>   Relations (REL):
>>>      PAR
>>>      CHD
>>>
>>>   Sources (SABDEF):
>>>      UMLS_ALL
>>>   Relations (RELDEF):
>>>      UMLS_ALL
>>>
>>>
>>> There are no paths from the given C0156543 to the root.
>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>
>>>
>>> UMLS-Interface Configuration Information:
>>> (Default Information - no config file)
>>>
>>>   Sources (SAB):
>>>      MSH
>>>   Relations (REL):
>>>      PAR
>>>      CHD
>>>
>>>   Sources (SABDEF):
>>>      UMLS_ALL
>>>   Relations (RELDEF):
>>>      UMLS_ALL
>>>
>>>
>>> The paths between abortions, spontaneous (C0000786) and the root:
>>>   => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
>>> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
>>> category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
>>> pregn) C0000786 (abortions, spontaneous)
>>>
>>>
>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <[email protected]>
>>> wrote:
>>>
>>>> Hi Jennifer,
>>>>
>>>> Thanks for sharing this question. I think in general if you have a
>>>> choice between using CUIs or terms with UMLS::Similarity, your best option
>>>> is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity
>>>> might pick a CUI associated with a sense of the term you aren't intending.
>>>> Also, if you misspell a term or don't specify it exactly correctly, then it
>>>> shows up as not found. One useful resource for replicating similarity
>>>> measure studies (like the one you cite) is the following page which
>>>> includes term mappings for several of the datasets we've worked with over
>>>> the years.
>>>>
>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>
>>>> I will admit to being a little puzzled about the case of abortion -
>>>> miscarriage. The paper you cite clearly reports a value based on MSH, but
>>>> as I try to run that query now I get a value of -1 (even when using the
>>>> CUIs). However, it appears that each of the CUIs is found in MSH, but that
>>>> somehow we are not able to compute some of the measures (a path length, for
>>>> example). This suggests that there is not a path between the two CUIs,
>>>> which has something to do with the structure of UMLS/MSH.
>>>>
>>>> One quick and dirty way to see if a CUI is in MSH is to find the path
>>>> length between a CUI and itself. If it is present in MSH, that value will
>>>> be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>>>
>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>> --measure path --sab MSH C0156543 C0156543
>>>> Default Settings:
>>>>   --default http://atlas.ahc.umn.edu/
>>>>   --rel PAR/CHD
>>>> User Settings:
>>>>   --measure path
>>>>
>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion
>>>> NOS(C0156543)
>>>>
>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>> --measure path --sab MSH C0000786 C0000786
>>>> Default Settings:
>>>>   --default http://atlas.ahc.umn.edu/
>>>>   --rel PAR/CHD
>>>> User Settings:
>>>>   --measure path
>>>>
>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>
>>>> However, when I try to find the path length between the two CUIs, I get
>>>> -1. This suggests that the CUIs are not joined by PAR/CHD relations. Note
>>>> that below you can see that the terms for the CUIs have been looked up,
>>>> which shows us that MSH knows about them.
>>>>
>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>> --measure path --sab MSH C0156543 C0000786
>>>> Default Settings:
>>>>   --default http://atlas.ahc.umn.edu/
>>>>   --rel PAR/CHD
>>>> User Settings:
>>>>   --measure path
>>>>
>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
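
The result lines in the transcripts above follow the pattern `score<>term(CUI)<>term(CUI)`. A script that wraps the command could split such a line as sketched below. This is a hedged illustration based only on the output shown in this thread: the names `Result` and `parse_result_line` are hypothetical, and mapping a negative score to `None` is an assumption about how one might want to represent "no path found".

```python
import re
from typing import NamedTuple, Optional


class Result(NamedTuple):
    score: Optional[float]  # None when the tool reported a negative value
    term1: str
    cui1: str
    term2: str
    cui2: str


# A term followed by its CUI in parentheses, e.g. "Abortions.spontaneous(C0000786)"
_TERM_CUI = re.compile(r"^(?P<term>.*)\((?P<cui>C\d{7})\)$")


def parse_result_line(line: str) -> Result:
    """Parse one 'score<>term(CUI)<>term(CUI)' line into its fields."""
    parts = line.strip().split("<>")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '<>'-separated fields, got {len(parts)}")
    raw_score, left, right = parts
    score = float(raw_score)
    m1 = _TERM_CUI.match(left.strip())
    m2 = _TERM_CUI.match(right.strip())
    if m1 is None or m2 is None:
        raise ValueError("could not find a trailing (CUI) in one of the fields")
    # Assumption: any negative score means the measure could not be computed.
    return Result(
        None if score < 0 else score,
        m1["term"], m1["cui"],
        m2["term"], m2["cui"],
    )


if __name__ == "__main__":
    line = "-1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)"
    print(parse_result_line(line))
```

Keeping the CUIs alongside the score makes it easier to spot later which pairs came back without a value.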
>>>>
>>>> So, in any case, it would appear that something has changed in the
>>>> structure of MSH since we reported our results in the 2009 AMIA paper you
>>>> mention. I'm not sure what that is. But, I think the general message is
>>>> that if you can use CUIs it will normally be more reliable to do that.
>>>> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
>>>> doesn't do anything terribly fancy with that, and so probably whatever you
>>>> do will be more extensive and reliable than what UMLS::Similarity would
>>>> do...
>>>>
>>>> I hope this helps somehow, and please do feel free to follow up.
>>>> Thoughts from other users on this issue would also be most welcome!
>>>>
>>>> Cordially,
>>>> Ted
>>>>
>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson
>>>> [email protected] [umls-similarity] <
>>>> [email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm resending this now that I'm subscribed. Any advice would be much
>>>>> appreciated! Thank you,
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Jennifer Wilson <[email protected]>
>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>>> To: [email protected]
>>>>>
>>>>>
>>>>> Hello UMLS similarity team,
>>>>>
>>>>> I am trying to compute the similarity between ~30K disease/phenotype
>>>>> terms. Ideally, I would have a matrix of similarity for these terms.
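
For a sense of scale, a full matrix over n terms needs n(n-1)/2 distinct comparisons if the measure is symmetric (an assumption here, and one worth checking for whichever measure is chosen). The short Python sketch below counts and enumerates those pairs; the helper names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def unique_pairs(terms: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield each unordered pair once; the other half of the matrix mirrors it."""
    return combinations(terms, 2)


if __name__ == "__main__":
    print(pair_count(30_000))  # 449985000
    print(list(unique_pairs(["a", "b", "c"])))
```

At roughly 450 million pairs for 30K terms, issuing one web request per pair may be impractical, which is one reason to consider a local installation or a smaller subset of terms.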
>>>>>
>>>>> My first attempt was to write a python script to call the
>>>>> query-umls-similarity-webinterface.pl script. Though, before
>>>>> releasing the script on my dataset, I was trying to recreate the scores
>>>>> from this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/)
>>>>> in table 1.
>>>>>
>>>>> Here's the command I am using:
>>>>>
>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>> "Abortion" "Miscarriage"
>>>>>
>>>>> Default Settings:
>>>>>
>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>
>>>>>   --measure path
>>>>>
>>>>>
>>>>> User Settings:
>>>>>
>>>>>   --rel PAR/CHD
>>>>>
>>>>>
>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>
>>>>> I also have not processed the text in my dataset much. I have
>>>>> basically pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS, and
>>>>> the GWAS catalogue. If I'm using data from all of these sources, do you
>>>>> recommend sending them directly to the query interface? Should I try to
>>>>> map to CUI terms? (examples below)
>>>>>
>>>>> Before I download the database and attempt to query the database (it's
>>>>> not a language that I use in my current work), I just wanted an outside
>>>>> perspective to see if there are best practices for using this data. Thank
>>>>> you in advance for your time!
>>>>>
>>>>> (examples)
>>>>> Here are two more examples showing the disease descriptions in my
>>>>> dataset. Is the UMLS interface robust to these various formats or do they
>>>>> need to be an exact match?
>>>>>
>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>> "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>
>>>>> Default Settings:
>>>>>
>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>
>>>>>   --measure path
>>>>>
>>>>>
>>>>> User Settings:
>>>>>
>>>>>   --rel PAR/CHD
>>>>>
>>>>>
>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local
>>>>> hypoplastic form')
>>>>>
>>>>>
>>>>>
>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>> "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>
>>>>> Default Settings:
>>>>>
>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>
>>>>>   --measure path
>>>>>
>>>>>
>>>>> User Settings:
>>>>>
>>>>>   --rel PAR/CHD
>>>>>
>>>>>
>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED
>>>>> AGGRESSIVE')
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jennifer L. Wilson
>>>>> Bioengineering, Stanford University
>>>>> [email protected] / 703.969.3318
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Jennifer L. Wilson
>> Bioengineering, Stanford University
>> [email protected] / 703.969.3318
>>
>>
> 
>



-- 
Jennifer L. Wilson
Bioengineering, Stanford University
[email protected] / 703.969.3318
