This is perhaps a bit more than you were looking for, but UMLS::Similarity
comes with quite a few command line tools when you install it locally, and
they can be helpful for digging into situations like this. When I look for
the path from each of these CUIs to the ROOT (of MSH), I find that one of
them has a path to the root while the other does not (see the command
output below).

The lack of a path to the root is going to cause a lot of measures to
report a value of -1 (since path, for example, relies on finding this path
as a part of its computation). In fact, not having a path to the root makes
me question whether C0156543 is in MSH at all, so it might even be that the
CUI is no longer a part of MSH (and not just lacking a path to the root).

But, regardless, clearly something has changed since 2009 that is causing
this measure to return a different value. This happens in some cases since
UMLS continues to evolve and CUIs are added, removed, etc. It's important
to know what version of the UMLS a previous study used if you are
interested in getting an exact comparison. In the case of our AMIA 2009
paper we used 2008AB, so things have no doubt changed a bit since then.

tpederse@maraca:~$ findPathToRoot.pl C0156543
UMLS-Interface Configuration Information:
(Default Information - no config file)
Sources (SAB):
  MSH
Relations (REL):
  PAR
  CHD
Sources (SABDEF):
  UMLS_ALL
Relations (RELDEF):
  UMLS_ALL

There are no paths from the given C0156543 to the root.

tpederse@maraca:~$ findPathToRoot.pl C0000786
UMLS-Interface Configuration Information:
(Default Information - no config file)
Sources (SAB):
  MSH
Relations (REL):
  PAR
  CHD
Sources (SABDEF):
  UMLS_ALL
Relations (RELDEF):
  UMLS_ALL

The paths between abortions, spontaneous (C0000786) and the root:
=> C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
pregn) C0000786 (abortions, spontaneous)
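
For anyone scripting these checks (for example from Python, as in the
original question), here is a minimal sketch of how the two kinds of output
in this thread might be interpreted. It only works on text that has already
been captured from the tools. The function names are hypothetical, and it
assumes that findPathToRoot.pl keeps printing the two messages shown above
and that query-umls-similarity-webinterface.pl keeps printing result lines
of the form score<>term(CUI)<>term(CUI), as in the quoted output further
down. Neither format is guaranteed to be stable across versions.

```python
import re
from typing import NamedTuple, Optional

# Message fragments copied from the findPathToRoot.pl output in this thread.
# Assumption: other versions of the tool may word these differently.
NO_PATH_MARKER = "There are no paths from the given"
HAS_PATH_MARKER = "The paths between"


def has_path_to_root(output: str) -> Optional[bool]:
    """Classify captured findPathToRoot.pl output.

    Returns False or True for the two messages seen in this thread,
    and None when neither message is present (unrecognized output).
    """
    if NO_PATH_MARKER in output:
        return False
    if HAS_PATH_MARKER in output:
        return True
    return None


class Score(NamedTuple):
    value: float
    term1: str
    cui1: Optional[str]
    term2: str
    cui2: Optional[str]


# A field looks like "Unspecified abortion NOS(C0156543)".
_TERM_CUI = re.compile(r"^(?P<term>.*)\((?P<cui>C\d{7})\)$")


def _split_term(field: str):
    """Split 'term(CUI)' into (term, CUI); CUI is None if it is absent."""
    match = _TERM_CUI.match(field.strip())
    if match:
        return match.group("term").strip(), match.group("cui")
    return field.strip(), None


def parse_score_line(line: str) -> Optional[Score]:
    """Parse one 'score<>term(CUI)<>term(CUI)' line, or return None."""
    parts = line.strip().split("<>")
    if len(parts) != 3:
        return None  # settings lines such as "--measure path" end up here
    try:
        value = float(parts[0])
    except ValueError:
        return None
    term1, cui1 = _split_term(parts[1])
    term2, cui2 = _split_term(parts[2])
    return Score(value, term1, cui1, term2, cui2)
```

A negative value such as -1 is best treated as "could not be computed"
rather than as a similarity score, so a script building a large matrix
would probably want to record those pairs separately and check them with
findPathToRoot.pl afterwards.
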
On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <[email protected]> wrote:
> Hi Jennifer,
>
> Thanks for sharing this question. I think in general if you have a choice
> between using CUIs or terms with UMLS::Similarity, your best option is to
> use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity might
> pick a CUI associated with a sense of the term you aren't intending. Also,
> if you misspell a term or don't specify it exactly correctly, then it shows
> up as not found. One useful resource for replicating similarity measure
> studies (like the one you cite) is the following page which includes term
> mappings for several of the datasets we've worked with over the years.
>
> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>
> I will admit to being a little puzzled about the case of abortion -
> miscarriage. The paper you cite clearly reports a value based on MSH, but
> as I try to run that query now I get a value of -1 (even when using the
> CUIs). However, it appears that each of the CUIs is found in MSH, but that
> somehow we are not able to compute some of the measures (a path length, for
> example). This suggests that there is not a path between the two CUIs,
> which has something to do with the structure of UMLS/MSH.
>
> One quick and dirty way to see if a CUI is in MSH is to find the path
> length between a CUI and itself. If it is present in MSH, that value will
> be 1. We see that for each of the CUIs used for abortion and miscarriage.
>
> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
> path --sab MSH C0156543 C0156543
> Default Settings:
> --default http://atlas.ahc.umn.edu/
> --rel PAR/CHD
> User Settings:
> --measure path
>
> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>
> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
> path --sab MSH C0000786 C0000786
> Default Settings:
> --default http://atlas.ahc.umn.edu/
> --rel PAR/CHD
> User Settings:
> --measure path
>
> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>
> However, when I try to find the path length between the two CUIs, I get
> -1. This suggests that the CUIs are not joined by PAR/CHD relations...note
> that below you can see that the terms for the CUIs have been looked up,
> which shows us that MSH knows about them...
>
> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
> path --sab MSH C0156543 C0000786
> Default Settings:
> --default http://atlas.ahc.umn.edu/
> --rel PAR/CHD
> User Settings:
> --measure path
>
> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>
> So, in any case, it would appear that something has changed in the
> structure of MSH since we reported our results in the 2009 AMIA paper you
> mention. I'm not sure what that is. But, I think the general message is
> that if you can use CUIs it will normally be more reliable to do that.
> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
> doesn't do anything terribly fancy with that, and so probably whatever you
> do will be more extensive and reliable than what UMLS::Similarity would
> do...
>
> I hope this helps somehow, and please do feel free to follow up. Thoughts
> from other users on this issue would also be most welcome!
>
> Cordially,
> Ted
>
> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson [email protected]
> [umls-similarity] <[email protected]> wrote:
>
>>
>>
>> Hi all,
>>
>> I'm resending this now that I'm subscribed. Any advice would be much
>> appreciated! Thank you,
>>
>> ---------- Forwarded message ----------
>> From: Jennifer Wilson <[email protected]>
>> Date: Tue, May 23, 2017 at 6:13 PM
>> Subject: Help with the best approach for using the query-UMLS interface
>> To: [email protected]
>>
>>
>> Hello UMLS similarity team,
>>
>> I am trying to compute the similarity between ~30K disease/phenotype
>> terms. Ideally, I would have a matrix of similarity for these terms.
>>
>> My first attempt was to write a python script to call the
>> query-umls-similarity-webinterface.pl script. Though, before releasing
>> the script on my dataset, I was trying to recreate the scores from this
>> paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/) in table 1.
>>
>> Here's the command I am using:
>>
>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>> "Abortion" "Miscarriage"
>>
>> Default Settings:
>>
>> --default http://atlas.ahc.umn.edu/
>>
>> --measure path
>>
>>
>> User Settings:
>>
>> --rel PAR/CHD
>>
>>
>> (-1.0, 'Abortion', 'Miscarriage')
>>
>> I also have not processed the text in my dataset much. I have basically
>> pulled diseases and phenotypes from DisGeNet, OMIM, PheWAS, and the GWAS
>> catalogue. If I'm using data from all of these sources - do you recommend
>> sending them directly to the query interface? Should I try and map to CUI
>> terms? (examples below)
>>
>> Before I download the database and attempt to query the database (it's
>> not a language that I use in my current work), I just wanted an outside
>> perspective to see if there are best practices for using this data. Thank
>> you in advance for your time!
>>
>> (examples)
>> Here are two more examples showing the disease descriptions in my
>> dataset. Is the UMLS interface robust to these various formats or do they
>> need to be an exact match?
>>
>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>> "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>
>> Default Settings:
>>
>> --default http://atlas.ahc.umn.edu/
>>
>> --measure path
>>
>>
>> User Settings:
>>
>> --rel PAR/CHD
>>
>>
>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic
>> form')
>>
>>
>>
>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>> "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>
>> Default Settings:
>>
>> --default http://atlas.ahc.umn.edu/
>>
>> --measure path
>>
>>
>> User Settings:
>>
>> --rel PAR/CHD
>>
>>
>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED
>> AGGRESSIVE')
>>
>>
>>
>> --
>> Jennifer L. Wilson
>> Bioengineering, Stanford University
>> [email protected] / 703.969.3318
>>
>>
>>
>
>