Hi Jen,

I looked at those particular CUIs and I don't think they are in MSH or SNOMEDCT, which is why you are getting -1 even though one would expect some similarity between them. To find some other examples using Alzheimer's, I used the UTS Metathesaurus to look up CUIs in MSH that include the term Alzheimer's (9 were found in MSH).
I took 2 of those and ran them with path and got -1, indicating no path was found. However, when I used lesk or vector I got non-zero values. Lesk and vector are both based on comparing the definitions of two CUIs and do not rely on finding paths.

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl C0002395 C0299337 --measure vector --sab MSH
Default Settings:
  --default http://atlas.ahc.umn.edu/
  --rel CUI/PAR/CHD/RB/RN
User Settings:
  --measure vector

0.3131<>Disease, Alzheimer's(C0002395)<>familial Alzheimer's disease protein 1(C0299337)

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl C0002395 C0299337 --measure lesk --sab MSH
Default Settings:
  --default http://atlas.ahc.umn.edu/
  --rel CUI/PAR/CHD/RB/RN
User Settings:
  --measure lesk

19<>Disease, Alzheimer's(C0002395)<>familial Alzheimer's disease protein 1(C0299337)

So, the tricky part is sometimes the coverage in different sources: two CUIs might be intuitively similar but simply not be found in the source being used (or no path may exist between them), and so will show a -1 value.

I'm not sure this exactly answers your question, but I will think a little more and add what I can...

More soon,
Ted

On Mon, Jun 5, 2017 at 5:41 PM, Jennifer Wilson [email protected] [umls-similarity] <[email protected]> wrote:

> Hey Ted,
>
> So I haven't quite figured out the MetaMap, but I have a set of diseases that I mapped to CUIs another way. I'm still getting negative results with diseases that I think should be "similar". For example:
>
> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "C1864828" "C3810041"
>
> Default Settings:
>   --default http://atlas.ahc.umn.edu/
>   --measure path
>
> User Settings:
>   --rel PAR/CHD
>
> ["b'-1", 'ALZHEIMER DISEASE 10(C1864828)', "ALZHEIMER DISEASE 18(C3810041)\\n'"]
>
> You can see my results on the last row. Could you advise: would you expect that these two CUIs would not be similar?
> I wanted to measure path as a simple starting point, but could you recommend another distance that might be more informative? Thanks again for your help!
>
> On Mon, Jun 5, 2017 at 1:43 PM, Jennifer Wilson <[email protected]> wrote:
>
>> Hey Ted,
>>
>> Thanks for all of the help. I found the interactive interface really helpful and had been able to create inputs similar to what you shared. I have an open help ticket now on trying to get the file to download. He gave me some commands to try that I had already tried, so there must be something else to unzipping the code...
>>
>> Thanks again. Hopefully I'm close to a solution!
>>
>> On Mon, Jun 5, 2017 at 11:21 AM, Ted Pedersen [email protected] [umls-similarity] <[email protected]> wrote:
>>
>>> Hi Jen,
>>>
>>> Nothing to be embarrassed about at all! If you haven't already used MetaMap interactively you might want to try that before you attempt a local install:
>>>
>>> https://ii.nlm.nih.gov/Interactive/UTS_Required/metamap.shtml
>>>
>>> (You would need to be logged into UTS for the link to work, I think...)
>>>
>>> Anyway, once at that site, on the right side there are some links for using MetaMap interactively. Below is an example of what that looks like (where the first line is my input and the rest is the output). I turned on the option to show CUIs, since I think that is your desired output...
>>>
>>> About the bz2 file, I think you'd need to uncompress that with bunzip2, although I have not done a local install for a while so I am not 100 percent sure if that is the issue or not. But I've cc'd the MetaMap help line on this note; they are usually very good about following up on issues like this.
>>>
>>> I hope this helps!
>>> Ted
>>>
>>> Processing 00000000.tx.1: I have a really bad headache, and my joints ache.
>>>
>>> Phrase: I
>>> >>>>> Phrase
>>> i
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (1000):
>>>   1000 C0021966:I- (Iodides) [Inorganic Chemical]
>>> Meta Mapping (1000):
>>>   1000 C0221138:I NOS (Blood group antibody I) [Amino Acid, Peptide, or Protein,Immunologic Factor]
>>> <<<<< Mappings
>>>
>>> Phrase: have
>>> >>>>> Phrase
>>> <<<<< Phrase
>>>
>>> Phrase: a really bad headache,
>>> >>>>> Phrase
>>> really bad headache
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (790):
>>>   660 C0205169:Bad [Qualitative Concept]
>>>   827 C0018681:HEADACHE (Headache) [Sign or Symptom]
>>> <<<<< Mappings
>>>
>>> Phrase: and
>>> >>>>> Phrase
>>> <<<<< Phrase
>>>
>>> Phrase: my joints
>>> >>>>> Phrase
>>> joints
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (1000):
>>>   1000 C0022417:Joints [Body Space or Junction]
>>> Meta Mapping (1000):
>>>   1000 C0392905:Joints (Articular system) [Body System]
>>> <<<<< Mappings
>>>
>>> Phrase: ache.
>>> >>>>> Phrase
>>> ache
>>> <<<<< Phrase
>>> >>>>> Mappings
>>> Meta Mapping (1000):
>>>   1000 C0234238:ACHE (Ache) [Sign or Symptom]
>>> <<<<< Mappings
>>>
>>> On Mon, Jun 5, 2017 at 12:25 PM, Jennifer Wilson [email protected] [umls-similarity] <[email protected]> wrote:
>>>
>>>> Hey Ted,
>>>>
>>>> I'm (embarrassingly) having some trouble navigating the NLM site. I think I have an account and am trying to download some of the MetaMap software (I think that the "Lite" version is sufficient). But when I download the bz2 file, it won't open because I think I need to authenticate it. Do you know how I'm supposed to access this software? Sorry if this is out of your realm, I can try someone else at NLM. This has just been a lot more difficult and confusing than I thought it should be!
>>>> Thanks,
>>>>
>>>> On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen [email protected] [umls-similarity] <[email protected]> wrote:
>>>>
>>>>> Hi Jennifer,
>>>>>
>>>>> Mapping terms to CUIs is its own problem, and there are a few nice tools already available that might be of some use. We've used MetaMap to good effect for this problem, so you might want to consider looking there.
>>>>>
>>>>> https://metamap.nlm.nih.gov/
>>>>>
>>>>> I'd be curious if other users have recommendations as well.
>>>>>
>>>>> Good luck,
>>>>> Ted
>>>>>
>>>>> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson [email protected] [umls-similarity] <[email protected]> wrote:
>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> Thank you again for all of this. I'm sorry I had to put down this project for a few days and am only now getting back to it.
>>>>>>
>>>>>> I see that ontologies change and reproducing that result might not be the best sanity check on the scripts that I wrote.
>>>>>>
>>>>>> I'm going to try and figure out how to map to CUI terms and I'll be in touch if I get stuck again. Thanks,
>>>>>>
>>>>>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen [email protected] [umls-similarity] <[email protected]> wrote:
>>>>>>
>>>>>>> This is perhaps a bit more than you were looking for, but there are quite a few command line tools available with UMLS::Similarity when you install locally that can be helpful for digging into situations like this. When I look for the path from each of these CUIs to the ROOT (of MSH) I find that one of them does not have a path to the root, while the other does (see command output below).
>>>>>>>
>>>>>>> The lack of a path to the root is going to cause a lot of measures to report a -1 value (since path, for example, relies on finding this path as a part of its computation).
>>>>>>> In fact, not having a path to the root makes me question if C0156543 is in MSH at all, so it might even be that the CUI is no longer a part of MSH (and not just lacking a path to the root). But, regardless, clearly something has changed since 2009 that is causing this measure to return a different value. This happens in some cases since UMLS continues to evolve and CUIs are added, removed, etc. It's important to know what version of the UMLS a previous study has used if you are interested in getting a very exact comparison. In the case of our AMIA 2009 paper we used 2008AB, so things have no doubt changed a bit since then.
>>>>>>>
>>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>>>>>
>>>>>>> UMLS-Interface Configuration Information:
>>>>>>> (Default Information - no config file)
>>>>>>>
>>>>>>> Sources (SAB):
>>>>>>>   MSH
>>>>>>> Relations (REL):
>>>>>>>   PAR
>>>>>>>   CHD
>>>>>>>
>>>>>>> Sources (SABDEF):
>>>>>>>   UMLS_ALL
>>>>>>> Relations (RELDEF):
>>>>>>>   UMLS_ALL
>>>>>>>
>>>>>>> There are no paths from the given C0156543 to the root.
>>>>>>>
>>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>>>>>
>>>>>>> UMLS-Interface Configuration Information:
>>>>>>> (Default Information - no config file)
>>>>>>>
>>>>>>> Sources (SAB):
>>>>>>>   MSH
>>>>>>> Relations (REL):
>>>>>>>   PAR
>>>>>>>   CHD
>>>>>>>
>>>>>>> Sources (SABDEF):
>>>>>>>   UMLS_ALL
>>>>>>> Relations (RELDEF):
>>>>>>>   UMLS_ALL
>>>>>>>
>>>>>>> The paths between abortions, spontaneous (C0000786) and the root:
>>>>>>> => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl pregn) C0000786 (abortions, spontaneous)
>>>>>>>
>>>>>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Jennifer,
>>>>>>>>
>>>>>>>> Thanks for sharing this question. I think in general if you have a choice between using CUIs or terms with UMLS::Similarity, your best option is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity might pick a CUI associated with a sense of the term you aren't intending. Also, if you misspell a term or don't specify it exactly correctly, then it shows up as not found. One useful resource for replicating similarity measure studies (like the one you cite) is the following page, which includes term mappings for several of the datasets we've worked with over the years.
>>>>>>>>
>>>>>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>>>>>
>>>>>>>> I will admit to being a little puzzled about the case of abortion - miscarriage. The paper you cite clearly reports a value based on MSH, but as I try to run that query now I get a value of -1 (even when using the CUIs).
>>>>>>>> However, it appears that each of the CUIs is found in MSH, but that somehow we are not able to compute some of the measures (a path length, for example). This suggests that there is not a path between the two CUIs, which has something to do with the structure of UMLS/MSH.
>>>>>>>>
>>>>>>>> One quick and dirty way to see if a CUI is in MSH is to find the path length between a CUI and itself. If it is present in MSH, that value will be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>>>>>>>
>>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
>>>>>>>> Default Settings:
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>   --rel PAR/CHD
>>>>>>>> User Settings:
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>>>>>>>
>>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
>>>>>>>> Default Settings:
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>   --rel PAR/CHD
>>>>>>>> User Settings:
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>>>>>
>>>>>>>> However, when I try to find the path length between the two CUIs, I get -1. This suggests that the CUIs are not joined by PAR/CHD relations. Note that below you can see that the terms for the CUIs have been looked up, which shows us that MSH knows about them...
>>>>>>>>
>>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
>>>>>>>> Default Settings:
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>   --rel PAR/CHD
>>>>>>>> User Settings:
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>>>>>>
>>>>>>>> So, in any case, it would appear that something has changed in the structure of MSH since we reported our results in the 2009 AMIA paper you mention. I'm not sure what that is. But I think the general message is that if you can use CUIs it will normally be more reliable to do that. Mapping terms to CUIs is of course its own problem, but UMLS::Similarity doesn't do anything terribly fancy with that, and so probably whatever you do will be more extensive and reliable than what UMLS::Similarity would do...
>>>>>>>>
>>>>>>>> I hope this helps somehow, and please do feel free to follow up. Thoughts from other users on this issue would also be most welcome!
>>>>>>>>
>>>>>>>> Cordially,
>>>>>>>> Ted
>>>>>>>>
>>>>>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson [email protected] [umls-similarity] <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm resending this now that I'm subscribed. Any advice would be much appreciated! Thank you,
>>>>>>>>>
>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>> From: Jennifer Wilson <[email protected]>
>>>>>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>>>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>>>>>>> To: [email protected]
>>>>>>>>>
>>>>>>>>> Hello UMLS similarity team,
>>>>>>>>>
>>>>>>>>> I am trying to compute the similarity between ~30K disease/phenotype terms.
>>>>>>>>> Ideally, I would have a matrix of similarity for these terms.
>>>>>>>>>
>>>>>>>>> My first attempt was to write a python script to call the query-umls-similarity-webinterface.pl script. Though, before releasing the script on my dataset, I was trying to recreate the scores from this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/) in table 1.
>>>>>>>>>
>>>>>>>>> Here's the command I am using:
>>>>>>>>>
>>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"
>>>>>>>>>
>>>>>>>>> Default Settings:
>>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>>   --measure path
>>>>>>>>>
>>>>>>>>> User Settings:
>>>>>>>>>   --rel PAR/CHD
>>>>>>>>>
>>>>>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>>>>>
>>>>>>>>> I also have not processed the text in my dataset much. I have basically pulled diseases and phenotypes from DisGeNet, OMIM, PheWAS, and the GWAS catalogue. If I'm using data from all of these sources, do you recommend sending them directly to the query interface? Should I try and map to CUI terms? (examples below)
>>>>>>>>>
>>>>>>>>> Before I download the database and attempt to query the database (it's not a language that I use in my current work), I just wanted an outside perspective to see if there are best practices for using this data. Thank you in advance for your time!
>>>>>>>>>
>>>>>>>>> (examples)
>>>>>>>>> Here are two more examples showing the disease descriptions in my dataset. Is the UMLS interface robust to these various formats or do they need to be an exact match?
>>>>>>>>>
>>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>>>>>
>>>>>>>>> Default Settings:
>>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>>   --measure path
>>>>>>>>>
>>>>>>>>> User Settings:
>>>>>>>>>   --rel PAR/CHD
>>>>>>>>>
>>>>>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic form')
>>>>>>>>>
>>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>>>>>
>>>>>>>>> Default Settings:
>>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>>   --measure path
>>>>>>>>>
>>>>>>>>> User Settings:
>>>>>>>>>   --rel PAR/CHD
>>>>>>>>>
>>>>>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED AGGRESSIVE')
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jennifer L. Wilson
>>>>>>>>> Bioengineering, Stanford University
>>>>>>>>> [email protected] / 703.969.3318
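
A note for anyone scripting these queries from Python, as Jen describes earlier in the thread: below is a minimal sketch, not part of UMLS::Similarity itself. It assumes the script prints a `score<>term1(CUI1)<>term2(CUI2)` line on standard output, as in the runs quoted above, and that a negative score such as -1 means no value could be computed. The function names (`build_command`, `extract_result`, `query`, `is_in_source`, `first_scored`) are hypothetical helpers, and only the flags that appear in this thread (`--measure`, `--sab`) are used.

```python
import subprocess
from typing import List, NamedTuple, Optional


class SimilarityResult(NamedTuple):
    score: float
    term1: str
    term2: str


def build_command(cui1: str, cui2: str, measure: str = "path",
                  sab: str = "MSH") -> List[str]:
    """Mirror the command lines quoted in this thread (script path is assumed)."""
    return ["perl", "query-umls-similarity-webinterface.pl",
            cui1, cui2, "--measure", measure, "--sab", sab]


def extract_result(output) -> Optional[SimilarityResult]:
    """Find the last 'score<>term1<>term2' line in the script's output."""
    if isinstance(output, bytes):
        # Decode first; calling str() on bytes yields text like "b'-1".
        output = output.decode("utf-8", errors="replace")
    for line in reversed(output.splitlines()):
        parts = line.strip().split("<>")
        if len(parts) == 3:
            try:
                return SimilarityResult(float(parts[0]), parts[1], parts[2])
            except ValueError:
                continue  # not a numeric score, keep looking
    return None


def query(cui1: str, cui2: str, measure: str = "path",
          sab: str = "MSH") -> Optional[SimilarityResult]:
    """Run the Perl script once; assumes the result line goes to stdout."""
    proc = subprocess.run(build_command(cui1, cui2, measure, sab),
                          capture_output=True, check=True)
    return extract_result(proc.stdout)


def is_in_source(cui: str, run=query) -> bool:
    """Quick check from the thread: path of a CUI to itself is 1 if present."""
    result = run(cui, cui, "path")
    return result is not None and result.score == 1


def first_scored(cui1: str, cui2: str,
                 measures=("path", "vector", "lesk"), run=query):
    """Return (measure, result) for the first measure with a non-negative score."""
    for measure in measures:
        result = run(cui1, cui2, measure)
        if result is not None and result.score >= 0:
            return measure, result
    return None
```

Decoding the bytes before splitting avoids results shaped like `"b'-1"`. Because the measures are on different scales (0.3131 for vector versus 19 for lesk in the runs above), `first_scored` returns the measure name together with the score, so that values from different measures are not mixed in one matrix.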
