Hey Ted,

I'm (embarrassingly) having some trouble navigating the NLM site. I think I have an account, and I'm trying to download some of the MetaMap software (I think the "Lite" version is sufficient). But when I download the bz2 file, it won't open; I think I need to authenticate it. Do you know how I'm supposed to access this software? Sorry if this is outside your realm; I can try someone else at NLM. This has just been a lot more difficult and confusing than I thought it would be!

Thanks,
On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen [email protected] [umls-similarity] <[email protected]> wrote:
>
> Hi Jennifer,
>
> Mapping terms to CUIs is its own problem, and there are a few nice tools
> already available that might be of some use. We've used MetaMap to good
> effect for this problem, so you might want to consider looking there.
>
> https://metamap.nlm.nih.gov/
>
> I'd be curious if other users have recommendations as well.
>
> Good luck,
> Ted
>
> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson [email protected]
> [umls-similarity] <[email protected]> wrote:
>
>> Hi Ted,
>>
>> Thank you again for all of this. I'm sorry I had to put down this project
>> for a few days and am only now getting back to it.
>>
>> I see that ontologies change, and reproducing that result might not be the
>> best sanity check on the scripts that I wrote.
>>
>> I'm going to try to figure out how to map terms to CUIs, and I'll be in
>> touch if I get stuck again. Thanks,
>>
>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen [email protected]
>> [umls-similarity] <[email protected]> wrote:
>>
>>> This is perhaps a bit more than you were looking for, but there are
>>> quite a few command line tools available with UMLS::Similarity when you
>>> install locally that can be helpful for digging into situations like this.
>>> When I look for the path from each of these CUIs to the ROOT (of MSH), I
>>> find that one of them does not have a path to the root, while the other
>>> does (see command output below).
>>>
>>> The lack of a path to the root is going to cause a lot of measures to
>>> report a -1 value (since path, for example, relies on finding this path as
>>> a part of its computation). In fact, not having a path to the root makes me
>>> question whether C0156543 is in MSH at all, so it might even be that the
>>> CUI is no longer a part of MSH (and not just lacking a path to the root).
>>> But, regardless, clearly something has changed since 2009 that is causing
>>> this measure to return a different value. This happens in some cases since
>>> UMLS continues to evolve and CUIs are added, removed, etc. It's important to
>>> know what version of the UMLS a previous study has used if you are
>>> interested in getting a very exact comparison. In the case of our AMIA 2009
>>> paper we used 2008AB, so things have no doubt changed a bit since then.
>>>
>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>
>>> UMLS-Interface Configuration Information:
>>> (Default Information - no config file)
>>>
>>> Sources (SAB):
>>> MSH
>>> Relations (REL):
>>> PAR
>>> CHD
>>>
>>> Sources (SABDEF):
>>> UMLS_ALL
>>> Relations (RELDEF):
>>> UMLS_ALL
>>>
>>> There are no paths from the given C0156543 to the root.
>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>
>>> UMLS-Interface Configuration Information:
>>> (Default Information - no config file)
>>>
>>> Sources (SAB):
>>> MSH
>>> Relations (REL):
>>> PAR
>>> CHD
>>>
>>> Sources (SABDEF):
>>> UMLS_ALL
>>> Relations (RELDEF):
>>> UMLS_ALL
>>>
>>> The paths between abortions, spontaneous (C0000786) and the root:
>>> => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
>>> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
>>> category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
>>> pregn) C0000786 (abortions, spontaneous)
>>>
>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <[email protected]>
>>> wrote:
>>>
>>>> Hi Jennifer,
>>>>
>>>> Thanks for sharing this question. I think in general if you have a
>>>> choice between using CUIs or terms with UMLS::Similarity, your best option
>>>> is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity
>>>> might pick a CUI associated with a sense of the term you aren't intending.
>>>> Also, if you misspell a term or don't specify it exactly correctly, then it
>>>> shows up as not found.
>>>> One useful resource for replicating similarity
>>>> measure studies (like the one you cite) is the following page, which
>>>> includes term mappings for several of the datasets we've worked with over
>>>> the years.
>>>>
>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>
>>>> I will admit to being a little puzzled about the case of abortion and
>>>> miscarriage. The paper you cite clearly reports a value based on MSH, but
>>>> as I try to run that query now I get a value of -1 (even when using the
>>>> CUIs). However, it appears that each of the CUIs is found in MSH, but that
>>>> somehow we are not able to compute some of the measures (a path length, for
>>>> example). This suggests that there is not a path between the two CUIs,
>>>> which has something to do with the structure of UMLS/MSH.
>>>>
>>>> One quick and dirty way to see if a CUI is in MSH is to find the path
>>>> length between a CUI and itself. If it is present in MSH, that value will
>>>> be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>>>
>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
>>>> Default Settings:
>>>> --default http://atlas.ahc.umn.edu/
>>>> --rel PAR/CHD
>>>> User Settings:
>>>> --measure path
>>>>
>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>>>
>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
>>>> Default Settings:
>>>> --default http://atlas.ahc.umn.edu/
>>>> --rel PAR/CHD
>>>> User Settings:
>>>> --measure path
>>>>
>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>
>>>> However, when I try to find the path length between the two CUIs, I get
>>>> -1. This suggests that the CUIs are not joined by PAR/CHD relations. Note
>>>> that below you can see that the terms for the CUIs have been looked up,
>>>> which shows us that MSH knows about them...
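A toy sketch, in Python, of why the self-check returns 1 while the cross-CUI query returns -1. It assumes (from our reading of the UMLS::Similarity documentation, not from this thread) that the path measure is the reciprocal of the number of nodes on the shortest path between two concepts, with -1 when no path exists; that is consistent with a concept scoring 1 against itself. The graph is hypothetical: it holds only the chain that findPathToRoot.pl printed for C0000786, plus C0156543 as a known concept with no PAR/CHD links. It is not the real MSH hierarchy, and the real tool's path search over a multi-parent hierarchy may differ from this plain breadth-first search.

```python
from collections import deque

# Hypothetical toy hierarchy: only the PAR/CHD chain that findPathToRoot.pl
# printed for C0000786, listed from the root down. Not the real MSH graph.
CHAIN = [
    "C0000000",  # **UMLS ROOT**
    "C1135584",  # mesh headings
    "C1256739",  # mesh descriptors
    "C1256741",  # topical descriptor
    "C0012674",  # diseases (mesh category)
    "C1720765",  # female urogenital dis pregnancy compl
    "C0032962",  # compl pregn
    "C0000786",  # abortions, spontaneous
]


def build_graph(chain, isolated=()):
    """Undirected adjacency sets; each consecutive pair is one PAR/CHD link."""
    graph = {cui: set() for cui in chain}
    for parent, child in zip(chain, chain[1:]):
        graph[parent].add(child)
        graph[child].add(parent)
    for cui in isolated:
        # A known concept with no PAR/CHD links (the C0156543 situation).
        graph.setdefault(cui, set())
    return graph


def path_similarity(graph, a, b):
    """Assumed path measure: 1 / (nodes on shortest path), or -1 if unreachable."""
    if a not in graph or b not in graph:
        return -1
    # Breadth-first search, counting nodes (not edges) along the path.
    seen = {a}
    queue = deque([(a, 1)])
    while queue:
        node, nodes_on_path = queue.popleft()
        if node == b:
            return 1 / nodes_on_path
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, nodes_on_path + 1))
    return -1


graph = build_graph(CHAIN, isolated=["C0156543"])
print(path_similarity(graph, "C0156543", "C0156543"))  # 1.0
print(path_similarity(graph, "C0000786", "C0000786"))  # 1.0
print(path_similarity(graph, "C0156543", "C0000786"))  # -1
```

Both concepts are "known" to the toy graph, so each scores 1 against itself, yet no chain of links connects them, so the pair scores -1; that mirrors the pattern in the outputs in this thread without claiming to reproduce the tool's internals.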
>>>>
>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
>>>> Default Settings:
>>>> --default http://atlas.ahc.umn.edu/
>>>> --rel PAR/CHD
>>>> User Settings:
>>>> --measure path
>>>>
>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>>
>>>> So, in any case, it would appear that something has changed in the
>>>> structure of MSH since we reported our results in the 2009 AMIA paper you
>>>> mention. I'm not sure what that is. But I think the general message is
>>>> that if you can use CUIs, it will normally be more reliable to do that.
>>>> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
>>>> doesn't do anything terribly fancy with that, and so probably whatever you
>>>> do will be more extensive and reliable than what UMLS::Similarity would
>>>> do...
>>>>
>>>> I hope this helps somehow, and please do feel free to follow up.
>>>> Thoughts from other users on this issue would also be most welcome!
>>>>
>>>> Cordially,
>>>> Ted
>>>>
>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson
>>>> [email protected] [umls-similarity] <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm resending this now that I'm subscribed. Any advice would be much
>>>>> appreciated! Thank you,
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Jennifer Wilson <[email protected]>
>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>>> To: [email protected]
>>>>>
>>>>> Hello UMLS similarity team,
>>>>>
>>>>> I am trying to compute the similarity between ~30K disease/phenotype
>>>>> terms. Ideally, I would have a matrix of similarity for these terms.
>>>>>
>>>>> My first attempt was to write a Python script to call the
>>>>> query-umls-similarity-webinterface.pl script.
>>>>> Though, before running the script on my dataset, I was trying to
>>>>> recreate the scores in Table 1 of this paper
>>>>> (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/).
>>>>>
>>>>> Here's the command I am using:
>>>>>
>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"
>>>>>
>>>>> Default Settings:
>>>>> --default http://atlas.ahc.umn.edu/
>>>>> --measure path
>>>>>
>>>>> User Settings:
>>>>> --rel PAR/CHD
>>>>>
>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>
>>>>> I also have not processed the text in my dataset much. I have
>>>>> basically pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS, and
>>>>> the GWAS Catalog. If I'm using data from all of these sources, do you
>>>>> recommend sending them directly to the query interface? Should I try to
>>>>> map them to CUIs? (Examples below.)
>>>>>
>>>>> Before I download the database and attempt to query the database (it's
>>>>> not a language that I use in my current work), I just wanted an outside
>>>>> perspective to see if there are best practices for using this data. Thank
>>>>> you in advance for your time!
>>>>>
>>>>> (examples)
>>>>> Here are two more examples showing the disease descriptions in my
>>>>> dataset. Is the UMLS interface robust to these various formats, or do they
>>>>> need to be an exact match?
>>>>>
>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>
>>>>> Default Settings:
>>>>> --default http://atlas.ahc.umn.edu/
>>>>> --measure path
>>>>>
>>>>> User Settings:
>>>>> --rel PAR/CHD
>>>>>
>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic form')
>>>>>
>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>
>>>>> Default Settings:
>>>>> --default http://atlas.ahc.umn.edu/
>>>>> --measure path
>>>>>
>>>>> User Settings:
>>>>> --rel PAR/CHD
>>>>>
>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED AGGRESSIVE')
>>>>>
>>>>> --
>>>>> Jennifer L. Wilson
>>>>> Bioengineering, Stanford University
>>>>> [email protected] / 703.969.3318
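For the batch plan in the original question, here is a minimal Python sketch of a wrapper around query-umls-similarity-webinterface.pl, passing CUIs rather than free-text terms as Ted recommends. It assumes the script is invoked the way it is in this thread (perl, then --measure, --sab, --rel, then two CUIs) and that each result line has the shape score<>term(CUI)<>term(CUI), as in Ted's output; the regular expression was written only against those sample lines. The helper names build_command, parse_scores and query_pair are hypothetical, not part of UMLS::Similarity.

```python
import re
import subprocess

# Assumed result-line shape, taken from the sample output in this thread:
#   <score><><term>(<CUI>)<><term>(<CUI>)
# e.g. "-1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)"
RESULT_LINE = re.compile(
    r"^(-?\d+(?:\.\d+)?)<>(.*)\((C\d{7})\)<>(.*)\((C\d{7})\)$"
)


def build_command(cui1, cui2, measure="path", sab="MSH", rel="PAR/CHD",
                  script="query-umls-similarity-webinterface.pl"):
    """Hypothetical helper: argument list mirroring the commands in this thread."""
    return ["perl", script, "--measure", measure, "--sab", sab,
            "--rel", rel, cui1, cui2]


def parse_scores(output):
    """Return (score, cui1, cui2) for each result line; skip the settings echo."""
    results = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            score, _, cui1, _, cui2 = match.groups()
            results.append((float(score), cui1, cui2))
    return results


def query_pair(cui1, cui2, **options):
    """Run the script once for one pair (needs perl and the script locally)."""
    completed = subprocess.run(build_command(cui1, cui2, **options),
                               capture_output=True, text=True, check=True)
    return parse_scores(completed.stdout)


print(parse_scores(
    "-1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)"
))
# [(-1.0, 'C0156543', 'C0000786')]
```

query_pair is not exercised above because it needs the script and network access. Note the scale: a full matrix over 30,000 terms is 30,000 × 29,999 / 2, or about 450 million pairs, so one subprocess call per pair against the web interface is probably impractical; the locally installed UMLS::Similarity tools Ted mentions are likely the better route at that size.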
