Hi Jen,

Nothing to be embarrassed about at all! If you haven't already used MetaMap
interactively, you might want to try that before you attempt a local install:

https://ii.nlm.nih.gov/Interactive/UTS_Required/metamap.shtml

(You would need to be logged into UTS for the link to work, I think...)

Anyway, once at that site, on the right side there are some links for using
MetaMap interactively. Below is an example of what that looks like (where the
first line is my input and the rest is the output). I turned on the option to
show CUIs, since I think that is your desired output...

About the bz2 file, I think you'd need to uncompress that with bunzip2,
although I have not done a local install for a while so I am not 100 percent
sure if that is the issue or not. But I've cc'd the MetaMap help line on this
note; they are usually very good about following up on issues like this.

I hope this helps!
Ted

Processing 00000000.tx.1: I have a really bad headache, and my joints ache.

Phrase: I
>>>>> Phrase
i
<<<<< Phrase
>>>>> Mappings
Meta Mapping (1000):
  1000   C0021966:I- (Iodides) [Inorganic Chemical]
Meta Mapping (1000):
  1000   C0221138:I NOS (Blood group antibody I) [Amino Acid, Peptide, or Protein,Immunologic Factor]
<<<<< Mappings

Phrase: have
>>>>> Phrase
<<<<< Phrase

Phrase: a really bad headache,
>>>>> Phrase
really bad headache
<<<<< Phrase
>>>>> Mappings
Meta Mapping (790):
   660   C0205169:Bad [Qualitative Concept]
   827   C0018681:HEADACHE (Headache) [Sign or Symptom]
<<<<< Mappings

Phrase: and
>>>>> Phrase
<<<<< Phrase

Phrase: my joints
>>>>> Phrase
joints
<<<<< Phrase
>>>>> Mappings
Meta Mapping (1000):
  1000   C0022417:Joints [Body Space or Junction]
Meta Mapping (1000):
  1000   C0392905:Joints (Articular system) [Body System]
<<<<< Mappings

Phrase: ache.
>>>>> Phrase
ache
<<<<< Phrase
>>>>> Mappings
Meta Mapping (1000):
  1000   C0234238:ACHE (Ache) [Sign or Symptom]
<<<<< Mappings

On Mon, Jun 5, 2017 at 12:25 PM, Jennifer Wilson [email protected]
[umls-similarity] <[email protected]> wrote:

> Hey Ted,
>
> I'm (embarrassingly) having some trouble navigating the NLM site.
> I think I have an account and am trying to download some of the MetaMap
> software (I think that the "Lite" version is sufficient). But when I
> download the bz2 file, it won't open because I think I need to authenticate
> it. Do you know how I'm supposed to access this software? Sorry if this is
> out of your realm, I can try someone else at NLM. This has just been a lot
> more difficult and confusing than I thought it should be! Thanks,
>
> On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen [email protected]
> [umls-similarity] <[email protected]> wrote:
>
>> Hi Jennifer,
>>
>> Mapping terms to CUIs is its own problem, and there are a few nice tools
>> already available that might be of some use. We've used MetaMap to good
>> effect for this problem, so you might want to consider looking there.
>>
>> https://metamap.nlm.nih.gov/
>>
>> I'd be curious if other users have recommendations as well.
>>
>> Good luck,
>> Ted
>>
>> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson [email protected]
>> [umls-similarity] <[email protected]> wrote:
>>
>>> Hi Ted,
>>>
>>> Thank you again for all of this. I'm sorry I had to put down this
>>> project for a few days and am only now getting back to it.
>>>
>>> I see that ontologies change and reproducing that result might not be
>>> the best sanity check on the scripts that I wrote.
>>>
>>> I'm going to try and figure out how to map to CUI terms and I'll be in
>>> touch if I get stuck again. Thanks,
>>>
>>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen [email protected]
>>> [umls-similarity] <[email protected]> wrote:
>>>
>>>> This is perhaps a bit more than you were looking for, but there are
>>>> quite a few command line tools available with UMLS::Similarity when you
>>>> install locally that can be helpful for digging into situations like this.
>>>> When I look for the path from each of these CUIs to the ROOT (of MSH) I
>>>> find that one of them does not have a path to the root, while the other
>>>> does (see command output below)
>>>>
>>>> The lack of a path to the root is going to cause a lot of measures to
>>>> report a -1 value (since path, for example, relies on finding this path as
>>>> a part of its computation). In fact, not having a path to the root makes me
>>>> question if C0156543 is in MSH at all, so it might even be that the CUI is
>>>> no longer a part of MSH (and not just lacking a path to the root). But,
>>>> regardless, clearly something has changed since 2009 that is causing this
>>>> measure to return a different value. This happens in some cases since UMLS
>>>> continues to evolve and CUIs are added, removed, etc. It's important to
>>>> know what version of the UMLS a previous study has used if you are
>>>> interested in getting a very exact comparison. In the case of our AMIA 2009
>>>> paper we used 2008AB, so things have no doubt changed a bit since then.
>>>>
>>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>>
>>>> UMLS-Interface Configuration Information:
>>>> (Default Information - no config file)
>>>>
>>>> Sources (SAB):
>>>>   MSH
>>>> Relations (REL):
>>>>   PAR
>>>>   CHD
>>>>
>>>> Sources (SABDEF):
>>>>   UMLS_ALL
>>>> Relations (RELDEF):
>>>>   UMLS_ALL
>>>>
>>>>
>>>> There are no paths from the given C0156543 to the root.
>>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>>
>>>> UMLS-Interface Configuration Information:
>>>> (Default Information - no config file)
>>>>
>>>> Sources (SAB):
>>>>   MSH
>>>> Relations (REL):
>>>>   PAR
>>>>   CHD
>>>>
>>>> Sources (SABDEF):
>>>>   UMLS_ALL
>>>> Relations (RELDEF):
>>>>   UMLS_ALL
>>>>
>>>>
>>>> The paths between abortions, spontaneous (C0000786) and the root:
>>>> => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
>>>> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
>>>> category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
>>>> pregn) C0000786 (abortions, spontaneous)
>>>>
>>>>
>>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Jennifer,
>>>>>
>>>>> Thanks for sharing this question. I think in general if you have a
>>>>> choice between using CUIs or terms with UMLS::Similarity, your best option
>>>>> is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity
>>>>> might pick a CUI associated with a sense of the term you aren't intending.
>>>>> Also, if you misspell a term or don't specify it exactly correctly, then
>>>>> it shows up as not found. One useful resource for replicating similarity
>>>>> measure studies (like the one you cite) is the following page which
>>>>> includes term mappings for several of the datasets we've worked with over
>>>>> the years.
>>>>>
>>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>>
>>>>> I will admit to being a little puzzled about the case of abortion -
>>>>> miscarriage. The paper you cite clearly reports a value based on MSH, but
>>>>> as I try to run that query now I get a value of -1 (even when using the
>>>>> CUIs). However, it appears that each of the CUIs is found in MSH, but that
>>>>> somehow we are not able to compute some of the measures (a path length,
>>>>> for example).
>>>>> This suggests that there is not a path between the two CUIs, which has
>>>>> something to do with the structure of UMLS/MSH.
>>>>>
>>>>> One quick and dirty way to see if a CUI is in MSH is to find the path
>>>>> length between a CUI and itself. If it is present in MSH, that value will
>>>>> be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>>>>
>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
>>>>> Default Settings:
>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>   --rel PAR/CHD
>>>>> User Settings:
>>>>>   --measure path
>>>>>
>>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>>>>
>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
>>>>> Default Settings:
>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>   --rel PAR/CHD
>>>>> User Settings:
>>>>>   --measure path
>>>>>
>>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>>
>>>>> However, when I try to find the path length between the two CUIs, I
>>>>> get -1. This suggests that the CUIs are not joined by PAR/CHD
>>>>> relations. Note that below you can see that the terms for the CUIs have
>>>>> been looked up, which shows us that MSH knows about them...
>>>>>
>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
>>>>> Default Settings:
>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>   --rel PAR/CHD
>>>>> User Settings:
>>>>>   --measure path
>>>>>
>>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>>>
>>>>> So, in any case, it would appear that something has changed in the
>>>>> structure of MSH since we reported our results in the 2009 AMIA paper you
>>>>> mention. I'm not sure what that is. But I think the general message is
>>>>> that if you can use CUIs it will normally be more reliable to do that.
>>>>> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
>>>>> doesn't do anything terribly fancy with that, and so probably whatever you
>>>>> do will be more extensive and reliable than what UMLS::Similarity would
>>>>> do...
>>>>>
>>>>> I hope this helps somehow, and please do feel free to follow up.
>>>>> Thoughts from other users on this issue would also be most welcome!
>>>>>
>>>>> Cordially,
>>>>> Ted
>>>>>
>>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson
>>>>> [email protected] [umls-similarity] <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm resending this now that I'm subscribed. Any advice would be much
>>>>>> appreciated! Thank you,
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Jennifer Wilson <[email protected]>
>>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>>>> To: [email protected]
>>>>>>
>>>>>> Hello UMLS similarity team,
>>>>>>
>>>>>> I am trying to compute the similarity between ~30K disease/phenotype
>>>>>> terms. Ideally, I would have a matrix of similarity for these terms.
>>>>>>
>>>>>> My first attempt was to write a python script to call the
>>>>>> query-umls-similarity-webinterface.pl script. Before releasing the
>>>>>> script on my dataset, though, I was trying to recreate the scores from
>>>>>> table 1 of this paper:
>>>>>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/
>>>>>>
>>>>>> Here's the command I am using:
>>>>>>
>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"
>>>>>>
>>>>>> Default Settings:
>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>   --measure path
>>>>>>
>>>>>> User Settings:
>>>>>>   --rel PAR/CHD
>>>>>>
>>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>>
>>>>>> I also have not processed the text in my dataset much.
>>>>>> I have basically pulled diseases and phenotypes from DisGeNet, OMIM,
>>>>>> PheWas, and the GWAS catalogue. If I'm using data from all of these
>>>>>> sources, do you recommend sending them directly to the query interface?
>>>>>> Should I try and map to CUI terms? (examples below)
>>>>>>
>>>>>> Before I download the database and attempt to query the database
>>>>>> (it's not a language that I use in my current work), I just wanted an
>>>>>> outside perspective to see if there are best practices for using this
>>>>>> data. Thank you in advance for your time!
>>>>>>
>>>>>> (examples)
>>>>>> Here are two more examples showing the disease descriptions in my
>>>>>> dataset. Is the UMLS interface robust to these various formats or do they
>>>>>> need to be an exact match?
>>>>>>
>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>>
>>>>>> Default Settings:
>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>   --measure path
>>>>>>
>>>>>> User Settings:
>>>>>>   --rel PAR/CHD
>>>>>>
>>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic form')
>>>>>>
>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>>
>>>>>> Default Settings:
>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>   --measure path
>>>>>>
>>>>>> User Settings:
>>>>>>   --rel PAR/CHD
>>>>>>
>>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED AGGRESSIVE')
>>>>>>
>>>>>> --
>>>>>> Jennifer L. Wilson
>>>>>> Bioengineering, Stanford University
>>>>>> [email protected] / 703.969.3318
>>>
>>> --
>>> Jennifer L. Wilson
>>> Bioengineering, Stanford University
>>> [email protected] / 703.969.3318
>
> --
> Jennifer L. Wilson
> Bioengineering, Stanford University
> [email protected] / 703.969.3318
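
A minimal sketch of how the CUI-based queries in this thread could be scripted
from Python, in the spirit of the wrapper Jennifer describes. It assumes perl
is on the PATH and that query-umls-similarity-webinterface.pl sits in the
working directory. The function names and the regular expression are
illustrative and not part of UMLS::Similarity. The pattern only covers the
score<>term(CUI)<>term(CUI) result lines shown in Ted's examples, and a
negative score is treated as "no value".

```python
import re
import subprocess

# Matches result lines of the form seen in the thread, e.g.
#   1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
# Assumption: this is the only result-line shape we need to handle.
RESULT_RE = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)<>(.*)\((C\d{7})\)<>(.*)\((C\d{7})\)\s*$"
)


def parse_result_line(line):
    """Return a dict for a result line, or None if the line is not a result."""
    m = RESULT_RE.match(line)
    if m is None:
        return None
    score = float(m.group(1))
    return {
        # A negative score (-1 in the thread) means no value could be computed.
        "score": None if score < 0 else score,
        "term1": m.group(2),
        "cui1": m.group(3),
        "term2": m.group(4),
        "cui2": m.group(5),
    }


def run_query(cui1, cui2, script="./query-umls-similarity-webinterface.pl"):
    """Run one path query against MSH and return the parsed result line.

    The flags are the ones used in the thread; the script location is an
    assumption about the local setup.
    """
    completed = subprocess.run(
        ["perl", script, "--measure", "path", "--sab", "MSH", cui1, cui2],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in completed.stdout.splitlines():
        parsed = parse_result_line(line)
        if parsed is not None:
            return parsed
    return None
```

Calling run_query(cui, cui) gives the "path to itself" check Ted describes: a
score of 1 suggests the CUI is known to MSH. For ~30K terms there are roughly
450 million unordered pairs, so one web request per pair is unlikely to be
practical for a full matrix.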

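
A minimal sketch of pulling CUIs out of MetaMap's human-readable output, like
the example in Ted's reply. It assumes each candidate line has the shape
"score CUI:name [semantic types]" as in that example; the function name and
the regular expression are illustrative. The semantic types are kept as one
string, because type names such as "Amino Acid, Peptide, or Protein" contain
commas themselves.

```python
import re

# One candidate per line, e.g.
#   827   C0018681:HEADACHE (Headache) [Sign or Symptom]
# Assumption: CUIs are "C" followed by seven digits, as in the thread.
CANDIDATE_RE = re.compile(r"^\s*(\d+)\s+(C\d{7}):(.+?)\s*\[(.+)\]\s*$")


def extract_candidates(text):
    """Return (score, cui, name, semantic_types) tuples found in the text."""
    rows = []
    for line in text.splitlines():
        m = CANDIDATE_RE.match(line)
        if m is not None:
            rows.append(
                (int(m.group(1)), m.group(2), m.group(3).strip(), m.group(4))
            )
    return rows
```

Lines such as "Meta Mapping (790):" or "Phrase: my joints" do not match the
pattern and are skipped, so the whole output can be passed in unchanged.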