Hi Jeff,

Many thanks for all your suggestions.

Things have settled down now. The blacklist feature has been very useful for suppressing "false" acronym detection, and I will add back to the dict script a few synonyms that have gone away. I also added some post-processing code (which might be useful for others?): when a range maps to two or more concepts in different semantic domains, I set the confidence level of each to 0.5. The gene CAD and the acronym CAD are one example.

Peter

On Fri, Aug 7, 2020 at 6:29 AM Jeffrey Miller <jeff...@gmail.com> wrote:

> Hi Peter,
>
> Yes, I've chosen active subsets; then I think I actually choose the "select sources to exclude" option, but I don't believe that should matter. I leave the precedence defaults alone.
>
> Jeff
>
> On Thu, Aug 6, 2020, 2:13 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:
>
> > Hi Jeff
> >
> > You are absolutely right: when I use sno_rx with the term WBC in a simple context it is not showing up as a T059. I was surprised about that.
> >
> > I was wrong about the term I was looking at. Here's the scenario that did change:
> >
> > Text context
> > afebrile, but has elevated WBC count;
> >
> > *Using sno_rx*
> > canonical text: White blood cell count increased (lab result)
> > CUI: C0750426,
> > location: Leukocytes,
> > location_snomed: 52501007
> > range_text: elevated WBC count,
> > vocab_term: 414478003,
> > vocab_type: SNOMEDCT_US
> > ...other params.
> >
> > *Using new dict based on 2020AA*
> > Missing:
> >
> > Reason:
> > *grep elevated newdict_750426*
> > INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
> > INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
> > *grep elevated olddict_750426*
> > INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
> > INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc') <---------------------- missing
> > INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
> >
> > So back to your recommendation on using MMSYS
> >
> > You chose the ACTIVE_SUBSETS option - right?
> > And on the Sources to Exclude/Include page, do you deselect all sources to exclude?
> > Have you tweaked the precedence of subsets or do you leave the default order alone?
> >
> > Many thanks,
> > Peter
> >
> > On Thu, Aug 6, 2020 at 8:11 AM Jeffrey Miller <jeff...@gmail.com> wrote:
> >
> > > Peter,
> > >
> > > I have experienced similar issues with how text spans translate to different CUIs depending on the included vocabularies as well. I had a similar conversation with Sean on the dev forum last year, I believe.
> > >
> > > I do not believe the behavior of 'wbc' has changed: if I run the clinical pipeline with the sno_rx_16ab dictionary, it is tagged as an AnatomicalSiteMention. Are you seeing something different?
> > >
> > > Jeff
> > >
> > > On Wed, Aug 5, 2020 at 11:24 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:
> > >
> > > > Hi Jeff
> > > >
> > > > I thought I did load them all, but I'll go back and check.
> > > >
> > > > When looking at my gene issue the result is that the lookup arbitrarily (seemingly anyway) flips between one and another when there are overlaps between vocabularies. I.e., I see that Vocab A & B both contain geneX and geneY. Neither of these is in SNOMED. So in my output, I get one of the genes associated with Vocab A and another with Vocab B. When I remove Vocab B then obviously both are associated with Vocab A, which is what I wanted.
> > > >
> > > > If, for you, WBC is showing up as an anatomical location rather than a T059, then probably it's not getting the correct SNOMED code though. Wouldn't that be a problem for your researchers?
> > > >
> > > > Peter
> > > >
> > > > On Wed, Aug 5, 2020 at 5:37 PM Jeffrey Miller <jeff...@gmail.com> wrote:
> > > >
> > > > > Hi Peter,
> > > > >
> > > > > If I create a dictionary using UMLS 2020aa with just snomed and rxnorm, my cTAKES dictionary still seems to have a CUI associated with the string 'wbc' that links to the snomed term for Leukocyte (Cell). It is not mapping to a lab result TUI, but rather an anatomical site, but it seems to be the same CUI that 'wbc' resolves to in sno_rx_16ab. Maybe HGNC is conflicting with that too?
> > > > >
> > > > > Just to double check, when you installed UMLS through Metamorphosys, did you install all of the available vocabularies?
> > > > >
> > > > > Jeff
> > > > >
> > > > > On Wed, Aug 5, 2020 at 6:52 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:
> > > > >
> > > > > > Hi All
> > > > > >
> > > > > > I've been setting up a custom dictionary using UMLS with the goal of simply adding a comprehensive genetic vocabulary HGNC to the latest UMLS SNOMED and RXNORM vocabularies in the hope of getting somewhere close to the cTakes default dictionary again.
> > > > > >
> > > > > > However, there are changes to concept vocabularies in UMLS2020AA that affect the ability of cTakes to work well with older notes and possibly the note-writing practices of older physicians and labs. Some of the tried and true acronyms such as WBC for leukocytes, RBC, and EOS (eosinophil count) are no longer part of SNOMED. Probably this is because the components of these parameters are now broken out into more granular types. The other reason this may be is that a few of these acronyms now overlap the names of Genes. EOS is one of them. This is just speculation.
> > > > > >
> > > > > > In order to have these common parameters re-included via their common lab acronyms, it is necessary to add another common US vocabulary such as HL7-V3.0 or NCI_CDISC. Of course one can remap back into SNOMED by adding insert statements into the dictionary script, but it might be a non-scalable exercise.
> > > > > >
> > > > > > So my point here is that if, one day, we plan to create a new cTakes release, and with it, a new UMLS lookup, we may need to consider adding a third basic vocabulary into our current set of two.
> > > > > >
> > > > > > Thoughts?
> > > > > > Peter
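
For anyone curious about the post-processing step mentioned in the newest message of this thread (setting confidence to 0.5 when one range maps to concepts in different semantic domains), here is a minimal sketch in Python. It is not the actual code and does not use the cTAKES or UIMA APIs: the `Mention` class, the `split_confidence` function, the semantic group labels, and the CUI strings are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical stand-in for one concept attached to a text range."""
    begin: int
    end: int
    cui: str              # placeholder strings below, not real CUIs
    semantic_group: str   # labels such as "GENE" are assumptions
    confidence: float = 1.0


def split_confidence(mentions, shared=0.5):
    """Set confidence to `shared` on every mention whose exact range
    carries concepts from two or more different semantic groups."""
    by_range = defaultdict(list)
    for m in mentions:
        by_range[(m.begin, m.end)].append(m)

    for group in by_range.values():
        # Only ranges with at least two distinct semantic groups are ambiguous.
        if len({m.semantic_group for m in group}) >= 2:
            for m in group:
                m.confidence = shared
    return mentions


mentions = [
    Mention(10, 13, "CUI_GENE_CAD", "GENE"),
    Mention(10, 13, "CUI_ACRONYM_CAD", "DISORDER"),
    Mention(20, 23, "CUI_OTHER", "LAB"),
]
split_confidence(mentions)
print([(m.cui, m.confidence) for m in mentions])
```

The sketch groups by exact begin/end offsets only; whether overlapping but unequal ranges should also count as ambiguous is a design choice the thread does not settle.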