Re: The 2020 UMLS dictionary and our default SNO_RX

Peter Abramowitsch Fri, 07 Aug 2020 09:35:08 -0700

Hi Jeff

Many thanks for all your suggestions.


Things have settled down now. The blacklist feature has been very useful
for suppressing "false" acronym detection, and I will add back to the dict
script a few synonyms that have gone away. I also added some post-processing
code (which might be useful for others?): when a range maps to two or more
concepts in different semantic domains, I set the confidence level of each
to 0.5, as with the gene CAD and the acronym CAD, for example.
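
For anyone who wants to try the same idea, here is a minimal sketch of that
post-processing step. It is written in Python rather than against the cTAKES
type system, and the Mention class, its field names, the semantic group labels
and the placeholder CUIs are all hypothetical, chosen only for illustration.
It groups mentions by text range and lowers the confidence to 0.5 whenever one
range carries concepts from more than one semantic group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical stand-in for a dictionary lookup result."""
    begin: int            # start offset of the matched range
    end: int              # end offset of the matched range
    cui: str              # concept identifier (placeholder values below)
    semantic_group: str   # assumed label for the semantic domain
    confidence: float = 1.0


def downweight_ambiguous(mentions, ambiguous_confidence=0.5):
    """Set confidence to ambiguous_confidence for every mention whose
    text range maps to concepts in two or more semantic groups."""
    by_range = defaultdict(list)
    for m in mentions:
        by_range[(m.begin, m.end)].append(m)

    for group in by_range.values():
        distinct_groups = {m.semantic_group for m in group}
        if len(distinct_groups) > 1:
            for m in group:
                m.confidence = ambiguous_confidence
    return mentions


# "CAD" matched both as a gene and as a disorder acronym on the same range;
# the third mention is unambiguous and keeps its default confidence.
example = [
    Mention(10, 13, "CUI_GENE_PLACEHOLDER", "GENE"),
    Mention(10, 13, "CUI_DISORDER_PLACEHOLDER", "DISORDER"),
    Mention(40, 48, "CUI_OTHER_PLACEHOLDER", "DISORDER"),
]
downweight_ambiguous(example)
```

Two concepts on the same range that share a semantic group are left alone in
this sketch; only cross-domain overlaps are down-weighted.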

Peter

On Fri, Aug 7, 2020 at 6:29 AM Jeffrey Miller <jeff...@gmail.com> wrote:

> Hi Peter,
>
> Yes, I've chosen active subsets. Then I think I actually choose the "select
> sources to exclude" option, but I don't believe that should matter. I leave
> the precedence defaults alone.
>
> Jeff
>
> On Thu, Aug 6, 2020, 2:13 PM Peter Abramowitsch <pabramowit...@gmail.com>
> wrote:
>
> > Hi Jeff
> >
> > You are absolutely right: when I use sno_rx with the term WBC in a simple
> > context it is not showing up as a T059. I was surprised about that
> >
> > I was wrong about the term I was looking at. Here's the scenario that did
> > change
> >
> > Text context
> > afebrile, but has elevated WBC count;
> >
> > *Using sno_rx*
> > canonical text:  White blood cell count increased (lab result)
> > CUI: C0750426,
> > location:  Leukocytes,
> > location_snomed: 52501007
> > range_text:  elevated WBC count,
> > vocab_term: 414478003,
> > vocab_type: SNOMEDCT_US
> > ...other params.
> >
> > *Using new dict based on 2020AA*
> > Missing:
> >
> > Reason:
> > *grep elevated newdict_750426*
> >     INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
> >     INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
> > *grep elevated olddict_750426*
> >     INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
> >     INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc')   <---- missing
> >     INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
> >
> > So back to your recommendation on using MMSYS
> >
> > You chose the ACTIVE_SUBSETS option - right?
> > And on the Sources to Exclude/Include page, do you deselect all sources to
> > exclude?
> > Have you tweaked the precedence of subsets or do you leave the default
> > order alone?
> >
> > Many thanks,
> > Peter
> >
> > On Thu, Aug 6, 2020 at 8:11 AM Jeffrey Miller <jeff...@gmail.com> wrote:
> >
> > > Peter,
> > >
> > > I have experienced similar issues with how text spans translate to
> > > different CUIs depending on the included vocabularies as well. I had a
> > > similar conversation with Sean on the dev forum last year I believe.
> > >
> > > I do not believe the behavior of 'wbc' has changed - if I run the clinical
> > > pipeline with sno_rx_16ab dictionary, it is tagged as an
> > > AnatomicalSiteMention. Are you seeing something different?
> > >
> > > Jeff
> > >
> > > On Wed, Aug 5, 2020 at 11:24 PM Peter Abramowitsch <pabramowit...@gmail.com>
> > > wrote:
> > >
> > > > Hi Jeff
> > > >
> > > > I thought I did load them all, but I'll go back and check.
> > > >
> > > > When looking at my gene issue, the result is that the lookup arbitrarily
> > > > (seemingly, anyway) flips between one and another when there are overlaps
> > > > between vocabularies. I.e., I see that Vocab A and B both contain geneX
> > > > and geneY. Neither of these is in SNOMED. So in my output, I get one of
> > > > the genes associated with Vocab A and another with Vocab B. When I remove
> > > > Vocab B then obviously both are associated with Vocab A, which is what I
> > > > wanted.
> > > >
> > > > If, for you, WBC is showing up as an anatomical location rather than a
> > > > T059, then it's probably not getting the correct SNOMED code, though.
> > > > Wouldn't that be a problem for your researchers?
> > > >
> > > > Peter
> > > >
> > > > On Wed, Aug 5, 2020 at 5:37 PM Jeffrey Miller <jeff...@gmail.com> wrote:
> > > >
> > > > > Hi Peter,
> > > > >
> > > > > If I create a dictionary using UMLS 2020aa with just snomed and rxnorm my
> > > > > cTAKES dictionary still seems to have a CUI associated with the string
> > > > > 'wbc' that links to the snomed term for Leukocyte (Cell). It is not mapping
> > > > > to a lab result TUI, but rather an anatomical site, but it seems to be the
> > > > > same CUI that 'wbc' resolves to in sno_rx_16ab. Maybe HGNC is conflicting
> > > > > with that too?
> > > > >
> > > > > Just to double check, when you installed UMLS through Metamorphosys, did
> > > > > you install all of the available vocabularies?
> > > > >
> > > > > Jeff
> > > > >
> > > > > On Wed, Aug 5, 2020 at 6:52 PM Peter Abramowitsch <pabramowit...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All
> > > > > >
> > > > > > I've been setting up a custom dictionary using UMLS with the goal of simply
> > > > > > adding a comprehensive genetic vocabulary, HGNC, to the latest UMLS SNOMED
> > > > > > and RXNORM vocabularies, in the hope of getting somewhere close to the
> > > > > > cTakes default dictionary again.
> > > > > >
> > > > > > However, there are changes to concept vocabularies in UMLS2020AA that
> > > > > > affect the ability of cTakes to work well with older notes and possibly the
> > > > > > note-writing practices of older physicians and labs. Some of the tried
> > > > > > and true acronyms such as WBC for leukocytes, RBC, and EOS (eosinophil
> > > > > > count) are no longer part of SNOMED. Probably this is because the
> > > > > > components of these parameters are now broken out into more granular
> > > > > > types. The other possible reason is that a few of these acronyms now
> > > > > > overlap the names of genes; EOS is one of them. This is just speculation.
> > > > > >
> > > > > > In order to have these common parameters re-included via their common lab
> > > > > > acronyms, it is necessary to add another common US vocabulary such as
> > > > > > HL7-V3.0 or NCI_CDISC. Of course one can remap back into SNOMED by adding
> > > > > > insert statements into the dictionary script, but it might be a
> > > > > > non-scalable exercise.
> > > > > >
> > > > > > So my point here is that if, one day, we plan to create a new cTakes
> > > > > > release, and with it, a new UMLS lookup, we may need to consider adding a
> > > > > > third basic vocabulary into our current set of two.
> > > > > >
> > > > > > Thoughts?
> > > > > > Peter
> > > > > >
> > > > >
> > > >
> > >
> >
>
