Re: Disambiguation --alignment with SNOMED

Peter Abramowitsch Fri, 04 Dec 2020 12:03:50 -0800

Hi Eugenia.  I may be wrong, but that XML definition is out of date (which
is why it is commented out).  Through the piper mechanism you have a
different choice.   Here follows a bit more.  I hope some of it is
useful....

Highly specific identification of terms is difficult and I am working on
some infrastructure to help in really capturing values - not only lab
values, but it will take a long time as I'm just doing it for fun.  But
your problem seems more like a dictionary issue.

I won't pretend to be an expert or to have tried out every possibility, but
I'll give you a few tips.  The important thing to know is that, for me at
least, cTAKES is not a finished product but an eternal work in progress.
It takes years of experimentation and configuration.

First you need to understand what specific terms and contexts your
physicians are using and whether the punctuation is clean enough that you
can work with sentences or need to go down to the chunk level.

In the UMLS Dictionary Lookup mechanism, the WindowAnnotation param is
probably something you can supply in a piper file, and it is the FQN of a
class that extends Annotation.   You could create your own Annotation &
Annotator, or you could try using a Chunk annotator upstream of the UMLS
lookup.   The piper creator helps you do that.   Then you would supply the
FQN of Chunk as the window param of your UMLS lookup annotator.   I used it
a long time ago and from what I remember it basically tries to identify
clauses within sentences.  By doing this, especially with the Overlap
Annotator, you'd prevent the lookup from spilling across clauses within a
sentence.
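
To illustrate why a narrower window matters, here is a toy sketch in plain
Python. It is not cTAKES code: the matcher, the comma-based "chunker" and the
example sentence are all assumptions made up for the illustration. The matcher
allows gaps between term tokens, loosely in the spirit of an overlap-style
lookup, so with a whole-sentence window it pairs "monocyte" from one clause
with "count" from the next.

```python
import re

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_with_gaps(term, window_tokens):
    """True if the term's tokens all occur in order inside the window, gaps allowed."""
    remaining = iter(window_tokens)
    # 'tok in remaining' consumes the iterator, so order is enforced.
    return all(tok in remaining for tok in tokens(term))

def lookup(term, windows):
    """Return the windows in which the term is found."""
    return [w for w in windows if matches_with_gaps(term, tokens(w))]

# Hypothetical note text, invented for this example.
sentence = "Monocyte morphology normal, platelet count pending"

# Crude stand-in for a chunker: split the sentence into clauses at commas.
clauses = [c.strip() for c in sentence.split(",")]

sentence_hits = lookup("monocyte count", [sentence])  # window = whole sentence
clause_hits = lookup("monocyte count", clauses)       # window = one clause at a time

print(sentence_hits)
print(clause_hits)
```

With the sentence as the window the term is "found" even though no monocyte
count is mentioned; with clause-sized windows the false match does not occur.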

You may want to play with the SentenceDetectorAnnotatorBIO instead of the
SentenceDetector to see which gets you the most workable sentences.  And
you may want to look at this file:  EndOfSentenceScannerImpl.java.

Customizing the dictionary usually means adding a synonym for each wording
that represents a context in which your term will be found.  Now in your
specific example about a monocyte procedure vs a monocyte count result,
these are not just distinct in SNOMED terms but also have distinct CUIs.
Here are the two canonical terms with their CUIs as I found them, each with
its synonyms.  As you can see, these synonyms are woefully insufficient:
not only have they blurred the distinction you were looking for, but the
SNOMED mapping overlaps the two concepts.  This was probably done as an
expedient, but from an informatics perspective, you are right.
This is incorrect.

INSERT INTO PREFTERM VALUES(750880,'Monocyte count result')    (TUI 34)
SYNONYMS count monocytes,
SNO *365631001*

INSERT INTO PREFTERM VALUES(200637,'Monocyte count procedure')  (TUI 59)
SYNONYM monos, monocyte count
SNO 67776007, *365631001*

Check out how a row like this works.
INSERT INTO CUI_TERMS VALUES(CUI,INDEX,COUNT,'<context with keyword>','<keyword>')

You can add these rows to match the language used by your physicians or in
your forms.
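
As a rough sketch, rows of that shape could be generated from your own
phrases with a few lines of Python. The helper name cui_terms_row and the
phrase "absolute monocyte count" are hypothetical, and the sketch assumes
that INDEX is the zero-based position of the keyword within the context and
COUNT is the number of words in the context. Check a few existing rows in
your dictionary script to confirm both before relying on it.

```python
def sql_quote(value):
    """Escape single quotes so the value can sit inside an SQL string literal."""
    return value.replace("'", "''")

def cui_terms_row(cui, context, keyword):
    """Build one INSERT for the CUI_TERMS table from a context phrase and its keyword."""
    words = context.lower().split()
    keyword = keyword.lower()
    if keyword not in words:
        raise ValueError(f"keyword {keyword!r} does not occur in {context!r}")
    index = words.index(keyword)  # ASSUMPTION: zero-based position of the keyword
    count = len(words)            # ASSUMPTION: number of words in the context
    text = sql_quote(" ".join(words))
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{index},{count},"
        f"'{text}','{sql_quote(keyword)}')"
    )

# Hypothetical synonym for the 'Monocyte count result' concept (CUI 750880).
row = cui_terms_row(750880, "absolute monocyte count", "monocyte")
print(row)
```

Generating the rows from a list of phrases collected from your notes keeps
the additions reproducible when the standard dictionary is rebuilt.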

I had to do a fair bit of juggling to get what we needed and it's a job
that's never finished.   The way I save my changes is to produce sed files
of the deletions, changes, and additions made to the standard dictionary,
and archive those rather than the dictionary itself, which is quite large.
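
The same idea of archiving only the differences can be sketched in Python
instead of sed. This is an illustration under stated assumptions, not the
workflow described above: the two helpers are hypothetical, the rows are
invented examples, and a changed row is treated as one deletion plus one
addition.

```python
def dictionary_delta(standard_lines, custom_lines):
    """Return (deleted, added) rows between the standard and customized scripts."""
    standard, custom = set(standard_lines), set(custom_lines)
    deleted = [line for line in standard_lines if line not in custom]
    added = [line for line in custom_lines if line not in standard]
    return deleted, added

def apply_delta(standard_lines, deleted, added):
    """Rebuild a customized script from the standard one plus the archived delta."""
    drop = set(deleted)
    # Surviving standard rows keep their order; added rows are appended at the end.
    return [line for line in standard_lines if line not in drop] + list(added)

# Invented example rows, not copied from a real dictionary.
standard = [
    "INSERT INTO CUI_TERMS VALUES(200637,0,1,'monos','monos')",
    "INSERT INTO CUI_TERMS VALUES(200637,0,2,'monocyte count','monocyte')",
]
custom = [
    "INSERT INTO CUI_TERMS VALUES(200637,0,1,'monos','monos')",
    "INSERT INTO CUI_TERMS VALUES(750880,0,2,'monocyte count','monocyte')",
]

deleted, added = dictionary_delta(standard, custom)
rebuilt = apply_delta(standard, deleted, added)
print(len(deleted), len(added))
```

Only the two short lists need to be archived. Note that the rebuilt script
appends additions at the end, so row order can differ from a hand-edited
copy even though the set of rows is the same.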

I hope this helps.

Peter

On Fri, Dec 4, 2020 at 5:07 PM Monogyiou, Eugenia <
eugenia.monogy...@nttdata.com> wrote:

> Thank you all for the support!
> Sean, Kean the labValueFinder works as described so thanks for pointing
> that out!
>
> Peter, I will ask for your help with the LookupWindow if you could please
> spare a bit more time... I have located the  UmlsOverlapLookupAnnotator
> file, thank you for that.
>
> I have located in the UmlsLookupAnnotator (in
> ctakes-dictionary-lookup-fast)
> <name>windowAnnotations</name>
>             <value>
>                <!--  LookupWindowAnnotation is supposed to be a refined Noun Phrase  -->
>                <!--<string>org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation</string>-->
>                <!--  In some instances LookupWindowAnnotation is missing tokens and Sentence can be used -->
>                <string>org.apache.ctakes.typesystem.type.textspan.Sentence</string>
>             </value>
>
> I have gone through various java and typesystem files but I am not sure
> where I can find all the potential options for the Lookup Window and
> where/how I can set these. Also, if you could please let me know where in
> the code it is possible to see what symbols are considered
> "end-of-sentence". I have noticed that ":" sometimes defines the end of a sentence
> but I haven't located anything relevant in the code ...
>
> Peter says :
> > > Sometimes you need to further customize your dictionary. (can you
> please elaborate ?)
>
> Many thanks in advance,
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
> Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>
> Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
>
>
> -----Original Message-----
> From: Peter Abramowitsch <pabramowit...@gmail.com>
> Sent: 03 December 2020 18:54
> To: dev@ctakes.apache.org
> Subject: Re: Disambiguation --alignment with SNOMED
>
> I have this issue a lot.  There are many moving parts.   Sometimes it can
> be resolved by using the widest window in the DictionaryLookup or
> sometimes the TermOverlap lookup annotator.  Sometimes you need to further
> customize your dictionary.
>
> The problem arises when there isn't enough context to whittle down the
> lookup to the correct SNOMED entity. Or there isn't a synonym entry in the
> Dictionary that maps to the widest context in your texts.    If you look at
> how the UMLS SNO_RX dictionary is structured you'll see how it can happen.
>
> For starters, look at the raw XMI and see all the entries in the UmlsArray
> that were selected even if, later, only the wrong entry surfaced.
>
> Another issue is the LabValueFinder.  It has settings that allow it to
> clone procedures into lab values or vice versa (I can't remember).  This
> can lead to a lot of duplication.
>
> Peter
>
> On Thu, Dec 3, 2020 at 2:23 PM Monogyiou, Eugenia <
> eugenia.monogy...@nttdata.com> wrote:
>
> > Hello,
> >
> > I think I have hit a wall in terms of applying disambiguation in the
> > cTakes context. I have come across the following example where what I
> > consider to be a lab result (Monocyte Count) is picked up as a
> > procedure, apparently, in alignment with UMLS
> > coding Scheme = SNOMED, Code = 67776007, CUI = C0200637, TUI = T059, preferredText = "Monocyte Count Procedure"
> > coding Scheme = SNOMED, Code = 365631001, CUI = C0200637, TUI = T059, preferredText = "Monocyte Count Procedure"
> >
> > While they share the CUI (at UMLS level, due to the reconciliation of
> > different ontologies), they are quite different concepts. 67776007
> > stands for "Monocyte count (procedure)" while 365631001 stands for
> > "Finding of monocyte count (finding)". So is it fair to say that
> > cTakes is not fully aligned with SNOMED?  Is there a rule on how such
> > concepts may be merged under the same CUI? Would using YTEX resolve
> > similar issues?
> >
> > And also I'm using cTakes 4.0.0 and the YTEX installation guide
> > appears to be outdated - the patch download is missing, names of files
> > missing etc.
> > If YTEX is the answer are there any updated instructions? If it is not
> > are you using other UIMA-friendly solutions?
> >
> > Many thanks in advance,
> > Eugenia
> >
> > Disclaimer: This email and any attachments are sent in strictest
> > confidence for the sole use of the addressee and may contain legally
> > privileged, confidential, and proprietary data. If you are not the
> > intended recipient, please advise the sender by replying promptly to
> > this email and then delete and destroy this email and any attachments
> > without any further use, copying or forwarding.
> >
