Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Finan, Sean Wed, 19 May 2021 05:30:32 -0700

Hi all,
Correct.

Tim is correct in the sense that he is using a custom dictionary (custom
synonyms, CUIs, etc.), which kind of changes the "rules" of what the standard
dictionary lookup considers a valid term based upon the available tokens in the
text.  There are other simple settings that further qualify how the standard
dictionary lookup accepts or discards synonyms.


I think that what Greg is asking about is something that introduces "logic"
that can alter or remove terms already discovered by the standard dictionary
lookup.

Peter and Kean both outline custom annotators that they have created to
apply logic that can alter/add/remove terms discovered by the standard dictionary
lookup.  I do the same thing for different projects and advise everybody who
applies cTAKES to a specific domain to do the same.

cTAKES is a general-purpose tool, and results can definitely be improved when it
is tailored to a narrower purpose.
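
For anyone who hasn't written one, here is a minimal sketch of that kind of
post-lookup annotator in plain Java. It assumes simplified stand-in classes
(`Mention` and `DomainRefiner`, both hypothetical) instead of the cTAKES/UIMA
types; in a real pipeline the same logic would sit in an annotator's
process(..) method and work on annotations in the JCas. The two term lists are
hypothetical per-project configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an annotation produced by dictionary lookup.
// A real cTAKES annotator would read and write annotations in the JCas instead.
final class Mention {
    final String type;
    final String text;
    final int begin;
    final int end;

    Mention(String type, String text, int begin, int end) {
        this.type = type;
        this.text = text;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical post-lookup pass: removes or retypes mentions that the
// general-purpose dictionary lookup has already found.
final class DomainRefiner {
    private final Set<String> ignoredTerms;  // hypothetical stoplist, lowercase
    private final Set<String> symptomTerms;  // hypothetical terms to retype, lowercase

    DomainRefiner(Set<String> ignoredTerms, Set<String> symptomTerms) {
        this.ignoredTerms = ignoredTerms;
        this.symptomTerms = symptomTerms;
    }

    List<Mention> refine(List<Mention> found) {
        List<Mention> kept = new ArrayList<>();
        for (Mention m : found) {
            String key = m.text.toLowerCase(Locale.ROOT);
            if (ignoredTerms.contains(key)) {
                continue;  // remove: noise for this project
            }
            if (symptomTerms.contains(key)) {
                // alter: same span, project-specific type
                kept.add(new Mention("SignSymptomMention", m.text, m.begin, m.end));
            } else {
                kept.add(m);  // keep as found
            }
        }
        return kept;
    }
}
```

Running a pass like this after the dictionary lookup leaves the general-purpose
lookup untouched; each project only maintains its own small lists.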

Back to Greg: I got the feeling that he might be interested in a more versatile
annotator.  Introducing an engine that can use something like Ruta has
several advantages:
1.  You can "easily" add complex rules in one place.
2.  You can change rules external to code ...
  2a. The same pipeline can be tailored to different projects without changing
code in an annotator or creating a new annotator.
  2b. An end user who knows nothing about cTAKES can change a Ruta script to
fit their purposes.
3.  Rules are supported and documented by UIMA Ruta, so you don't have to worry
about that extra headache.
4.  Once Greg adds it to Apache cTAKES (right? ;^), everybody in the community
can apply Ruta rules to their project.
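
To make point 2 concrete, here is a rough sketch of rules kept outside the
code. The `regex => Label` line format and the `ExternalRules` class are
hypothetical (this is not Ruta's script syntax); the point is only that an end
user edits text, not Java. In practice the lines could come from
`Files.readAllLines(path)` on a project-specific file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule format, one rule per line:  <regex> => <Label>
// Editing the text retargets the pipeline without touching this code.
final class ExternalRules {

    static final class Rule {
        final Pattern pattern;
        final String label;

        Rule(Pattern pattern, String label) {
            this.pattern = pattern;
            this.label = label;
        }
    }

    static final class Hit {
        final String label;
        final int begin;
        final int end;

        Hit(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    // Parse rules; blank lines and lines starting with '#' are skipped.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int arrow = line.lastIndexOf("=>");
            if (arrow < 0) {
                throw new IllegalArgumentException("Missing '=>' in rule: " + line);
            }
            String regex = line.substring(0, arrow).trim();
            String label = line.substring(arrow + 2).trim();
            rules.add(new Rule(Pattern.compile(regex, Pattern.CASE_INSENSITIVE), label));
        }
        return rules;
    }

    // Apply every rule to the text and report each match with its character span.
    static List<Hit> apply(List<Rule> rules, String text) {
        List<Hit> hits = new ArrayList<>();
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(text);
            while (m.find()) {
                hits.add(new Hit(rule.label, m.start(), m.end()));
            }
        }
        return hits;
    }
}
```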

When I looked at it a few years ago, it was for reason 2b.  In the end we went
with separate annotators like the ones Peter and Kean outlined, and we just use
piper file changes to satisfy #2, as that is definitely much easier.  However,
that approach doesn't benefit the community as a whole (#4).

Cheers all, this is a great conversation!

Sean




________________________________________
From: Kean Kaufmann <[email protected]>
Sent: Wednesday, May 19, 2021 7:50 AM
To: [email protected]
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

* External Email - Caution *


> yes, the line between "lookup" and rule execution is a little blurry
> sometimes.

Sure is.  I blur it with a set of annotators that extend dictionary
annotations based on words or annotations covered by the same Chunk, e.g.

DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention

This gives higher recall than the regular UmlsLookupAnnotator and
higher precision than the UmlsOverlapLookupAnnotator (which skips a
specified number of tokens regardless of syntax).
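
A minimal sketch of how the two rule shapes above could be applied, assuming
mentions and chunks are already available as character spans. The `ChunkRules`
and `Span` names are hypothetical, and giving the new annotation the union span
of the mention and the matched word or annotation is an assumption, not
necessarily what these annotators actually do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of chunk-scoped rules such as
//   DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
//   DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
// Assumption: the result spans the union of both matched pieces.
final class ChunkRules {

    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    static boolean inside(Span s, Span chunk) {
        return s.begin >= chunk.begin && s.end <= chunk.end;
    }

    // <mentionType> + /wordRegex/i in the same chunk = <resultType>
    static List<Span> wordRule(String text, List<Span> chunks, List<Span> mentions,
                               String mentionType, String wordRegex, String resultType) {
        // Whole-word, case-insensitive match of the word pattern.
        Pattern word = Pattern.compile("\\b(?:" + wordRegex + ")\\b", Pattern.CASE_INSENSITIVE);
        List<Span> out = new ArrayList<>();
        for (Span chunk : chunks) {
            for (Span m : mentions) {
                if (!m.type.equals(mentionType) || !inside(m, chunk)) {
                    continue;
                }
                // Search only within this chunk's characters.
                Matcher w = word.matcher(text).region(chunk.begin, chunk.end);
                while (w.find()) {
                    out.add(new Span(resultType, Math.min(m.begin, w.start()), Math.max(m.end, w.end())));
                }
            }
        }
        return out;
    }

    // <firstType> + <secondType> in the same chunk = <resultType>
    static List<Span> pairRule(List<Span> chunks, List<Span> mentions,
                               String firstType, String secondType, String resultType) {
        List<Span> out = new ArrayList<>();
        for (Span chunk : chunks) {
            for (Span a : mentions) {
                if (!a.type.equals(firstType) || !inside(a, chunk)) {
                    continue;
                }
                for (Span b : mentions) {
                    if (b != a && b.type.equals(secondType) && inside(b, chunk)) {
                        out.add(new Span(resultType, Math.min(a.begin, b.begin), Math.max(a.end, b.end)));
                    }
                }
            }
        }
        return out;
    }
}
```

For "Colon cancer screening", a DiseaseDisorderMention on "Colon cancer" plus
"screening" in the same chunk yields a ProcedureMention over the whole phrase.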

I've been wanting a more general framework to fit this into, and have been
thinking it might be Ruta.
Thanks for the pointer to TokensRegex; I'll look at that as well.
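
For comparison, a hand-rolled sketch of the token-level pattern in Peter's
example quoted below (A? B? C, with A: lemma "be", B: /partially|partly/,
C: /vaccinated/). This is not Stanford TokensRegex or Peter's cTAKES adapter;
`TokenPattern` and its helpers are hypothetical, tokens are assumed to carry a
word, a lemma and character offsets, and optional elements are matched greedily
with simple backtracking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of a token pattern such as  A? B? C
// (not Stanford TokensRegex, just the same idea in plain Java).
final class TokenPattern {

    static final class Token {
        final String word;
        final String lemma;
        final int begin;
        final int end;

        Token(String word, String lemma, int begin, int end) {
            this.word = word;
            this.lemma = lemma;
            this.begin = begin;
            this.end = end;
        }
    }

    static final class Element {
        final Predicate<Token> test;
        final boolean optional;

        Element(Predicate<Token> test, boolean optional) {
            this.test = test;
            this.optional = optional;
        }
    }

    static final class Match {
        final String label;
        final int begin;
        final int end;

        Match(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    static Predicate<Token> lemmaIs(String lemma) {
        return t -> t.lemma.equals(lemma);
    }

    static Predicate<Token> wordMatches(String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return t -> p.matcher(t.word).matches();
    }

    // Index just past the match of elements[e..] starting at token t, or -1.
    // Optional elements are tried first (greedy), then skipped (backtracking).
    static int matchFrom(List<Element> elements, int e, List<Token> tokens, int t) {
        if (e == elements.size()) {
            return t;
        }
        Element el = elements.get(e);
        if (t < tokens.size() && el.test.test(tokens.get(t))) {
            int end = matchFrom(elements, e + 1, tokens, t + 1);
            if (end >= 0) {
                return end;
            }
        }
        return el.optional ? matchFrom(elements, e + 1, tokens, t) : -1;
    }

    // Left-to-right scan for non-overlapping, non-empty matches, as character spans.
    static List<Match> findAll(List<Element> elements, List<Token> tokens, String label) {
        List<Match> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int end = matchFrom(elements, 0, tokens, i);
            if (end > i) {
                out.add(new Match(label, tokens.get(i).begin, tokens.get(end - 1).end));
                i = end;
            } else {
                i++;
            }
        }
        return out;
    }
}
```

On "He was partly vaccinated" this reports one match spanning
"was partly vaccinated".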


On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <[email protected]>
wrote:

> Hi All, yes, the line between "lookup" and rule execution is a little
> blurry sometimes. Here's some more blurriness.
>
> I've done something related, adapting a UIMA tokens regex engine for
> cTAKES. You create a new type in the TypeSystem. In my case it uses
> CONLLDEP Annotations as the tokens to reason over. You can set up
> expressions (rules) that look like this.
> (Yes, this case is already covered in the dictionary, but it's an example.)
>
> Matcher A:   (lemma=="be");
> Matcher B:   /partially|partly/;
> Matcher C:   /vaccinated/;
>
> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>
> You get the Annotation you've delegated to this task, with the entity
> value "vaccinated|1234|5678" and the range spanning the tokens that
> caused the annotation rule to fire.
>
> (See Stanford's TokensRegex.)
>
> Peter
>
>
> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> [email protected]> wrote:
>
> > But Sean, isn't what he's asking for essentially already implemented in
> > cTAKES as the custom dictionary? I'm currently using that approach for my
> > covid container:
> >
> >
> > https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> > Tim
> >
> > ________________________________________
> > From: Finan, Sean <[email protected]>
> > Sent: Tuesday, May 18, 2021 11:55 AM
> > To: [email protected]
> > Cc: Himanshu Shekhar Sahoo
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >
> >
> > Hi Greg,
> >
> > From 30,000 ft, I think that you would want to use the RutaEngine.
> >
> > https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
> > https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
> > http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java
> >
> > That seems to be the actual analysis engine that loads and uses rules to
> > create annotations.
> > While you could use an XML descriptor or the piper "set" command and
> > do things like mapping Ruta to cTAKES type systems, I would take the
> > alternate approach of "copying" the initialize(..) and process(..) methods
> > and modifying them to use cTAKES types directly.
> >
> > Disclaimer: I know very little about UIMA Ruta. At some point I did look
> > into it, but it was for a specific (cTAKES-derivative) project and I didn't
> > go further than basic doc perusal.
> >
> > If you move forward with this please let us all know what you find.  I
> > think that there will be great interest in the community.
> >
> > Sean
> > ________________________________________
> > From: Greg Silverman <[email protected]>
> > Sent: Tuesday, May 18, 2021 11:13 AM
> > To: [email protected]
> > Cc: Himanshu Shekhar Sahoo
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >
> >
> > Hi Sean,
> > I was wondering if there was a way to use rule-based lookup of a custom
> > lexicon within cTAKES (say, a locally curated list of COVID-19 symptoms).
> > When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
> > specific to cTAKES.
> >
> > Thanks!
> >
> >
> > Greg--
> >
> > On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> > [email protected]> wrote:
> >
> > > To which cTAKES component(s) are you referring?
> > > ________________________________________
> > > From: Greg Silverman <[email protected]>
> > > Sent: Sunday, May 16, 2021 6:02 PM
> > > To: [email protected]; Himanshu Shekhar Sahoo
> > > Subject: rule-based lookup for custom lexicon [EXTERNAL]
> > >
> > >
> > > I looked all over and could not find any information on how to add this
> > > pipeline component to cTAKES. I assume it uses UIMA Ruta?
> > >
> > > Thanks in advance!
> > >
> > > Greg--
> > > --
> > > Greg M. Silverman
> > > Senior Systems Developer
> > > NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > > Department of Surgery
> > > University of Minnesota
> > > [email protected]
> > >
> >
> >
> >
>
