Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Kean Kaufmann Wed, 19 May 2021 13:08:10 -0700

>
> If anybody out there in the general community is interested, please reply
> on this thread and maybe we can coordinate a single presentation time.



Yes please. Thanks, Sean and (other) Peter!

On Wed, May 19, 2021 at 3:42 PM Finan, Sean <
[email protected]> wrote:

> Hi (other) Peter,
>
> Many thanks for jumping in on this!
>
> I would definitely be interested in seeing some examples, even though I
> don't have any specific use case right now.
>
> I will ask a few local people and see if they are interested in an
> informal video chat.  If anybody out there in the general community is
> interested, please reply on this thread and maybe we can coordinate a
> single presentation time.
>
> Cheers,
>
> Sean
> ________________________________________
> From: Peter Klügl <[email protected]>
> Sent: Wednesday, May 19, 2021 3:33 PM
> To: [email protected]
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> Hi all,
>
>
> If you are interested in UIMA Ruta and want to know more about it, you
> can always ask on the UIMA user list or me directly (I am the creator of
> UIMA Ruta). I can also prepare some slides and we can have an informal
> video chat where I give an overview of Ruta.
>
>
> I am of course not objective here (for several reasons) but I think UIMA
> Ruta could be really useful for cTAKES. It was originally developed for
> segmenting and processing discharge letters and similar clinical
> documents. Since then (>10 years), Ruta has always been applied to
> clinical documents and is being deployed in production by several
> companies. The language has some advantages and disadvantages compared
> to other rule languages. In the context of cTAKES, the
> direct/comprehensive support of UIMA and the IDE dev support are maybe
> the most relevant advantages.
>
>
> I was thinking about creating some introductory examples for the
> combination and usage of UIMA Ruta and cTAKES. If you have a good use
> case, let me know.
>
>
> Best,
>
>
> (another) Peter
>
>
> On 19.05.2021 at 14:30, Finan, Sean wrote:
> > Hi all,
> > Correct.
> >
> > Tim is correct in the sense that he is using a custom dictionary (custom
> > synonyms, cuis, etc.), which kind of changes the "rules" of what the
> > standard dictionary lookup considers a valid term based upon available
> > tokens in the text. There are other simple settings that further qualify
> > how the standard dictionary lookup accepts or discards synonyms.
> >
> > I think that what Greg is asking about is something with introduced
> > "logic" that can alter or remove terms already discovered by the standard
> > dictionary lookup.
> >
> > Peter and Kean both outline some custom annotators that they have created
> > to use logic that can alter/add/remove terms discovered by the standard
> > dictionary lookup. I do the same thing for different projects and advise
> > everybody who applies ctakes to specific domains to do the same.
> >
> > ctakes is a general purpose tool and results can definitely be improved
> > when it is tailored to a narrower purpose.
> >
> > Back to Greg, I got the feeling that he might be interested in a more
> > versatile annotator. Introducing an engine that can utilize something
> > like ruta has several advantages:
> > 1. You can "easily" add complex rules in one place.
> > 2. You can change rules external to code ...
> >   2a. The same pipeline can be tailored to different projects without
> >       changing code in an annotator or creating a new annotator.
> >   2b. An end user who knows nothing about ctakes can change a ruta
> >       script to fit their purposes.
> > 3. Rules are supported and documented by uima ruta, so you don't have to
> >    worry about that extra headache.
> > 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> >    community can apply ruta rules to their project.
> >
> > When I looked at it a few years ago it was for reason 2b. In the end we
> > went for different annotators like Peter and Kean outlined and just use
> > piper file changes to satisfy #2, as that is definitely much easier.
> > However, it doesn't benefit the community as a whole (#4).
> >
> > Cheers all, this is a great conversation!
> >
> > Sean
> >
> >
> >
> >
> > ________________________________________
> > From: Kean Kaufmann <[email protected]>
> > Sent: Wednesday, May 19, 2021 7:50 AM
> > To: [email protected]
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >
> >
> >> yes,  the line between "lookup" and rule execution is a little blurry
> > sometimes.
> >
> > Sure is.  I blur it with a set of annotators that extend dictionary
> > annotations based on words or annotations covered by the same Chunk, e.g.
> >
> > DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> > MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> > DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> > ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
> >
> > Higher recall than the regular UmlsLookupAnnotator;
> > higher precision than the UmlsOverlapLookupAnnotator (which skips a
> > specified number of tokens regardless of syntax).
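The four rules above can be sketched in a few lines. This is only an illustration of the idea, not Kean's annotators and not cTAKES or UIMA API: the `Span` class, the rule tables, and `extend_mentions` are hypothetical names, and spans are plain character offsets rather than CAS annotations. A rule fires only when the base mention and its trigger (a regex hit or another mention) fall inside the same chunk.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for an annotation: character offsets plus a type name."""
    begin: int
    end: int
    label: str


# (base label, trigger regex, resulting label), taken from the rules in the message
WORD_RULES = [
    ("DiseaseDisorderMention", re.compile(r"screen(ing)?", re.I), "ProcedureMention"),
    ("MedicationMention", re.compile(r"dependenc[ey]|addiction", re.I), "DiseaseDisorderMention"),
]
# (base label, co-occurring label, resulting label)
LABEL_RULES = [
    ("DiseaseDisorderMention", "AnatomicalSiteMention", "DiseaseDisorderMention"),
    ("ProcedureMention", "AnatomicalSiteMention", "ProcedureMention"),
]


def merged(a_begin, a_end, b_begin, b_end, label):
    """Smallest span covering both inputs, typed with the rule's result label."""
    return Span(min(a_begin, b_begin), max(a_end, b_end), label)


def extend_mentions(text, chunks, mentions):
    """Return the new spans produced by the rules; the inputs are not modified."""
    new_spans = set()
    for chunk in chunks:
        # Only mentions fully covered by this chunk take part in its rules.
        inside = [m for m in mentions if chunk.begin <= m.begin and m.end <= chunk.end]
        for base in inside:
            for base_label, pattern, result in WORD_RULES:
                if base.label != base_label:
                    continue
                # Search only within the chunk; offsets stay relative to the full text.
                for hit in pattern.finditer(text, chunk.begin, chunk.end):
                    if hit.end() <= base.begin or hit.start() >= base.end:
                        new_spans.add(merged(base.begin, base.end, hit.start(), hit.end(), result))
            for base_label, other_label, result in LABEL_RULES:
                if base.label != base_label:
                    continue
                for other in inside:
                    if other.label == other_label:
                        new_spans.add(merged(base.begin, base.end, other.begin, other.end, result))
    return sorted(new_spans, key=lambda s: (s.begin, s.end, s.label))
```

For the text "breast cancer screening" with one chunk covering the whole phrase and a DiseaseDisorderMention on "breast cancer", the sketch yields a single ProcedureMention over the full phrase. If the chunker puts "screening" in a separate chunk, nothing is added, which is the syntactic restriction that separates this approach from skipping a fixed number of tokens.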
> >
> > I've been wanting a more general framework to fit this into, and thinking
> > it might be Ruta.
> > Thanks for the pointer to TokensRegex; I'll look at that as well.
> >
> >
> > On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <[email protected]> wrote:
> >
> >> Hi All, yes, the line between "lookup" and rule execution is a little
> >> blurry sometimes. Here's some more blurriness.
> >>
> >> I've done something related, adapting a UIMA tokens regex engine for
> >> Ctakes. You create a new type in the TypeSystem. In my case it uses
> >> CONLLDEP Annotations as the tokens to reason over. You can set up
> >> expressions (rules) that look like this.
> >> (Yes, this case is already covered in the dictionary, but it's an
> >> example.)
> >>
> >> Matcher A:   (lemma=="be");
> >> Matcher B:   /partially|partly/;
> >> Matcher C:   /vaccinated/;
> >>
> >> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
> >>
> >> You get the Annotation you've delegated to this task, with the entity
> >> value "vaccinated|1234|5678" and the range which spanned the tokens that
> >> caused the annotation rule to fire.
> >>
> >> (See Stanford's Tokens Regex)
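To make the `A? B? C` rule concrete, here is a minimal sketch of a token-sequence matcher with optional steps. It is not the engine described in the message and not Stanford TokensRegex: `Token`, `tokenize`, the toy lemma table, `match_at`, and `apply_rule` are all hypothetical, and whitespace-free word tokens stand in for CONLLDEP annotations. The rule value is the placeholder string from the rule header in the message.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    lemma: str
    begin: int
    end: int


# Toy lemma table (assumption); a real pipeline would take lemmas from its own annotations.
LEMMAS = {"is": "be", "was": "be", "are": "be", "been": "be"}


def tokenize(text):
    """Split on word characters and attach a lemma from the toy table."""
    return [
        Token(m.group(), LEMMAS.get(m.group().lower(), m.group().lower()), m.start(), m.end())
        for m in re.finditer(r"\w+", text)
    ]


def lemma_is(value):
    return lambda tok: tok.lemma == value


def text_matches(pattern):
    rx = re.compile(pattern)
    return lambda tok: rx.fullmatch(tok.text) is not None


# Each step is (matcher, optional). This encodes "A? B? C" from the message.
RULE = [
    (lemma_is("be"), True),                     # Matcher A, optional
    (text_matches(r"partially|partly"), True),  # Matcher B, optional
    (text_matches(r"vaccinated"), False),       # Matcher C, required
]
VALUE = "vaccinated|CUI1234|SNOMED5678"


def match_at(tokens, i, steps):
    """Return the index just past a match starting at token i, or None."""
    if not steps:
        return i
    (matcher, optional), rest = steps[0], steps[1:]
    if i < len(tokens) and matcher(tokens[i]):
        end = match_at(tokens, i + 1, rest)
        if end is not None:
            return end
    # Backtrack: an optional step may also match nothing.
    return match_at(tokens, i, rest) if optional else None


def apply_rule(tokens, steps, value):
    """Scan left to right and emit (begin, end, value) for each non-overlapping match."""
    hits, i = [], 0
    while i < len(tokens):
        end = match_at(tokens, i, steps)
        if end is not None and end > i:
            hits.append((tokens[i].begin, tokens[end - 1].end, value))
            i = end
        else:
            i += 1
    return hits
```

Calling `apply_rule(tokenize("She is partly vaccinated."), RULE, VALUE)` gives one hit spanning "is partly vaccinated" (offsets 4 to 24), while "Patient vaccinated yesterday" gives a hit on "vaccinated" alone, since both optional steps are skipped.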
> >>
> >> Peter
> >>
> >>
> >> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> >> [email protected]> wrote:
> >>
> >>> But Sean, isn't what he's asking for essentially already implemented in
> >>> cTAKES as the custom dictionary? I'm currently using that approach for
> >>> my covid container:
> >>>
> >>> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> >>> Tim
> >>>
> >>> ________________________________________
> >>> From: Finan, Sean <[email protected]>
> >>> Sent: Tuesday, May 18, 2021 11:55 AM
> >>> To: [email protected]
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >>>
> >>>
> >>> Hi Greg,
> >>>
> >>> From 30,000 ft, I think that you would want to use the RutaEngine.
> >>>
> >>> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
> >>> https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
> >>> http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java
> >>>
> >>> That seems to be the actual analysis engine that loads and uses rules
> >>> to create annotations.
> >>> While you could use an xml descriptor or use the piper "set" command
> >>> and do things like mapping ruta to ctakes type systems, I would take the
> >>> alternate approach of "copying" the initialize(..) and process(..)
> >>> methods and modifying them to use ctakes types directly.
> >>>
> >>> Disclaimer: I know very little about uima ruta. At some point I did
> >>> look into it, but it was for a specific (ctakes-derivative) project and
> >>> I didn't go further than basic doc perusal.
> >>>
> >>> If you move forward with this please let us all know what you find.  I
> >>> think that there will be great interest in the community.
> >>>
> >>> Sean
> >>> ________________________________________
> >>> From: Greg Silverman <[email protected]>
> >>> Sent: Tuesday, May 18, 2021 11:13 AM
> >>> To: [email protected]
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >>>
> >>>
> >>> Hi Sean,
> >>> I was wondering if there was a way to use rule-based lookup of a custom
> >>> lexicon within cTAKES (say, a locally curated list of COVID-19 symptoms).
> >>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> >>> anything wrt cTAKES specifics.
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> Greg--
> >>>
> >>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> >>> [email protected]> wrote:
> >>>
> >>>>  To which ctakes component(s) are you referring?
> >>>> ________________________________________
> >>>> From: Greg Silverman <[email protected]>
> >>>> Sent: Sunday, May 16, 2021 6:02 PM
> >>>> To: [email protected]; Himanshu Shekhar Sahoo
> >>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
> >>>>
> >>>>
> >>>> I looked all over and could not find any information on how to add this
> >>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Greg--
> >>>> --
> >>>> Greg M. Silverman
> >>>> Senior Systems Developer
> >>>> NLP/IE <
> >>>>
> >>
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> >>>> Department of Surgery
> >>>> University of Minnesota
> >>>> [email protected]
> >>>>
> >>>
> >>> --
> >>> Greg M. Silverman
> >>> Senior Systems Developer
> >>> NLP/IE <
> >>>
> >>
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
> >>> Department of Surgery
> >>> University of Minnesota
> >>> [email protected]
> >>>
> --
> Dr. Peter Klügl
> Head of Text Mining/Machine Learning
>
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
>
> Fon: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: [email protected]
> Web:
> https://urldefense.com/v3/__https://averbis.com__;!!NZvER7FxgEiBAiR_!8k8JQUqNQYj-fQWELRFtxlACk1xSqLtVEnIHDmvmw6QnGtc3id_S4IOLqa6-Y9F4mOzpTfAOWo4$
>
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>
>
