Sean and everyone, I totally agree. Ruta is an obvious candidate because it
is already so tightly coupled to UIMA. It provides a very rich overlay on
the annotations and the type system. Does anyone know whether Ruta instances
are thread safe (assuming the JCas is in thread-local storage)? I saw one
conversation from a while ago asking the same question, but I don't think I
saw an answer.

At times I've wondered whether a more generic rules engine that exposed
rules to the CAS could also be useful. The logic wouldn't be restricted to
text interrogation. Like Ruta, it would access the JCas via a rules
language, but a predicate-wiring API could support a wide range of
operations involving external logic and data. It would also help to be able
to invoke the rules stage multiple times in the same pipeline with
different rule sets. Perhaps all of this could already be handled by Ruta's
extension mechanism.
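
For illustration only, here is a rough plain-Java sketch (Java 16+, no UIMA or Ruta dependencies) of what predicate wiring might look like. Everything in it is hypothetical: the Token, Mention and Rule records stand in for CAS annotations and Chunk coverage, wire() and apply() are made-up names rather than any Ruta or cTAKES API, and CovidSymptomMention is an invented type. Rules refer to predicates by name, so external logic or data can be plugged in without changing the rule syntax, and separate rule sets can be applied at different stages of a pipeline. The "screening" predicate mirrors Kean's DiseaseDisorderMention + /screen(ing)?/i rule further down the thread; span handling is omitted, so a derived mention simply reuses the chunk of the mention that triggered it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class PredicateRulesSketch {
    // Hypothetical stand-ins for CAS content: each token and mention is tagged
    // with the id of the Chunk that covers it.
    record Token(String text, int chunk) {}
    record Mention(String type, int chunk) {}

    // "A mention of inType plus a token in the same chunk that satisfies the
    // named predicate yields a mention of outType."
    record Rule(String inType, String predicate, String outType) {}

    // Named predicates the rules refer to; external logic or data is wired in here.
    private final Map<String, Predicate<String>> predicates = new HashMap<>();

    PredicateRulesSketch wire(String name, Predicate<String> p) {
        predicates.put(name, p);
        return this;
    }

    // Applies one rule set and returns only the newly derived mentions.
    List<Mention> apply(List<Rule> rules, List<Token> tokens, List<Mention> mentions) {
        List<Mention> derived = new ArrayList<>();
        for (Mention m : mentions) {
            for (Rule r : rules) {
                if (!r.inType().equals(m.type())) continue;
                Predicate<String> p = predicates.get(r.predicate());
                if (p == null) {
                    throw new IllegalArgumentException("Unwired predicate: " + r.predicate());
                }
                boolean fires = tokens.stream()
                        .anyMatch(t -> t.chunk() == m.chunk() && p.test(t.text()));
                if (fires) derived.add(new Mention(r.outType(), m.chunk()));
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        // Stand-in for external data (could be a database or a service).
        Set<String> localLexicon = Set.of("anosmia", "ageusia");

        PredicateRulesSketch engine = new PredicateRulesSketch()
                // Whole-token, case-insensitive regex predicate.
                .wire("screening", Pattern.compile("screen(ing)?", Pattern.CASE_INSENSITIVE).asMatchPredicate())
                .wire("localLexicon", s -> localLexicon.contains(s.toLowerCase()));

        List<Token> tokens = List.of(new Token("covid", 0), new Token("screening", 0),
                                     new Token("new", 1), new Token("anosmia", 1));
        List<Mention> mentions = List.of(new Mention("DiseaseDisorderMention", 0),
                                         new Mention("SignSymptomMention", 1));

        // Two rule sets, as if invoked at two different stages of one pipeline.
        List<Rule> stage1 = List.of(new Rule("DiseaseDisorderMention", "screening", "ProcedureMention"));
        List<Rule> stage2 = List.of(new Rule("SignSymptomMention", "localLexicon", "CovidSymptomMention"));

        System.out.println(engine.apply(stage1, tokens, mentions));
        System.out.println(engine.apply(stage2, tokens, mentions));
    }
}
```

Running main prints the mentions derived by each stage: stage1 derives a ProcedureMention in chunk 0 and stage2 a CovidSymptomMention in chunk 1. A real implementation would presumably iterate JCas indexes instead of lists and would need a policy for whether derived mentions replace or supplement the originals.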

Peter


On Wed, May 19, 2021 at 5:30 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi all,
> Correct.
>
> Tim is correct in the sense that he is using a custom dictionary (custom
> synonyms, cuis, etc.) which kind of changes the "rules" of what the
> standard dictionary lookup considers a valid term based upon available
> tokens in the text.  There are other simple settings that further qualify
> how the standard dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something that introduces
> "logic" that can alter or remove terms already discovered by the standard
> dictionary lookup.
>
> Peter and Kean both outline some custom annotators that they have created
> to use logic that can alter/add/remove terms discovered by the standard
> dictionary lookup. I do the same thing for different projects and advise
> everybody who applies ctakes to specific domains to do the same.
>
> ctakes is a general-purpose tool, and results can definitely be improved
> when it is tailored to a narrower purpose.
>
> Back to Greg: I got the feeling that he might be interested in a more
> versatile annotator. Introducing an engine that can use something like
> ruta has several advantages:
> 1. You can "easily" add complex rules in one place.
> 2. You can change rules outside the code:
>   2a. The same pipeline can be tailored to different projects without
> changing code in an annotator or creating a new annotator.
>   2b. An end user who knows nothing about ctakes can change a ruta script
> to fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to
> worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> community can apply ruta rules to their project.
>
> When I looked at it a few years ago, it was for reason 2b. In the end we
> went with separate annotators like the ones Peter and Kean outlined, and we
> just use piper file changes to satisfy #2, as that is definitely much easier.
> However, it doesn't benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
> ________________________________________
> From: Kean Kaufmann <k...@recordsone.com>
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> > yes, the line between "lookup" and rule execution is a little blurry
> > sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified number of tokens regardless of syntax).
>
> I've been wanting a more general framework to fit this into, and thinking
> it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
>
> > Hi All, yes, the line between "lookup" and rule execution is a little
> > blurry sometimes. Here's some more blurriness.
> >
> > I've done something related, adapting a UIMA tokens regex engine for
> > cTAKES. You create a new type in the TypeSystem. In my case it uses
> > CONLLDEP annotations as the tokens to reason over. You can set up
> > expressions (rules) that look like this
> > (yes, this case is already covered by the dictionary, but it's just an example):
> >
> > Matcher A:   (lemma=="be");
> > Matcher B:   /partially|partly/;
> > Matcher C:   /vaccinated/;
> >
> > Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
> >
> > You get an instance of the Annotation type you've delegated to this task,
> > with the entity value "vaccinated|1234|5678" and a span covering the
> > tokens that caused the rule to fire.
> >
> > (See Stanford's TokensRegex.)
> >
> > Peter
> >
> >
> > On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> > timothy.mil...@childrens.harvard.edu> wrote:
> >
> > > But Sean, isn't what he's asking for essentially already implemented in
> > > cTAKES as the custom dictionary? I'm currently using that approach for my
> > > covid container:
> > >
> > > https://urldefense.com/v3/__https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container__;!!NZvER7FxgEiBAiR_!7ZopTIhXKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac$
> > >
> > > Tim
> > >
> > > ________________________________________
> > > From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> > > Sent: Tuesday, May 18, 2021 11:55 AM
> > > To: dev@ctakes.apache.org
> > > Cc: Himanshu Shekhar Sahoo
> > > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> [SUSPICIOUS]
> > >
> > > Hi Greg,
> > >
> > > From 30,000 ft, I think that you would want to use the RutaEngine.
> > >
> > > https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
> > >
> > > https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
> > >
> > > https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
> > >
> > > That seems to be the actual analysis engine that loads and uses rules to
> > > create annotations.
> > > While you could use an XML descriptor or the piper "set" command and
> > > do things like mapping ruta to ctakes type systems, I would take the
> > > alternate approach of "copying" the initialize(..) and process(..) methods
> > > and modifying them to use ctakes types directly.
> > >
> > > Disclaimer: I know very little about uima ruta. At some point I did look
> > > into it, but that was for a specific (ctakes-derivative) project and I
> > > didn't go further than basic doc perusal.
> > >
> > > If you move forward with this, please let us all know what you find. I
> > > think that there will be great interest in the community.
> > >
> > > Sean
> > > ________________________________________
> > > From: Greg Silverman <g...@umn.edu.INVALID>
> > > Sent: Tuesday, May 18, 2021 11:13 AM
> > > To: dev@ctakes.apache.org
> > > Cc: Himanshu Shekhar Sahoo
> > > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > >
> > > Hi Sean,
> > > I was wondering if there was a way to use rule-based lookup of a custom
> > > lexicon within cTAKES (say, a locally curated list of COVID-19 symptoms).
> > > When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
> > > regarding cTAKES specifics.
> > >
> > > Thanks!
> > >
> > >
> > > Greg--
> > >
> > > On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> > > sean.fi...@childrens.harvard.edu> wrote:
> > >
> > > >  To which ctakes component(s) are you referring?
> > > > ________________________________________
> > > > From: Greg Silverman <g...@umn.edu.INVALID>
> > > > Sent: Sunday, May 16, 2021 6:02 PM
> > > > To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> > > > Subject: rule-based lookup for custom lexicon [EXTERNAL]
> > > >
> > > > I looked all over and could not find any information on how to add this
> > > > pipeline component to cTAKES. I assume it uses UIMA Ruta?
> > > >
> > > > Thanks in advance!
> > > >
> > > > Greg--
> > > > --
> > > > Greg M. Silverman
> > > > Senior Systems Developer
> > > > NLP/IE <https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$>
> > > > Department of Surgery
> > > > University of Minnesota
> > > > g...@umn.edu
> > > >
> > >
> > >
> >
>
