Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Peter Klügl Wed, 19 May 2021 12:33:29 -0700

Hi all,


If you are interested in UIMA Ruta and want to know more about it, you
can always ask on the UIMA user list or ask me directly (I am the creator
of UIMA Ruta). I can also prepare some slides, and we can have an informal
video chat where I give an overview of Ruta.


I am of course not objective here (for several reasons), but I think UIMA
Ruta could be really useful for cTAKES. It was originally developed for
segmenting and processing discharge letters and similar clinical
documents. Since then (>10 years), Ruta has been applied to clinical
documents continuously and is deployed in production by several
companies. The language has some advantages and disadvantages compared
to other rule languages. In the context of cTAKES, the
direct/comprehensive support of UIMA and the IDE dev support are probably
the most relevant advantages.


I was thinking about creating some introductory examples for combining
and using UIMA Ruta with cTAKES. If you have a good use case, let me know.


Best,


(another) Peter


On 19.05.2021 at 14:30, Finan, Sean wrote:
> Hi all,
> Correct.
>
> Tim is correct in the sense that he is using a custom dictionary (custom 
> synonyms, cuis, etc.) which kind of changes the "rules" of what the standard 
> dictionary lookup considers a valid term based upon available tokens in the 
> text.  There are other simple settings that further qualify how the standard 
> dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced "logic" 
> that can alter or remove terms already discovered by the standard dictionary 
> lookup.
>
> Peter and Kean both outline some custom annotators that they have created to 
> use logic that can alter/add/remove terms discovered by the standard 
> dictionary lookup.  I do the same thing for different projects and advise 
> everybody who applies ctakes to specific domains to do the same.
>
> ctakes is a general purpose tool and results can definitely be improved when 
> catered to a more narrow purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more 
> versatile annotator.  Introducing an engine that can utilize something like 
> ruta has several advantages:
> 1.  You can "easily" add complex rules in one place.
> 2.  You can change rules external to code ...
>   2a. the same pipeline can be catered to different projects without changing 
> code in an annotator or creating a new annotator.
>   2b.  An end user who knows nothing about ctakes can change a ruta script to 
> fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to 
> worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community 
> can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b.  In the end we went 
> for different annotators like Peter and Kean outlined and just use piper file 
> changes to satisfy #2 as that is definitely much easier.  However, it doesn't 
> benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
>
>
>
> ________________________________________
> From: Kean Kaufmann <[email protected]>
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: [email protected]
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
>> yes, the line between "lookup" and rule execution is a little blurry
>> sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified number of tokens regardless of syntax).
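
The chunk-scoped extension rules quoted above can be sketched roughly as
follows. This is only an illustration, not the annotators described in the
message: the Span class, the RULES table and extend_mentions are
hypothetical names, plain character offsets stand in for UIMA annotations,
and only the two regex-based rules are covered.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled character range; stands in for a UIMA annotation (assumption)."""
    begin: int
    end: int
    label: str


# Hypothetical rule table: (source label, trigger regex, resulting label).
RULES = [
    ("DiseaseDisorderMention", re.compile(r"screen(ing)?", re.I), "ProcedureMention"),
    ("MedicationMention", re.compile(r"dependenc[ey]|addiction", re.I), "DiseaseDisorderMention"),
]


def extend_mentions(text, chunks, mentions):
    """Return new spans for mentions that share a chunk with a trigger word."""
    extended = []
    for chunk in chunks:
        chunk_text = text[chunk.begin:chunk.end]
        for mention in mentions:
            # Only consider mentions fully covered by the current chunk.
            if not (chunk.begin <= mention.begin and mention.end <= chunk.end):
                continue
            for source_label, trigger, new_label in RULES:
                if mention.label != source_label:
                    continue
                hit = trigger.search(chunk_text)
                if hit is None:
                    continue
                # Convert chunk-relative offsets back to document offsets.
                hit_begin = chunk.begin + hit.start()
                hit_end = chunk.begin + hit.end()
                extended.append(
                    Span(min(mention.begin, hit_begin), max(mention.end, hit_end), new_label)
                )
    return extended
```

Restricting the trigger search to the chunk text is what keeps the match
tied to syntax, rather than skipping a fixed number of tokens.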
>
> I've been wanting a more general framework to fit this into, and thinking
> it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <[email protected]>
> wrote:
>
>> Hi All,  yes,  the line between "lookup" and rule execution is a little
>> blurry sometimes.   Here's some more blurriness.
>>
>> I've done something related, adapting a UIMA tokens regex engine for
>> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
>> CONLLDEP Annotations as the tokens to reason over.   You can set up
>> expressions (rules) that look like this.
>> (Yes, this case is already covered in the dictionary, but it's an example)
>>
>> Matcher A:   (lemma=="be");
>> Matcher B:   /partially|partly/;
>> Matcher C:   /vaccinated/;
>>
>> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>>
>> You get the Annotation you've delegated to this task, with the entity
>> value  "vaccinated|1234|5678"  and the range which spanned the tokens that
>> caused the annotation rule to fire
>>
>> (See Stanford's Tokens Regex)
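
To make the "A? B? C" rule above concrete, here is a minimal sketch of
matching a sequence of optional and required token matchers. It is not the
engine described in the message: Token, RULE, match_at and find_spans are
hypothetical names, tokens are assumed to already carry text, lemma and
offsets, and matching is greedy without backtracking, which is enough when
the matchers do not overlap.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Minimal token with surface text, lemma and character offsets (assumption)."""
    text: str
    lemma: str
    begin: int
    end: int


def is_be(tok):
    # Matcher A: lemma == "be"
    return tok.lemma == "be"


def is_partial(tok):
    # Matcher B: /partially|partly/
    return re.fullmatch(r"partially|partly", tok.text) is not None


def is_vaccinated(tok):
    # Matcher C: /vaccinated/
    return re.fullmatch(r"vaccinated", tok.text) is not None


# Rule "A? B? C" as (matcher, optional) pairs.
RULE = [(is_be, True), (is_partial, True), (is_vaccinated, False)]


def match_at(tokens, start, rule):
    """Return the index just past a match beginning at start, or None."""
    i = start
    for matcher, optional in rule:
        if i < len(tokens) and matcher(tokens[i]):
            i += 1
        elif not optional:
            return None
    return i if i > start else None


def find_spans(tokens, rule=RULE, label="vaccinated|CUI1234|SNOMED5678"):
    """Scan left to right and emit (begin, end, label) for each rule match."""
    spans = []
    i = 0
    while i < len(tokens):
        end = match_at(tokens, i, rule)
        if end is None:
            i += 1
        else:
            spans.append((tokens[i].begin, tokens[end - 1].end, label))
            i = end
    return spans
```

The returned range covers every token that took part in the match, which
mirrors the "range which spanned the tokens that caused the rule to fire".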
>>
>> Peter
>>
>>
>> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
>> [email protected]> wrote:
>>
>>> But Sean, isn't what he's asking for essentially already implemented in
>>> cTAKES as the custom dictionary? I'm currently using that approach for my
>>> covid container:
>>>
>>>
>>> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
>>> Tim
>>>
>>> ________________________________________
>>> From: Finan, Sean <[email protected]>
>>> Sent: Tuesday, May 18, 2021 11:55 AM
>>> To: [email protected]
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>>>
>>>
>>> Hi Greg,
>>>
>>> From 30,000 ft, I think that you would want to use the RutaEngine.
>>>
>>>
>>>
>>> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
>>>
>>> https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
>>>
>>> http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java
>>> That seems to be the actual analysis engine that loads and uses rules to
>>> create annotations.
>>> While you could use an xml descriptor or use the piper "set" command and
>>> do things like mapping ruta to ctakes type systems, I would take the
>>> alternate approach of "copying" the initialize(..) and process(..)
>>> methods and modifying them to use ctakes types directly.
>>>
>>> Disclaimer:  I know very little about uima ruta.  At some point I did
>>> look into it but it was for a specific (ctakes-derivative) project and I
>>> didn't go further than basic doc perusal.
>>>
>>> If you move forward with this please let us all know what you find.  I
>>> think that there will be great interest in the community.
>>>
>>> Sean
>>> ________________________________________
>>> From: Greg Silverman <[email protected]>
>>> Sent: Tuesday, May 18, 2021 11:13 AM
>>> To: [email protected]
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>>>
>>>
>>> Hi Sean,
>>> I was wondering if there was a way to use rule-based lookup of a custom
>>> lexicon within cTAKES (say a locally curated list of COVID-19 symptoms).
>>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
>>> anything wrt cTAKES specifics.
>>>
>>> Thanks!
>>>
>>>
>>> Greg--
>>>
>>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
>>> [email protected]> wrote:
>>>
>>>>  To which ctakes component(s) are you referring?
>>>> ________________________________________
>>>> From: Greg Silverman <[email protected]>
>>>> Sent: Sunday, May 16, 2021 6:02 PM
>>>> To: [email protected]; Himanshu Shekhar Sahoo
>>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>>>>
>>>>
>>>> I looked all over and could not find any information on how to add this
>>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Greg--
>>>> --
>>>> Greg M. Silverman
>>>> Senior Systems Developer
>>>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
>>>> Department of Surgery
>>>> University of Minnesota
>>>> [email protected]
>>>>
>>>
>>> --
>>> Greg M. Silverman
>>> Senior Systems Developer
>>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
>>> Department of Surgery
>>> University of Minnesota
>>> [email protected]
>>>
-- 
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: [email protected]
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
