> > If anybody out there in the general community is interested, please reply on this thread and maybe we can coordinate a single presentation time.
Yes please. Thanks, Sean and (other) Peter!

On Wed, May 19, 2021 at 3:42 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi (other) Peter,
>
> Many thanks for jumping in on this!
>
> I would definitely be interested in seeing some examples, even though I don't have any specific use case right now.
>
> I will ask a few local people and see if they are interested in an informal video chat. If anybody out there in the general community is interested, please reply on this thread and maybe we can coordinate a single presentation time.
>
> Cheers,
>
> Sean
> ________________________________________
> From: Peter Klügl <peter.klu...@averbis.com>
> Sent: Wednesday, May 19, 2021 3:33 PM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
> Hi all,
>
> if you are interested in UIMA Ruta and want to know more about it, you can always ask on the UIMA user list or me directly (I am the creator of UIMA Ruta). I can also prepare some slides and we can have an informal video chat where I give an overview of Ruta.
>
> I am of course not objective here (for several reasons) but I think UIMA Ruta could be really useful for cTAKES. It was originally developed for segmenting and processing discharge letters and similar clinical documents. Since then (>10 years), Ruta has always been applied to clinical documents and is being deployed in production by several companies. The language has some advantages and disadvantages compared to other rule languages. In the context of cTAKES, the direct/comprehensive support of UIMA and the IDE dev support are maybe the most relevant advantages.
>
> I was thinking about creating some introductory examples for the combination and usage of UIMA Ruta and cTAKES. If you have a good use case, let me know.
>
> Best,
>
> (another) Peter
>
> On 19.05.2021 at 14:30, Finan, Sean wrote:
> > Hi all,
> >
> > Correct.
> >
> > Tim is correct in the sense that he is using a custom dictionary (custom synonyms, cuis, etc.) which kind of changes the "rules" of what the standard dictionary lookup considers a valid term based upon available tokens in the text. There are other simple settings that further qualify how the standard dictionary lookup accepts or discards synonyms.
> >
> > I think that what Greg is asking about is something with introduced "logic" that can alter or remove terms already discovered by the standard dictionary lookup.
> >
> > Peter and Kean both outline some custom annotators that they have created to use logic that can alter/add/remove terms discovered by the standard dictionary lookup. I do the same thing for different projects and advise everybody who applies ctakes to specific domains to do the same.
> >
> > ctakes is a general purpose tool and results can definitely be improved when tailored to a more narrow purpose.
> >
> > Back to Greg, I got the feeling that he might be interested in a more versatile annotator. Introducing an engine that can utilize something like ruta has several advantages:
> > 1. You can "easily" add complex rules in one place.
> > 2. You can change rules external to code ...
> >    2a. The same pipeline can be tailored to different projects without changing code in an annotator or creating a new annotator.
> >    2b. An end user who knows nothing about ctakes can change a ruta script to fit their purposes.
> > 3. Rules are supported and documented by uima ruta, so you don't have to worry about that extra headache.
> > 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community can apply ruta rules to their project.
> >
> > When I looked at it a few years ago it was for reason 2b. In the end we went for different annotators like Peter and Kean outlined and just use piper file changes to satisfy #2 as that is definitely much easier. However, it doesn't benefit the community as a whole (#4).
> >
> > Cheers all, this is a great conversation!
> >
> > Sean
> >
> > ________________________________________
> > From: Kean Kaufmann <k...@recordsone.com>
> > Sent: Wednesday, May 19, 2021 7:50 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >
> >> yes, the line between "lookup" and rule execution is a little blurry sometimes.
> >
> > Sure is. I blur it with a set of annotators that extend dictionary annotations based on words or annotations covered by the same Chunk, e.g.
> >
> > DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> > MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> > DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> > ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
> >
> > Higher recall than the regular UmlsLookupAnnotator;
> > higher precision than the UmlsOverlapLookupAnnotator (which skips a specified number of tokens regardless of syntax).
> >
> > I've been wanting a more general framework to fit this into, and thinking it might be Ruta.
> > Thanks for the pointer to TokensRegex; I'll look at that as well.
> >
> > On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:
> >
> >> Hi All, yes, the line between "lookup" and rule execution is a little blurry sometimes. Here's some more blurriness.
> >>
> >> I've done something related, adapting a UIMA tokens regex engine for Ctakes. You create a new type in the TypeSystem. In my case it uses CONLLDEP Annotations as the tokens to reason over. You can set up expressions (rules) that look like this.
> >> (Yes, this case is already covered in the dictionary, but it's an example)
> >>
> >> Matcher A: (lemma=="be");
> >> Matcher B: /partially|partly/;
> >> Matcher C: /vaccinated/;
> >>
> >> Rule vaccinated|CUI1234|SNOMED5678: A? B? C;
> >>
> >> You get the Annotation you've delegated to this task, with the entity value "vaccinated|1234|5678" and the range which spanned the tokens that caused the annotation rule to fire.
> >>
> >> (See Stanford's Tokens Regex)
> >>
> >> Peter
> >>
> >> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:
> >>
> >>> But Sean, isn't what he's asking for essentially already implemented in cTAKES as the custom dictionary? I'm currently using that approach for my covid container:
> >>>
> >>> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> >>>
> >>> Tim
> >>>
> >>> ________________________________________
> >>> From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> >>> Sent: Tuesday, May 18, 2021 11:55 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >>>
> >>> Hi Greg,
> >>>
> >>> From 30,000 ft, I think that you would want to use the RutaEngine.
> >>>
> >>> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
> >>> https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
> >>> http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java
> >>>
> >>> That seems to be the actual analysis engine that loads and uses rules to create annotations.
> >>> While you could use an xml descriptor or use the piper "set" command and do things like mapping ruta to ctakes type systems, I would take the alternate approach of "copying" the initialize(..) and process(..) methods and modifying them to use ctakes types directly.
> >>>
> >>> Disclaimer: I know very little about uima ruta. At some point I did look into it but it was for a specific (ctakes-derivative) project and I didn't go further than basic doc perusal.
> >>>
> >>> If you move forward with this please let us all know what you find. I think that there will be great interest in the community.
> >>>
> >>> Sean
> >>> ________________________________________
> >>> From: Greg Silverman <g...@umn.edu.INVALID>
> >>> Sent: Tuesday, May 18, 2021 11:13 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >>>
> >>> Hi Sean,
> >>> I was wondering if there was a way to use rule-based lookup of a custom lexicon within cTAKES (say a locally curated list of COVID-19 symptoms). When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything with respect to cTAKES specifics.
> >>>
> >>> Thanks!
> >>>
> >>> Greg--
> >>>
> >>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> >>>
> >>>> To which ctakes component(s) are you referring?
> >>>> ________________________________________
> >>>> From: Greg Silverman <g...@umn.edu.INVALID>
> >>>> Sent: Sunday, May 16, 2021 6:02 PM
> >>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> >>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
> >>>>
> >>>> I looked all over and could not find any information on how to add this pipeline component to cTAKES. I assume it uses UIMA Ruta?
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Greg--
> >>>> --
> >>>> Greg M. Silverman
> >>>> Senior Systems Developer
> >>>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> >>>> Department of Surgery
> >>>> University of Minnesota
> >>>> g...@umn.edu
> >>>>
> >>>
> >>> --
> >>> Greg M. Silverman
> >>> Senior Systems Developer
> >>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> >>> Department of Surgery
> >>> University of Minnesota
> >>> g...@umn.edu
> >>>
> --
> Dr. Peter Klügl
> Head of Text Mining/Machine Learning
>
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
>
> Phone: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: peter.klu...@averbis.com
> Web: https://averbis.com
>
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
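
The chunk-scoped rules Kean lists in the thread (a dictionary mention plus a trigger word, or plus a second mention, inside the same Chunk yields a wider mention of a given type) can be sketched in plain Python. This is only an illustration under stated assumptions: it uses neither the cTAKES type system nor UIMA Ruta, offsets are assumed to be character offsets into the note text, and `Span`, `WORD_RULES`, `PAIR_RULES`, and `extend_in_chunk` are hypothetical names that do not come from any of the annotators discussed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for an annotation: a type name plus character offsets."""
    kind: str
    begin: int
    end: int


# (existing mention type, trigger regex, resulting type), taken from Kean's examples.
WORD_RULES = [
    ("DiseaseDisorderMention", re.compile(r"screen(ing)?", re.I), "ProcedureMention"),
    ("MedicationMention", re.compile(r"dependenc[ey]|addiction", re.I), "DiseaseDisorderMention"),
]

# (existing mention type, co-occurring mention type, resulting type).
PAIR_RULES = [
    ("DiseaseDisorderMention", "AnatomicalSiteMention", "DiseaseDisorderMention"),
    ("ProcedureMention", "AnatomicalSiteMention", "ProcedureMention"),
]


def covered(chunk, span):
    """True when span lies entirely inside chunk."""
    return chunk.begin <= span.begin and span.end <= chunk.end


def extend_in_chunk(text, chunk, mentions):
    """Return the new, wider spans that the rules produce for one chunk."""
    inside = [m for m in mentions if covered(chunk, m)]
    results = []
    for m in inside:
        # Rule family 1: mention + trigger word elsewhere in the same chunk.
        for kind, pattern, new_kind in WORD_RULES:
            if m.kind != kind:
                continue
            for hit in pattern.finditer(text, chunk.begin, chunk.end):
                # Ignore trigger words that overlap the mention itself.
                if hit.start() >= m.end or hit.end() <= m.begin:
                    results.append(
                        Span(new_kind, min(m.begin, hit.start()), max(m.end, hit.end()))
                    )
        # Rule family 2: mention + another mention type in the same chunk.
        for kind, other_kind, new_kind in PAIR_RULES:
            if m.kind != kind:
                continue
            for other in inside:
                if other.kind == other_kind:
                    results.append(
                        Span(new_kind, min(m.begin, other.begin), max(m.end, other.end))
                    )
    return results
```

For example, with the text "breast cancer screening" treated as one chunk (offsets 0 to 23) and a single DiseaseDisorderMention over "breast cancer" (0 to 13), `extend_in_chunk` returns one ProcedureMention spanning 0 to 23. Keeping the rules in data tables rather than in code is the same motivation Sean gives for an external rule engine: the rules can change without touching the annotator.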