Hi all,
If you are interested in UIMA Ruta and want to know more about it, you can
always ask on the UIMA user list or ask me directly (I am the creator of UIMA
Ruta). I can also prepare some slides and we can have an informal video chat
where I give an overview of Ruta.

I am of course not objective here (for several reasons), but I think UIMA Ruta
could be really useful for cTAKES. It was originally developed for segmenting
and processing discharge letters and similar clinical documents. In the more
than 10 years since then, Ruta has continuously been applied to clinical
documents, and it is deployed in production by several companies.

The language has some advantages and disadvantages compared to other rule
languages. In the context of cTAKES, the direct and comprehensive support of
UIMA and the IDE development support are probably the most relevant
advantages.

I was thinking about creating some introductory examples for the combination
and usage of UIMA Ruta and cTAKES. If you have a good use case, let me know.

Best,

(another) Peter

On 19.05.2021 at 14:30, Finan, Sean wrote:
> Hi all,
> Correct.
>
> Tim is correct in the sense that he is using a custom dictionary (custom
> synonyms, cuis, etc.) which kind of changes the "rules" of what the standard
> dictionary lookup considers a valid term based upon available tokens in the
> text. There are other simple settings that further qualify how the standard
> dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced "logic"
> that can alter or remove terms already discovered by the standard dictionary
> lookup.
>
> Peter and Kean both outline some custom annotators that they have created to
> use logic that can alter/add/remove terms discovered by the standard
> dictionary lookup. I do the same thing for different projects and advise
> everybody that applies ctakes to specific domains do the same.
>
> ctakes is a general purpose tool and results can definitely be improved when
> catered to a more narrow purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more
> versatile annotator. Introducing an engine that can utilize something like
> ruta has several advantages:
> 1. You can "easily" add complex rules in one place.
> 2. You can change rules external to code ...
> 2a. the same pipeline can be catered to different projects without changing
>     code in an annotator or creating a new annotator.
> 2b. An end user who knows nothing about ctakes can change a ruta script to
>     fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to
>    worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community
>    can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b. In the end we went
> for different annotators like Peter and Kean outlined and just use piper file
> changes to satisfy #2 as that is definitely much easier. However, it doesn't
> benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
> ________________________________________
> From: Kean Kaufmann <k...@recordsone.com>
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>> yes, the line between "lookup" and rule execution is a little blurry
>> sometimes.
>
> Sure is. I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified number of tokens regardless of syntax).
>
> I've been wanting a more general framework to fit this into, and thinking
> it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pabramowit...@gmail.com>
> wrote:
>
>> Hi All, yes, the line between "lookup" and rule execution is a little
>> blurry sometimes. Here's some more blurriness.
>>
>> I've done something related, adapting a UIMA tokens regex engine for
>> Ctakes. You create a new type in the TypeSystem. In my case it uses
>> CONLLDEP Annotations as the tokens to reason over. You can set up
>> expressions (rules) that look like this.
>> (Yes, this case is already covered in the dictionary, but it's an example)
>>
>> Matcher A: (lemma=="be");
>> Matcher B: /partially|partly/;
>> Matcher C: /vaccinated/;
>>
>> Rule vaccinated|CUI1234|SNOMED5678: A? B? C;
>>
>> You get the Annotation you've delegated to this task, with the entity
>> value "vaccinated|1234|5678" and the range which spanned the tokens that
>> caused the annotation rule to fire
>>
>> (See Stanford's Tokens Regex)
>>
>> Peter
>>
>>
>> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
>> timothy.mil...@childrens.harvard.edu> wrote:
>>
>>> But Sean, isn't what he's asking for essentially already implemented in
>>> cTAKES as the custom dictionary?
>>> I'm currently using that approach for my
>>> covid container:
>>>
>>> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
>>> Tim
>>>
>>> ________________________________________
>>> From: Finan, Sean <sean.fi...@childrens.harvard.edu>
>>> Sent: Tuesday, May 18, 2021 11:55 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Greg,
>>>
>>> From 30,000 ft, I think that you would want to use the RutaEngine.
>>>
>>> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
>>>
>>> https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
>>>
>>> http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java
>>>
>>> That seems to be the actual analysis engine that loads and uses rules to
>>> create annotations.
>>> While you could use an xml descriptor or use the piper "set" command and
>>> do things like mapping ruta to ctakes type systems, I would take the
>>> alternate approach of "copying" the initialize(..) and process (..) methods
>>> and modify them to use ctakes types directly.
>>>
>>> Disclaimer: I know very little about uima ruta.
>>> At some point I did look
>>> into it but it was for a specific (ctakes-derivative) project and I didn't
>>> go further than basic doc perusal.
>>>
>>> If you move forward with this please let us all know what you find. I
>>> think that there will be great interest in the community.
>>>
>>> Sean
>>> ________________________________________
>>> From: Greg Silverman <g...@umn.edu.INVALID>
>>> Sent: Tuesday, May 18, 2021 11:13 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Sean,
>>> I was wondering if there was a way to use rule-based lookup of a custom
>>> lexicon within cTAKES (say a locally curated list of covid-19 symptoms).
>>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
>>> wrt cTAKES specifics.
>>>
>>> Thanks!
>>>
>>>
>>> Greg--
>>>
>>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
>>> sean.fi...@childrens.harvard.edu> wrote:
>>>
>>>> To which ctakes component(s) are you referring?
>>>> ________________________________________
>>>> From: Greg Silverman <g...@umn.edu.INVALID>
>>>> Sent: Sunday, May 16, 2021 6:02 PM
>>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
>>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>>>>
>>>> * External Email - Caution *
>>>>
>>>>
>>>> I looked all over and could not find any information on how to add this
>>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Greg--
>>>> --
>>>> Greg M. Silverman
>>>> Senior Systems Developer
>>>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
>>>> Department of Surgery
>>>> University of Minnesota
>>>> g...@umn.edu
>>>>
>>>
>>> --
>>> Greg M. Silverman
>>> Senior Systems Developer
>>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
>>> Department of Surgery
>>> University of Minnesota
>>> g...@umn.edu
>>>

--
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Phone: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.klu...@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
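
The chunk-based extension rules that Kean lists in the quoted thread can be
sketched in a few lines of plain Python. This is only an illustration of the
idea under simplifying assumptions, not cTAKES or UIMA Ruta code: the names
Span, WORD_RULES, SITE_RULES and extend_mentions are hypothetical, annotations
are reduced to (type, begin, end) triples, and a "Chunk" is just a character
range supplied by the caller.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for an annotation: a type name plus offsets."""
    kind: str
    begin: int
    end: int


# (mention type, trigger word pattern, resulting type), as listed in the thread.
WORD_RULES = [
    ("DiseaseDisorderMention", re.compile(r"screen(ing)?", re.I),
     "ProcedureMention"),
    ("MedicationMention", re.compile(r"dependenc[ey]|addiction", re.I),
     "DiseaseDisorderMention"),
]

# Mention types that are extended over an AnatomicalSiteMention in the same
# chunk while keeping their own type.
SITE_RULES = {"DiseaseDisorderMention", "ProcedureMention"}


def extend_mentions(text, chunks, mentions):
    """Return new spans produced by applying the rules chunk by chunk."""
    out = []
    for chunk in chunks:
        # Only mentions fully covered by this chunk take part in the rules.
        inside = [m for m in mentions
                  if chunk.begin <= m.begin and m.end <= chunk.end]
        sites = [m for m in inside if m.kind == "AnatomicalSiteMention"]
        for m in inside:
            for kind, pattern, result in WORD_RULES:
                if m.kind != kind:
                    continue
                # Search for the trigger word inside the chunk only.
                hit = pattern.search(text, chunk.begin, chunk.end)
                if hit:
                    out.append(Span(result,
                                    min(m.begin, hit.start()),
                                    max(m.end, hit.end())))
            if m.kind in SITE_RULES:
                for site in sites:
                    out.append(Span(m.kind,
                                    min(m.begin, site.begin),
                                    max(m.end, site.end)))
    return out


if __name__ == "__main__":
    text = "breast cancer screening"
    found = extend_mentions(
        text,
        chunks=[Span("Chunk", 0, len(text))],
        mentions=[Span("DiseaseDisorderMention", 0, 13),
                  Span("AnatomicalSiteMention", 0, 6)],
    )
    for span in found:
        print(span.kind, repr(text[span.begin:span.end]))
```

In a real pipeline the spans would come from the dictionary lookup and the
chunker, and the same rules could be kept outside the code, which is the
argument made in the thread for a rule engine such as Ruta.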