Thanks everyone! This indeed is an enlightening conversation. Best!
On Wed, May 19, 2021 at 3:10 PM Shyam Bhimani <sbhim...@targetrwe.com> wrote:

I am interested. Thank you

Shyam Bhimani
Software Engineer

CONFIDENTIALITY NOTICE: The contents of this email message and any attachments are intended solely for the addressee(s) and may contain confidential and/or privileged information and may be legally protected from disclosure.

-----Original Message-----
From: Kean Kaufmann <k...@recordsone.com>
Sent: Wednesday, May 19, 2021 2:08 PM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

> If anybody out there in the general community is interested, please reply on this thread and maybe we can coordinate a single presentation time.

Yes please. Thanks, Sean and (other) Peter!

On Wed, May 19, 2021 at 3:42 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

Hi (other) Peter,

Many thanks for jumping in on this!

I would definitely be interested in seeing some examples, even though I don't have any specific use case right now.

I will ask a few local people and see if they are interested in an informal video chat. If anybody out there in the general community is interested, please reply on this thread and maybe we can coordinate a single presentation time.

Cheers,

Sean

________________________________________
From: Peter Klügl <peter.klu...@averbis.com>
Sent: Wednesday, May 19, 2021 3:33 PM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Hi all,

If you are interested in UIMA Ruta and want to know more about it, you can always ask on the UIMA user list or me directly (I am the creator of UIMA Ruta).
I can also prepare some slides and we can have an informal video chat where I give an overview of Ruta.

I am of course not objective here (for several reasons), but I think UIMA Ruta could be really useful for cTAKES. It was originally developed for segmenting and processing discharge letters and similar clinical documents. Since then (more than 10 years), Ruta has continuously been applied to clinical documents, and it is deployed in production by several companies. The language has some advantages and disadvantages compared to other rule languages. In the context of cTAKES, its direct and comprehensive support of UIMA and its IDE development support are probably the most relevant advantages.

I was thinking about creating some introductory examples for combining and using UIMA Ruta with cTAKES. If you have a good use case, let me know.

Best,

(another) Peter

On 19.05.2021 at 14:30, Finan, Sean wrote:

Hi all,
Correct.

Tim is correct in the sense that he is using a custom dictionary (custom synonyms, CUIs, etc.), which kind of changes the "rules" of what the standard dictionary lookup considers a valid term based upon available tokens in the text. There are other simple settings that further qualify how the standard dictionary lookup accepts or discards synonyms.

I think that what Greg is asking about is something with introduced "logic" that can alter or remove terms already discovered by the standard dictionary lookup.

Peter and Kean both outline some custom annotators that they have created to use logic that can alter/add/remove terms discovered by the standard dictionary lookup. I do the same thing for different projects and advise everybody who applies ctakes to specific domains to do the same.
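For readers wondering what such a post-lookup annotator might look like, here is a minimal Java sketch modeled on the four chunk-scoped rules Kean lists further down in this thread. It is illustrative only: `ChunkRules`, `Mention`, `Token`, `WordRule` and `PairRule` are hypothetical stand-ins, not cTAKES or UIMA types, and a real annotator would read and write the actual cTAKES mention types in a JCas instead. It assumes every input lies inside one Chunk, that a rule's regex must match a whole token, and that each new mention spans both of the items that triggered it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of Kean-style chunk rules. None of these types are cTAKES/UIMA API;
 * every Mention and Token passed to apply() is assumed to lie inside the same Chunk.
 */
public class ChunkRules {
    /** A dictionary hit: semantic type name plus character offsets. */
    public record Mention(String type, int begin, int end) {}
    /** A token with its covered text and character offsets. */
    public record Token(String text, int begin, int end) {}
    /** source mention + token matching word => new mention of type target. */
    public record WordRule(String source, Pattern word, String target) {}
    /** source mention + partner mention in the same chunk => new mention of type target. */
    public record PairRule(String source, String partner, String target) {}

    // The four rules quoted in the thread; the /.../i flag becomes CASE_INSENSITIVE.
    public static final List<WordRule> WORD_RULES = List.of(
            new WordRule("DiseaseDisorderMention",
                    Pattern.compile("screen(ing)?", Pattern.CASE_INSENSITIVE), "ProcedureMention"),
            new WordRule("MedicationMention",
                    Pattern.compile("dependenc[ey]|addiction", Pattern.CASE_INSENSITIVE),
                    "DiseaseDisorderMention"));
    public static final List<PairRule> PAIR_RULES = List.of(
            new PairRule("DiseaseDisorderMention", "AnatomicalSiteMention", "DiseaseDisorderMention"),
            new PairRule("ProcedureMention", "AnatomicalSiteMention", "ProcedureMention"));

    /** Single pass over one chunk; returns only the newly created mentions. */
    public static List<Mention> apply(List<Mention> mentions, List<Token> tokens,
                                      List<WordRule> wordRules, List<PairRule> pairRules) {
        List<Mention> added = new ArrayList<>();
        for (Mention m : mentions) {
            for (WordRule r : wordRules) {
                if (!m.type().equals(r.source())) continue;
                for (Token t : tokens) {
                    // Assumption: the regex must match the whole token, not a substring.
                    if (r.word().matcher(t.text()).matches()) {
                        added.add(cover(r.target(), m.begin(), m.end(), t.begin(), t.end()));
                    }
                }
            }
            for (PairRule r : pairRules) {
                if (!m.type().equals(r.source())) continue;
                for (Mention p : mentions) {
                    // Skip pairing a mention with itself.
                    if (p != m && p.type().equals(r.partner())) {
                        added.add(cover(r.target(), m.begin(), m.end(), p.begin(), p.end()));
                    }
                }
            }
        }
        return added;
    }

    /** A mention of the given type covering both input spans. */
    private static Mention cover(String type, int b1, int e1, int b2, int e2) {
        return new Mention(type, Math.min(b1, b2), Math.max(e1, e2));
    }
}
```

Given a DiseaseDisorderMention on "colon cancer" (offsets 0 to 12) and the tokens of "colon cancer screening", `apply(..., WORD_RULES, PAIR_RULES)` returns a single ProcedureMention covering offsets 0 to 22. Keeping the rules as data rather than code is one way to approach Sean's point 2; whether they would come from a file or a Ruta script is left open here.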
ctakes is a general-purpose tool, and results can definitely be improved when it is catered to a narrower purpose.

Back to Greg: I got the feeling that he might be interested in a more versatile annotator. Introducing an engine that can utilize something like Ruta has several advantages:
1. You can "easily" add complex rules in one place.
2. You can change rules external to code ...
   2a. The same pipeline can be catered to different projects without changing code in an annotator or creating a new annotator.
   2b. An end user who knows nothing about ctakes can change a Ruta script to fit their purposes.
3. Rules are supported and documented by UIMA Ruta, so you don't have to worry about that extra headache.
4. Once Greg adds it to Apache cTAKES (right? ;^) everybody in the community can apply Ruta rules to their project.

When I looked at it a few years ago, it was for reason 2b. In the end we went with different annotators, like the ones Peter and Kean outlined, and just use piper file changes to satisfy #2, as that is definitely much easier. However, it doesn't benefit the community as a whole (#4).

Cheers all, this is a great conversation!

Sean

________________________________________
From: Kean Kaufmann <k...@recordsone.com>
Sent: Wednesday, May 19, 2021 7:50 AM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

> yes, the line between "lookup" and rule execution is a little blurry sometimes.

Sure is. I blur it with a set of annotators that extend dictionary annotations based on words or annotations covered by the same Chunk, e.g.
DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention

Higher recall than the regular UmlsLookupAnnotator; higher precision than the UmlsOverlapLookupAnnotator (which skips a specified number of tokens regardless of syntax).

I've been wanting a more general framework to fit this into, and thinking it might be Ruta. Thanks for the pointer to TokensRegex; I'll look at that as well.

On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:

Hi All, yes, the line between "lookup" and rule execution is a little blurry sometimes. Here's some more blurriness.

I've done something related, adapting a UIMA tokens regex engine for Ctakes. You create a new type in the TypeSystem. In my case it uses CONLLDEP Annotations as the tokens to reason over. You can set up expressions (rules) that look like this. (Yes, this case is already covered in the dictionary, but it's an example.)

Matcher A: (lemma=="be");
Matcher B: /partially|partly/;
Matcher C: /vaccinated/;

Rule vaccinated|CUI1234|SNOMED5678: A? B? C;

You get the Annotation you've delegated to this task, with the entity value "vaccinated|1234|5678" and the range that spanned the tokens that caused the annotation rule to fire.

(See Stanford's TokensRegex.)

Peter

On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:

But Sean, isn't what he's asking for essentially already implemented in cTAKES as the custom dictionary? I'm currently using that approach for my covid container:

https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container

Tim

________________________________________
From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: Tuesday, May 18, 2021 11:55 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Hi Greg,

From 30,000 ft, I think that you would want to use the RutaEngine.
https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java

That seems to be the actual analysis engine that loads and uses rules to create annotations. While you could use an XML descriptor or use the piper "set" command and do things like mapping Ruta to ctakes type systems, I would take the alternate approach of "copying" the initialize(..) and process(..) methods and modifying them to use ctakes types directly.

Disclaimer: I know very little about UIMA Ruta. At some point I did look into it, but it was for a specific (ctakes-derivative) project and I didn't go further than basic doc perusal.

If you move forward with this, please let us all know what you find. I think that there will be great interest in the community.

Sean

________________________________________
From: Greg Silverman <g...@umn.edu.INVALID>
Sent: Tuesday, May 18, 2021 11:13 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]

Hi Sean,
I was wondering if there was a way to use rule-based lookup of a custom lexicon within cTAKES (say, a locally curated list of COVID-19 symptoms). When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything specific to cTAKES.

Thanks!

Greg--

On Tue, May 18, 2021 at 10:04 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

To which ctakes component(s) are you referring?
________________________________________
From: Greg Silverman <g...@umn.edu.INVALID>
Sent: Sunday, May 16, 2021 6:02 PM
To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
Subject: rule-based lookup for custom lexicon [EXTERNAL]

I looked all over and could not find any information on how to add this pipeline component to cTAKES. I assume it uses UIMA Ruta?

Thanks in advance!

Greg--
--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu

--
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Phone: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.klu...@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
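Peter Abramowitsch's `A? B? C` rule, quoted earlier in this thread, reads as an ordinary sequence of optional and required single-token tests. The following minimal Java sketch implements that reading with leftmost, greedy, non-overlapping matching; it is not Stanford TokensRegex or Peter's adapted engine. `OptionalSequenceRule`, `Tok`, `Elem` and `Hit` are hypothetical names, tokens are assumed to carry a precomputed lemma (Peter's version reasons over CONLLDEP annotations instead), and the entity value string is simply passed through as given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a TokensRegex-style rule such as "A? B? C".
 * Not Stanford TokensRegex or cTAKES API; all names are illustrative only.
 */
public class OptionalSequenceRule {
    /** A token with its text, a precomputed lemma (assumed), and character offsets. */
    public record Tok(String text, String lemma, int begin, int end) {}
    /** One rule element: a single-token test plus whether it may be skipped ("?"). */
    public record Elem(Predicate<Tok> test, boolean optional) {}
    /** A fired rule: the entity value and the character range of the matched tokens. */
    public record Hit(String value, int begin, int end) {}

    /** Token test for a regex that must match the whole token text. */
    static Predicate<Tok> regex(String re) {
        Pattern p = Pattern.compile(re);
        return t -> p.matcher(t.text()).matches();
    }

    /** Peter's example: A: (lemma=="be"); B: /partially|partly/; C: /vaccinated/; rule A? B? C */
    public static final List<Elem> RULE = List.of(
            new Elem(t -> "be".equals(t.lemma()), true), // A?
            new Elem(regex("partially|partly"), true),   // B?
            new Elem(regex("vaccinated"), false));       // C

    /**
     * Tries to match elems[k..] starting at token index pos.
     * Returns the index just past the match, or -1 if there is none.
     * Each element is first tried on the current token (greedy); an optional
     * element is skipped if consuming it leads to no overall match (backtracking).
     */
    static int matchFrom(List<Elem> elems, int k, List<Tok> toks, int pos) {
        if (k == elems.size()) return pos;
        Elem e = elems.get(k);
        if (pos < toks.size() && e.test().test(toks.get(pos))) {
            int end = matchFrom(elems, k + 1, toks, pos + 1);
            if (end >= 0) return end;
        }
        return e.optional() ? matchFrom(elems, k + 1, toks, pos) : -1;
    }

    /** Scans left to right and reports non-overlapping, non-empty matches. */
    public static List<Hit> findAll(String value, List<Elem> elems, List<Tok> toks) {
        List<Hit> hits = new ArrayList<>();
        int start = 0;
        while (start < toks.size()) {
            int end = matchFrom(elems, 0, toks, start);
            if (end > start) {
                hits.add(new Hit(value, toks.get(start).begin(), toks.get(end - 1).end()));
                start = end;
            } else {
                start++;
            }
        }
        return hits;
    }
}
```

For the tokens of "She was partially vaccinated" (with "was" lemmatized to "be"), `findAll("vaccinated|1234|5678", RULE, toks)` yields one hit spanning characters 4 to 28, i.e. "was partially vaccinated", which mirrors Peter's description of an annotation whose range covers the tokens that made the rule fire.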