I am interested.

Thank you
Shyam Bhimani
Software Engineer
CONFIDENTIALITY NOTICE: The contents of this email message and any attachments are intended solely for the addressee(s) and may contain confidential and/or privileged information and may be legally protected from disclosure.

-----Original Message-----
From: Kean Kaufmann <k...@recordsone.com>
Sent: Wednesday, May 19, 2021 2:08 PM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

** WARNING: This email originated from outside of Target RWE. **

> > If anybody out there in the general community is interested, please
> > reply on this thread and maybe we can coordinate a single presentation time.

Yes please. Thanks, Sean and (other) Peter!

On Wed, May 19, 2021 at 3:42 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi (other) Peter,
>
> Many thanks for jumping in on this!
>
> I would definitely be interested in seeing some examples, even though
> I don't have any specific use case right now.
>
> I will ask a few local people and see if they are interested in an
> informal video chat. If anybody out there in the general community is
> interested, please reply on this thread and maybe we can coordinate a
> single presentation time.
>
> Cheers,
>
> Sean
> ________________________________________
> From: Peter Klügl <peter.klu...@averbis.com>
> Sent: Wednesday, May 19, 2021 3:33 PM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
> Hi all,
>
> If you are interested in UIMA Ruta and want to know more about it, you
> can always ask on the UIMA user list or me directly (I am the creator
> of UIMA Ruta). I can also prepare some slides and we can have an
> informal video chat where I give an overview of Ruta.
>
> I am of course not objective here (for several reasons) but I think
> UIMA Ruta could be really useful for cTAKES.
> It was originally developed for segmenting and processing discharge
> letters and similar clinical documents. Since then (>10 years), Ruta has
> always been applied to clinical documents and is being deployed in
> production by several companies. The language has some advantages and
> disadvantages compared to other rule languages. In the context of cTAKES,
> the direct/comprehensive support of UIMA and the IDE dev support are maybe
> the most relevant advantages.
>
> I was thinking about creating some introductory examples for the
> combination and usage of UIMA Ruta and cTAKES. If you have a good use
> case, let me know.
>
> Best,
>
> (another) Peter
>
> On 19.05.2021 at 14:30, Finan, Sean wrote:
> > Hi all,
> > Correct.
> >
> > Tim is correct in the sense that he is using a custom dictionary
> > (custom synonyms, cuis, etc.) which kind of changes the "rules" of
> > what the standard dictionary lookup considers a valid term based upon
> > available tokens in the text. There are other simple settings that
> > further qualify how the standard dictionary lookup accepts or discards
> > synonyms.
> >
> > I think that what Greg is asking about is something with introduced
> > "logic" that can alter or remove terms already discovered by the
> > standard dictionary lookup.
> >
> > Peter and Kean both outline some custom annotators that they have
> > created to use logic that can alter/add/remove terms discovered by the
> > standard dictionary lookup. I do the same thing for different
> > projects and advise everybody who applies ctakes to specific domains
> > to do the same.
> >
> > ctakes is a general purpose tool and results can definitely be improved
> > when catered to a more narrow purpose.
> >
> > Back to Greg, I got the feeling that he might be interested in a more
> > versatile annotator. Introducing an engine that can utilize something
> > like ruta has several advantages:
> > 1. You can "easily" add complex rules in one place.
> > 2. You can change rules external to code ...
> > 2a. the same pipeline can be catered to different projects without
> > changing code in an annotator or creating a new annotator.
> > 2b. An end user who knows nothing about ctakes can change a ruta
> > script to fit their purposes.
> > 3. Rules are supported and documented by uima ruta, so you don't have to
> > worry about that extra headache.
> > 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> > community can apply ruta rules to their project.
> >
> > When I looked at it a few years ago it was for reason 2b. In the end we
> > went for different annotators like Peter and Kean outlined and just
> > use piper file changes to satisfy #2 as that is definitely much easier.
> > However, it doesn't benefit the community as a whole (#4).
> >
> > Cheers all, this is a great conversation!
> >
> > Sean
> >
> > ________________________________________
> > From: Kean Kaufmann <k...@recordsone.com>
> > Sent: Wednesday, May 19, 2021 7:50 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >
> >> yes, the line between "lookup" and rule execution is a little blurry
> >> sometimes.
> >
> > Sure is. I blur it with a set of annotators that extend dictionary
> > annotations based on words or annotations covered by the same Chunk, e.g.
> >
> > DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> > MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> > DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> > ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
> >
> > Higher recall than the regular UmlsLookupAnnotator; higher precision
> > than the UmlsOverlapLookupAnnotator (which skips a specified number
> > of tokens regardless of syntax).
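
The four chunk-based rules listed above can be sketched roughly as follows. This is only an illustration in plain Python, not Kean's annotators and not the cTAKES or UIMA API: `Mention`, `Chunk`, and `extend_mentions` are hypothetical stand-ins for the real annotation types, and it assumes that the new annotation spans the union of the two pieces that triggered the rule, which the message does not actually state.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical stand-in for a cTAKES mention annotation."""
    kind: str   # e.g. "DiseaseDisorderMention"
    begin: int
    end: int


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a syntactic Chunk annotation."""
    begin: int
    end: int


# (source mention kind, trigger word pattern, resulting mention kind)
WORD_RULES = [
    ("DiseaseDisorderMention", re.compile(r"screen(ing)?", re.I), "ProcedureMention"),
    ("MedicationMention", re.compile(r"dependenc[ey]|addiction", re.I), "DiseaseDisorderMention"),
]

# (source mention kind, kind that must share the Chunk); result keeps the source kind
SITE_RULES = [
    ("DiseaseDisorderMention", "AnatomicalSiteMention"),
    ("ProcedureMention", "AnatomicalSiteMention"),
]


def extend_mentions(text, chunks, mentions):
    """Return new mentions produced by the rules; existing mentions are untouched."""
    new = []
    for chunk in chunks:
        # Only mentions fully covered by this chunk take part in its rules.
        inside = [m for m in mentions if chunk.begin <= m.begin and m.end <= chunk.end]
        for m in inside:
            for kind, pattern, result in WORD_RULES:
                if m.kind != kind:
                    continue
                # Look for the trigger word elsewhere in the same chunk.
                for hit in pattern.finditer(text, chunk.begin, chunk.end):
                    if hit.end() <= m.begin or hit.start() >= m.end:
                        new.append(Mention(result, min(m.begin, hit.start()), max(m.end, hit.end())))
            for kind, other in SITE_RULES:
                if m.kind != kind:
                    continue
                for o in inside:
                    if o.kind == other:
                        new.append(Mention(kind, min(m.begin, o.begin), max(m.end, o.end)))
    return new
```

For example, with a single chunk covering "colon cancer screening" and a DiseaseDisorderMention on "colon cancer", the first rule would add a ProcedureMention over the whole phrase. If the mention and the trigger fall in different chunks, no rule fires, which is what separates this from a lookup that skips a fixed number of tokens.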
> >
> > I've been wanting a more general framework to fit this into, and
> > thinking it might be Ruta.
> > Thanks for the pointer to TokensRegex; I'll look at that as well.
> >
> > On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:
> >
> >> Hi All, yes, the line between "lookup" and rule execution is a little
> >> blurry sometimes. Here's some more blurriness.
> >>
> >> I've done something related, adapting a UIMA tokens regex engine
> >> for Ctakes. You create a new type in the TypeSystem. In my case it uses
> >> CONLLDEP Annotations as the tokens to reason over. You can set up
> >> expressions (rules) that look like this.
> >> (Yes, this case is already covered in the dictionary, but it's an example)
> >>
> >> Matcher A: (lemma=="be");
> >> Matcher B: /partially|partly/;
> >> Matcher C: /vaccinated/;
> >>
> >> Rule vaccinated|CUI1234|SNOMED5678: A? B? C;
> >>
> >> You get the Annotation you've delegated to this task, with the
> >> entity value "vaccinated|1234|5678" and the range which spanned
> >> the tokens that caused the annotation rule to fire
> >>
> >> (See Stanford's Tokens Regex)
> >>
> >> Peter
> >>
> >> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:
> >>
> >>> But Sean, isn't what he's asking for essentially already
> >>> implemented in cTAKES as the custom dictionary? I'm currently
> >>> using that approach for my covid container:
> >>>
> >>> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> >>>
> >>> Tim
> >>>
> >>> ________________________________________
> >>> From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> >>> Sent: Tuesday, May 18, 2021 11:55 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >>>
> >>> Hi Greg,
> >>>
> >>> From 30,000 ft, I think that you would want to use the RutaEngine.
> >>>
> >>> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
> >>>
> >>> https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
> >>>
> >>> http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java
> >>>
> >>> That seems to be the actual analysis engine that loads and uses rules to
> >>> create annotations.
> >>> While you could use an xml descriptor or use the piper "set" command and
> >>> do things like mapping ruta to ctakes type systems, I would take
> >>> the alternate approach of "copying" the initialize(..) and process(..) methods
> >>> and modifying them to use ctakes types directly.
> >>>
> >>> Disclaimer: I know very little about uima ruta. At some point I did look
> >>> into it but it was for a specific (ctakes-derivative) project and I didn't
> >>> go further than basic doc perusal.
> >>>
> >>> If you move forward with this please let us all know what you
> >>> find. I think that there will be great interest in the community.
> >>>
> >>> Sean
> >>> ________________________________________
> >>> From: Greg Silverman <g...@umn.edu.INVALID>
> >>> Sent: Tuesday, May 18, 2021 11:13 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >>>
> >>> Hi Sean,
> >>> I was wondering if there was a way to use rule-based lookup of a
> >>> custom lexicon within cTAKES (say a locally curated list of COVID-19
> >>> symptoms).
> >>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
> >>> with respect to cTAKES specifics.
> >>>
> >>> Thanks!
> >>>
> >>> Greg--
> >>>
> >>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> >>>
> >>>> To which ctakes component(s) are you referring?
> >>>> ________________________________________
> >>>> From: Greg Silverman <g...@umn.edu.INVALID>
> >>>> Sent: Sunday, May 16, 2021 6:02 PM
> >>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> >>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
> >>>>
> >>>> I looked all over and could not find any information on how to add this
> >>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Greg--
> >>>> --
> >>>> Greg M. Silverman
> >>>> Senior Systems Developer
> >>>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> >>>> Department of Surgery
> >>>> University of Minnesota
> >>>> g...@umn.edu
> >>>>
> >>>
> >>> --
> >>> Greg M. Silverman
> >>> Senior Systems Developer
> >>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> >>> Department of Surgery
> >>> University of Minnesota
> >>> g...@umn.edu
> >>>
> --
> Dr. Peter Klügl
> Head of Text Mining/Machine Learning
>
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
>
> Phone: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: peter.klu...@averbis.com
> Web: https://averbis.com
>
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó