Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Greg Silverman Wed, 19 May 2021 13:13:15 -0700

Thanks everyone! This indeed is an enlightening conversation.

Best!


On Wed, May 19, 2021 at 3:10 PM Shyam Bhimani <[email protected]>
wrote:

> I am interested. Thank you
>
> Shyam Bhimani
> Software Engineer
>
>
>
>
> -----Original Message-----
> From: Kean Kaufmann <[email protected]>
> Sent: Wednesday, May 19, 2021 2:08 PM
> To: [email protected]
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> ** WARNING: This email originated from outside of Target RWE. **
>
>
> >
> > If anybody out there in the general community is interested, please
> > reply on this thread and maybe we can coordinate a single presentation
> > time.
>
>
> Yes please. Thanks, Sean and (other) Peter!
>
> On Wed, May 19, 2021 at 3:42 PM Finan, Sean <
> [email protected]> wrote:
>
> > Hi (other) Peter,
> >
> > Many thanks for jumping in on this!
> >
> > I would definitely be interested in seeing some examples, even though
> > I don't have any specific use case right now.
> >
> > I will ask a few local people and see if they are interested in an
> > informal video chat.  If anybody out there in the general community is
> > interested, please reply on this thread and maybe we can coordinate a
> > single presentation time.
> >
> > Cheers,
> >
> > Sean
> > ________________________________________
> > From: Peter Klügl <[email protected]>
> > Sent: Wednesday, May 19, 2021 3:33 PM
> > To: [email protected]
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > [SUSPICIOUS]
> >
> > * External Email - Caution *
> >
> >
> > Hi all,
> >
> >
> > If you are interested in UIMA Ruta and want to know more about it, you
> > can always ask on the UIMA user list or me directly (I am the creator
> > of UIMA Ruta). I can also prepare some slides and we can have an
> > informal video chat where I give an overview of Ruta.
> >
> >
> > I am of course not objective here (for several reasons) but I think
> > UIMA Ruta could be really useful for cTAKES. It was originally
> > developed for segmenting and processing discharge letters and similar
> > clinical documents. Since then (>10 years), Ruta has always been
> > applied to clinical documents and is being deployed in production by
> > several companies. The language has some advantages and disadvantages
> > compared to other rule languages. In the context of cTAKES, the
> > direct/comprehensive support of UIMA and the IDE dev support are maybe
> > the most relevant advantages.
> >
> >
> > I was thinking about creating some introductory examples for the
> > combination and usage of UIMA Ruta and cTAKES. If you have a good use
> > case, let me know.
> >
> >
> > Best,
> >
> >
> > (another) Peter
> >
> >
> > Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> > > Hi all,
> > > Correct.
> > >
> > > Tim is correct in the sense that he is using a custom dictionary
> > > (custom synonyms, CUIs, etc.) which kind of changes the "rules" of
> > > what the standard dictionary lookup considers a valid term based upon
> > > available tokens in the text. There are other simple settings that
> > > further qualify how the standard dictionary lookup accepts or discards
> > > synonyms.
> > >
> > > I think that what Greg is asking about is something with introduced
> > > "logic" that can alter or remove terms already discovered by the
> > > standard dictionary lookup.
> > >
> > > Peter and Kean both outline some custom annotators that they have
> > > created to use logic that can alter/add/remove terms discovered by the
> > > standard dictionary lookup. I do the same thing for different
> > > projects and advise everybody who applies ctakes to specific domains
> > > to do the same.
> > >
> > > ctakes is a general purpose tool and results can definitely be
> > > improved when it is tailored to a narrower purpose.
> > >
> > > Back to Greg, I got the feeling that he might be interested in a
> > > more versatile annotator. Introducing an engine that can utilize
> > > something like ruta has several advantages:
> > > 1. You can "easily" add complex rules in one place.
> > > 2. You can change rules external to code ...
> > >   2a. The same pipeline can be tailored to different projects without
> > >       changing code in an annotator or creating a new annotator.
> > >   2b. An end user who knows nothing about ctakes can change a ruta
> > >       script to fit their purposes.
> > > 3. Rules are supported and documented by uima ruta, so you don't
> > >    have to worry about that extra headache.
> > > 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> > >    community can apply ruta rules to their project.
> > >
> > > When I looked at it a few years ago it was for reason 2b. In the
> > > end we went for different annotators like Peter and Kean outlined and
> > > just use piper file changes to satisfy #2 as that is definitely much
> > > easier. However, it doesn't benefit the community as a whole (#4).
> > >
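Point 2 in the list above (rules kept outside the code) can be illustrated with a small loader that reads rules from plain text at run time. This is only a sketch: the one-line notation is borrowed from the "Type + /regex/i = Type" examples quoted further down this thread, and the `Rule` tuple, the `LINE` pattern and `load_rules` are hypothetical names. It is not Ruta syntax and not a cTAKES API.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    source: str          # annotation type the rule starts from
    trigger: re.Pattern  # word pattern that must co-occur with it
    result: str          # annotation type the rule produces


# Hypothetical line format: SourceType + /regex/i = ResultType
LINE = re.compile(r"^(\w+)\s*\+\s*/(.+)/(i?)\s*=\s*(\w+)$")


def load_rules(text):
    """Parse one rule per line; blank lines and '#' comments are skipped."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognised rule: {line!r}")
        source, pattern, flag, result = m.groups()
        flags = re.I if flag else 0
        rules.append(Rule(source, re.compile(pattern, flags), result))
    return rules
```

Because the rules live in a text file rather than in an annotator class, the same pipeline could be pointed at a different file per project, which is the benefit described in 2a and 2b.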
> > > Cheers all, this is a great conversation!
> > >
> > > Sean
> > >
> > >
> > >
> > >
> > > ________________________________________
> > > From: Kean Kaufmann <[email protected]>
> > > Sent: Wednesday, May 19, 2021 7:50 AM
> > > To: [email protected]
> > > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > > [SUSPICIOUS]
> > >
> > >
> > >> yes, the line between "lookup" and rule execution is a little
> > >> blurry sometimes.
> > >
> > > Sure is. I blur it with a set of annotators that extend dictionary
> > > annotations based on words or annotations covered by the same Chunk,
> > > e.g.
> > >
> > > DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> > > MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> > > DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> > > ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
> > >
> > > Higher recall than the regular UmlsLookupAnnotator; higher precision
> > > than the UmlsOverlapLookupAnnotator (which skips a specified number
> > > of tokens regardless of syntax).
> > >
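The chunk-scoped rules Kean lists can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not Kean's annotators and not the cTAKES type system: `Mention`, `WORD_RULES`, `PAIR_RULES` and `extend_in_chunk` are hypothetical names, mentions and chunks are reduced to character offsets, and the word boundaries added to the regexes are an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    kind: str   # e.g. "DiseaseDisorderMention"
    begin: int  # character offsets into the text
    end: int


# (source type, trigger word pattern, resulting type)
WORD_RULES = [
    ("DiseaseDisorderMention",
     re.compile(r"\bscreen(ing)?\b", re.I), "ProcedureMention"),
    ("MedicationMention",
     re.compile(r"\b(?:dependenc[ey]|addiction)\b", re.I),
     "DiseaseDisorderMention"),
]

# (source type, companion type in the same chunk, resulting type)
PAIR_RULES = [
    ("DiseaseDisorderMention", "AnatomicalSiteMention",
     "DiseaseDisorderMention"),
    ("ProcedureMention", "AnatomicalSiteMention", "ProcedureMention"),
]


def extend_in_chunk(text, chunk, mentions):
    """Return the new mentions the rules produce inside one chunk."""
    c_begin, c_end = chunk
    inside = [m for m in mentions if c_begin <= m.begin and m.end <= c_end]
    out = []
    for m in inside:
        for source, pattern, result in WORD_RULES:
            if m.kind != source:
                continue
            for hit in pattern.finditer(text, c_begin, c_end):
                # Ignore trigger words that overlap the mention itself.
                if hit.start() >= m.end or hit.end() <= m.begin:
                    out.append(Mention(result,
                                       min(m.begin, hit.start()),
                                       max(m.end, hit.end())))
        for source, companion, result in PAIR_RULES:
            if m.kind != source:
                continue
            for other in inside:
                if other.kind == companion:
                    out.append(Mention(result,
                                       min(m.begin, other.begin),
                                       max(m.end, other.end)))
    return out
```

Restricting both the trigger search and the companion lookup to the chunk span is what separates this from skipping a fixed number of tokens: the new mention can only grow within one syntactic unit.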
> > > I've been wanting a more general framework to fit this into, and
> > > thinking it might be Ruta.
> > > Thanks for the pointer to TokensRegex; I'll look at that as well.
> > >
> > >
> > > On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch
> > > <[email protected]> wrote:
> > >
> > >> Hi All, yes, the line between "lookup" and rule execution is a little
> > >> blurry sometimes. Here's some more blurriness.
> > >>
> > >> I've done something related, adapting a UIMA tokens regex engine
> > >> for cTAKES. You create a new type in the TypeSystem. In my case it
> > >> uses CONLLDEP Annotations as the tokens to reason over. You can set up
> > >> expressions (rules) that look like this.
> > >> (Yes, this case is already covered in the dictionary, but it's an
> > >> example)
> > >>
> > >> Matcher A:   (lemma=="be");
> > >> Matcher B:   /partially|partly/;
> > >> Matcher C:   /vaccinated/;
> > >>
> > >> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
> > >>
> > >> You get the Annotation you've delegated to this task, with the
> > >> entity value  "vaccinated|1234|5678"  and the range which spanned
> > >> the tokens that caused the annotation rule to fire.
> > >>
> > >> (See Stanford's Tokens Regex)
> > >>
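The way a rule such as "A? B? C" fires over a token sequence can be sketched as follows. This is not Peter's engine and not Stanford TokensRegex: `Token`, `MATCHERS`, `match_at` and `apply_rule` are hypothetical names, tokens carry only text, lemma and offsets, and the matching is greedy without backtracking, which is enough here only because the three matchers never accept the same token.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    lemma: str
    begin: int
    end: int


# Each matcher is a predicate over one token (names follow the example).
MATCHERS = {
    "A": lambda t: t.lemma == "be",
    "B": lambda t: re.fullmatch(r"partially|partly", t.text) is not None,
    "C": lambda t: re.fullmatch(r"vaccinated", t.text) is not None,
}

# Rule "A? B? C" written as (matcher name, is_optional) steps.
RULE = ("vaccinated|CUI1234|SNOMED5678",
        [("A", True), ("B", True), ("C", False)])


def match_at(tokens, start, steps):
    """Greedy match from tokens[start]; return the end index or None."""
    i = start
    for name, optional in steps:
        if i < len(tokens) and MATCHERS[name](tokens[i]):
            i += 1
        elif not optional:
            return None
    return i if i > start else None


def apply_rule(tokens, rule):
    """Return (entity value, begin, end) for every place the rule fires."""
    value, steps = rule
    hits, i = [], 0
    while i < len(tokens):
        end = match_at(tokens, i, steps)
        if end is None:
            i += 1
        else:
            hits.append((value, tokens[i].begin, tokens[end - 1].end))
            i = end
    return hits
```

The returned span runs from the first to the last token that took part in the match, which corresponds to "the range which spanned the tokens that caused the annotation rule to fire".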
> > >> Peter
> > >>
> > >>
> > >> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> > >> [email protected]> wrote:
> > >>
> > >>> But Sean, isn't what he's asking for essentially already
> > >>> implemented in cTAKES as the custom dictionary? I'm currently
> > >>> using that approach for my covid container:
> > >>>
> > >>>
> > >>> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> > >>> Tim
> > >>>
> > >>> ________________________________________
> > >>> From: Finan, Sean <[email protected]>
> > >>> Sent: Tuesday, May 18, 2021 11:55 AM
> > >>> To: [email protected]
> > >>> Cc: Himanshu Shekhar Sahoo
> > >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > [SUSPICIOUS]
> > >>>
> > >>>
> > >>> Hi Greg,
> > >>>
> > >>> From 30,000 ft, I think that you would want to use the RutaEngine.
> > >>>
> > >>>
> > >>>
> > >>> https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
> > >>>
> > >>> https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
> > >>>
> > >>> http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java
> > >>> That seems to be the actual analysis engine that loads and uses
> > >>> rules to create annotations.
> > >>> While you could use an xml descriptor or use the piper "set"
> > >>> command and do things like mapping ruta to ctakes type systems, I
> > >>> would take the alternate approach of "copying" the initialize(..)
> > >>> and process(..) methods and modifying them to use ctakes types
> > >>> directly.
> > >>>
> > >>> Disclaimer: I know very little about uima ruta. At some point I
> > >>> did look into it but it was for a specific (ctakes-derivative)
> > >>> project and I didn't go further than basic doc perusal.
> > >>>
> > >>> If you move forward with this please let us all know what you
> > >>> find.  I think that there will be great interest in the community.
> > >>>
> > >>> Sean
> > >>> ________________________________________
> > >>> From: Greg Silverman <[email protected]>
> > >>> Sent: Tuesday, May 18, 2021 11:13 AM
> > >>> To: [email protected]
> > >>> Cc: Himanshu Shekhar Sahoo
> > >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > >>>
> > >>>
> > >>> Hi Sean,
> > >>> I was wondering if there was a way to use rule-based lookup of a
> > >>> custom lexicon within cTAKES (say a locally curated list of
> > >>> COVID-19 symptoms).
> > >>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> > >>> anything with respect to cTAKES specifics.
> > >>>
> > >>> Thanks!
> > >>>
> > >>>
> > >>> Greg--
> > >>>
> > >>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> > >>> [email protected]> wrote:
> > >>>
> > >>>>  To which ctakes component(s) are you referring?
> > >>>> ________________________________________
> > >>>> From: Greg Silverman <[email protected]>
> > >>>> Sent: Sunday, May 16, 2021 6:02 PM
> > >>>> To: [email protected]; Himanshu Shekhar Sahoo
> > >>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
> > >>>>
> > >>>>
> > >>>> I looked all over and could not find any information on how to
> > >>>> add this pipeline component to cTAKES. I assume it uses UIMA Ruta?
> > >>>>
> > >>>> Thanks in advance!
> > >>>>
> > >>>> Greg--
> > >>>> --
> > >>>> Greg M. Silverman
> > >>>> Senior Systems Developer
> > >>>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > >>>> Department of Surgery
> > >>>> University of Minnesota
> > >>>> [email protected]
> > >>>>
> > >>>
> > >>> --
> > >>> Greg M. Silverman
> > >>> Senior Systems Developer
> > >>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > >>> Department of Surgery
> > >>> University of Minnesota
> > >>> [email protected]
> > >>>
> > --
> > Dr. Peter Klügl
> > Head of Text Mining/Machine Learning
> >
> > Averbis GmbH
> > Salzstr. 15
> > 79098 Freiburg
> > Germany
> >
> > Fon: +49 761 708 394 0
> > Fax: +49 761 708 394 10
> > Email: [email protected]
> > Web: https://averbis.com
> >
> > Headquarters: Freiburg im Breisgau
> > Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080 Managing
> > Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
> >
> >
>


-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
[email protected]
