---------- Forwarded message ---------
From: Peter Abramowitsch <pabramowit...@gmail.com>
Date: Thu, Feb 10, 2022 at 6:41 PM
Subject: Re: neural negation model in ctakes
To: <twloehf...@ucdavis.edu.invalid>


Hi Tom

My changes to negex are checked into the ctakes trunk in the sub-project
ctakes-ytex-uima.
Make sure you also pick up the updated resources: the revised negex_triggers.txt
and a new one, negex_excluded_keys.txt, which needs to live in the same folder:

resources/org/apache/ctakes/ytex/uima/annotators/negex_excluded_keys.txt
resources/org/apache/ctakes/ytex/uima/annotators/negex_triggers.txt

I'm curious to see how it performs for you.

Note that in the tests I have made, it is much more successful when you use
the SentenceDetectorBIO rather than the default sentence detector.  This is
because BIO is able to make sentences out of the kinds of fragments you
typically find in real notes.

And thanks for the update on the other things that are happening!

Peter

On Thu, Feb 10, 2022 at 5:29 PM Thomas W Loehfelm
<twloehf...@ucdavis.edu.invalid> wrote:

> Thanks for the BERT negation classifier Tim - I tested it on a small
> sample of radiology notes and didn't see a big improvement in accuracy
> (kudos to the original ctakes pipeline!) but your framework of a FastAPI
> component that can loosely interface with cTAKES was really interesting.
>
> @Peter - I'd love to try your improved Negex Annotator too - is it
> available somewhere?
>
> I made a relative of Tim's FastAPI component but based on the medSpaCy
> ConText annotator and am sharing it here in case it is useful to anyone
> else. I haven't gotten around to doing the head-to-head-to-head comparison
> between ctakes, RoBERTa, and medSpaCy Context, but will someday and can
> update the list.
>
> Check it out here: https://github.com/twloehfelm/medSpaCy_Context
>
> The FastAPI docs serve as a useful intro (localhost:8000/docs once up and
> running), but you basically pass it a dictionary of
>
> {
>         accnum: [accession number],
>         report: [report text],
>         annotations: List[Annotation]
> }
>
> Where Annotation is an object with:
> first_pos: Int
> last_pos: Int
> is_negated: bool
> is_uncertain: bool
> is_conditional: bool
> is_historic: bool
> subject: str
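Putting the two pieces above together, a request body might be built like this. The field names follow Tom's schema; all the values (accession number, report text, offsets) are made-up sample data.

```python
import json

# One Annotation object, per the field list above (sample values only).
annotation = {
    "first_pos": 15,
    "last_pos": 27,
    "is_negated": False,
    "is_uncertain": False,
    "is_conditional": False,
    "is_historic": False,
    "subject": "patient",
}

# The top-level dictionary the endpoint expects.
payload = {
    "accnum": "ACC0001",  # accession number (sample value)
    "report": "No evidence of pneumothorax.",
    "annotations": [annotation],
}

# This JSON string is what would be POSTed to the running service
# (localhost:8000, per the docs note above).
body = json.dumps(payload)
```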
>
> I parse ctakes annotations into a database similar to the annotation
> object, so with this FastAPI endpoint I pass in the cTAKES output and try
> to find the matching contexts using the medSpaCy context pipeline. I return
> a list of exact or overlapping (if there is no exact match) spans, where I
> set a flag to True if either cTAKES or medSpaCy thought that item was true.
> Again, I haven't done the testing yet to figure out what the best way to
> ensemble these together might be.
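The OR-style merge Tom describes can be sketched roughly as follows. This is a guess at the logic from the description above, not his actual code; the dict field names follow the Annotation schema from his earlier message.

```python
def spans_overlap(a, b):
    """True if half-open spans [a0, a1) and [b0, b1) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def merge_flags(ctakes_ann, medspacy_anns):
    """OR the context flags from the best-matching medSpaCy annotations
    (exact span if available, otherwise any overlapping span) into a
    copy of the cTAKES annotation. Annotations are dicts with
    first_pos/last_pos plus the boolean flags described above."""
    flags = ("is_negated", "is_uncertain", "is_conditional", "is_historic")
    span = (ctakes_ann["first_pos"], ctakes_ann["last_pos"])
    exact = [m for m in medspacy_anns
             if (m["first_pos"], m["last_pos"]) == span]
    # Fall back to overlaps only when there is no exact span match.
    candidates = exact or [
        m for m in medspacy_anns
        if spans_overlap(span, (m["first_pos"], m["last_pos"]))
    ]
    merged = dict(ctakes_ann)
    for m in candidates:
        for f in flags:
            merged[f] = merged[f] or m[f]
    return merged
```

As Tom notes below, the overlap fallback is the fragile part, since a medSpaCy span that swallows the negation trigger can flip `is_negated` incorrectly.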
>
> Definite limitations:
> * The annotations medspacy finds are not linked to any knowledge base. It
> is not hard to add that in using sciSpaCy, but I haven't found that to be
> helpful yet and it adds significantly to the startup and document
> processing time.
> * Using overlapping matches is probably a bad idea, since some of the
> named entities identified by medSpaCy include the negation term which
> really messes with the is_negated accuracy.
> * Probably a million others
>
> Sharing in case it is useful to bootstrap someone else.
>
>
> Tom
>
> -----Original Message-----
> From: Peter Abramowitsch <pabramowit...@gmail.com>
> Sent: Sunday, January 24, 2021 9:05 AM
> To: dev@ctakes.apache.org
> Subject: Re: neural negation model in ctakes
>
> That's great, Tim - it sounds very sophisticated!
>
> In fact I had made some changes to the Negex Annotator last fall which I
> hadn't checked in, as I was waiting for Sean to test them.  In a great deal
> of my own testing I discovered that Negex, which is easily expandable to
> accommodate new constructions, had only a couple of serious flaws, and I
> believe I have fixed these, as well as a performance issue it had.  If
> you're interested in testing it against yours, that would be great.
> Reading your description above, I wondered how it would do in the case of
> strings of entities which were negated by a single negating trigger phrase
> either ahead of or behind the series.  Or what happens when a series of
> entities that starts out all negated contains one expressed in a way that
> stops the negation pattern.  These are the weaknesses I addressed in my
> changes.
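The two failure modes Peter describes (one trigger scoping over a whole series, and a break that ends the series) can be illustrated with a deliberately minimal scope sketch. This is not Negex or Peter's fix, just a toy forward-scope model with made-up trigger and breaker word lists:

```python
# Toy illustration of forward negation scope: a trigger negates the
# entities after it until something interrupts the pattern.
NEG_TRIGGERS = {"no", "without", "denies"}
SCOPE_BREAKERS = {"but", "however", "although"}

def negate_entities(tokens, entity_indices):
    """Return the subset of entity_indices (token positions) that fall
    inside a forward negation scope."""
    negated = set()
    in_scope = False
    for i, tok in enumerate(tokens):
        word = tok.lower().strip(",.")
        if word in NEG_TRIGGERS:
            in_scope = True
        elif word in SCOPE_BREAKERS:
            in_scope = False  # series interruption ends the scope
        elif i in entity_indices and in_scope:
            negated.add(i)
    return negated
```

So in "No fracture or dislocation but there is edema", the single trigger negates both "fracture" and "dislocation", while "but" stops the pattern before "edema".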
>
> Regards
> Peter
>
> On Sun, Jan 24, 2021 at 5:08 PM Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
>
> > Hi all,
> > I just checked in a usable proof-of-concept for a neural
> > (RoBERTa-based to be specific) negation classifier. The way it works
> > is that a tiny bit of python code (using FastAPI) sets up a REST
> > interface that runs the classifier:
> > ctakes-assertion/src/main/python/negation_rest.py
> >
> > It runs a default model that I trained and uploaded to the Huggingface
> > modelhub. It will automatically download the first time the server is
> > run.
> >
> > There is a startup script there too:
> > ctakes-assertion/src/main/python/start_negation_rest.sh
> >
> > The idea would be to run this on whatever machine you have with the
> > appropriate GPU resources and it creates 3 REST endpoints:
> > /negation/initialize -- to load the model (takes longer the first
> > time, as it will download)
> > /negation/process -- to classify the data and return negation values
> > /negation/collection_process_complete -- to unload the model
> >
> > to mirror UIMA workflows. Then, the UIMA analysis engine sits in:
> >
> > ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java
> >
> > The main work here is converting the cTAKES entities/events into a
> > simpler data structure that gets sent to the python REST server,
> > making the REST call, and then converting the classifier output into
> > the polarity property.
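A client driving the three endpoints might look roughly like this. The endpoint names and port come from Tim's description above; the request schema (the "instances" key) is an assumption for illustration, not the actual wire format.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # wherever start_negation_rest.sh is running

def endpoint(name):
    """Build the URL for one of the three UIMA-style endpoints."""
    return f"{BASE_URL}/negation/{name}"

def call(name, payload=None):
    """POST to an endpoint and return the parsed JSON reply.
    Requires the FastAPI server to be up; the payload shape
    ("instances") is an assumed schema, not the real one."""
    body = json.dumps(payload or {}).encode("utf-8")
    req = request.Request(endpoint(name), data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Mirroring the UIMA lifecycle: initialize, process, then unload.
# call("initialize")
# results = call("process", {"instances": ["No acute fracture."]})
# call("collection_process_complete")
```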
> >
> > Performance:
> > The accuracy of this classifier is much better in my testing. I am
> > hopeful this will make the path to improving performance easier, since
> > it can potentially be just a change to the model string to have it grab
> > a new model from modelhub.
> >
> > The speed is marginally slower if we do a 1-for-1 swap, but that's a
> > little bit misleading, because we currently run 2 parsers to generate
> > features for the default ML negation module. If we don't need those
> > parsers, we can dramatically cut the processing time even with the
> > neural negation module. I tested this with the python code running
> > on a machine with a 1070ti. The goal for these methods going forward
> > if we want to scale should be to have the neural call do a few things
> > with a single pass, especially if we are using large transformer
> > models. But this proof of concept of a single task will hopefully make
> > it easier for other folks to do that if they wish.
> >
> > FYI, another way of doing this is by using python libraries like
> > cassis and actually having python functions be essentially UIMA AEs --
> > I think there will be a place for both approaches and I'm not trying
> > to wall off work in that direction.
> >
> > Tim
> >
> >
