That's great, Tim - it sounds very sophisticated! In fact, I made some changes to the Negex Annotator last fall which I haven't checked in yet; I was waiting for Sean to test them. In a great deal of my own testing I discovered that Negex, which is easily expandable to accommodate new constructions, had only a couple of serious flaws, and I believe I have fixed these, as well as a performance issue it had. If you're interested in testing it against yours, that would be great. Reading your description, I wondered how it would do in the case of a string of entities negated by a single trigger phrase either ahead of or behind the series. Or what happens when a series of entities that begins as all negated has one expressed in a way that stops the negation pattern? These are the weaknesses I addressed in my changes.
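To make the two cases concrete, here is a toy sketch of trigger-and-scope negation in the NegEx style. It is not the code from my changes, and the trigger and terminator lists are hypothetical and far smaller than any real lexicon. A trigger ahead of a series negates every entity up to the next terminating word; a trigger behind a series negates every entity back to the previous terminating word.

```python
import re

# Hypothetical, deliberately tiny phrase lists for illustration only.
PRE_TRIGGERS = ["no", "denies", "without"]            # negate what follows
POST_TRIGGERS = ["was ruled out", "were ruled out"]   # negate what precedes
TERMINATORS = ["but", "however", "except"]            # stop the negation scope


def _spans(phrases, text):
    """Return (start, end) character spans of whole-word phrase matches."""
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    return [m.span() for m in re.finditer(pattern, text)]


def negated_entities(sentence, entities):
    """Return the subset of `entities` that falls inside a negation scope."""
    text = sentence.lower()
    stops = _spans(TERMINATORS, text)
    scopes = []
    # Forward scope: from the end of the trigger to the next terminator.
    for _, end in _spans(PRE_TRIGGERS, text):
        limit = min((s for s, _ in stops if s >= end), default=len(text))
        scopes.append((end, limit))
    # Backward scope: from the previous terminator to the start of the trigger.
    for start, _ in _spans(POST_TRIGGERS, text):
        limit = max((e for _, e in stops if e <= start), default=0)
        scopes.append((limit, start))
    negated = set()
    for entity in entities:
        for m in re.finditer(r"\b" + re.escape(entity.lower()) + r"\b", text):
            if any(lo <= m.start() < hi for lo, hi in scopes):
                negated.add(entity)
    return negated


if __name__ == "__main__":
    found = negated_entities(
        "Patient denies fever, chills, or nausea but reports headache.",
        ["fever", "chills", "nausea", "headache"],
    )
    print(sorted(found))  # ['chills', 'fever', 'nausea']
```

The word "but" ends the scope opened by "denies", so "headache" stays affirmed while the three entities before it are negated by the single trigger.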
Regards,
Peter

On Sun, Jan 24, 2021 at 5:08 PM Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:

> Hi all,
> I just checked in a usable proof-of-concept for a neural (RoBERTa-based, to
> be specific) negation classifier. The way it works is that a tiny bit of Python
> code (using FastAPI) sets up a REST interface that runs the classifier:
> ctakes-assertion/src/main/python/negation_rest.py
>
> It runs a default model that I trained and uploaded to the Huggingface
> modelhub. The model will download automatically the first time the server is run.
>
> There is a startup script there too:
> ctakes-assertion/src/main/python/start_negation_rest.sh
>
> The idea would be to run this on whatever machine you have with the
> appropriate GPU resources, and it creates 3 REST endpoints:
> /negation/initialize -- to load the model (takes longer the first time as
> it will download)
> /negation/process -- to classify the data and return negation values
> /negation/collection_process_complete -- to unload the model
>
> to mirror UIMA workflows. Then, the UIMA analysis engine sits in:
>
> ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java
>
> The main work here is converting the cTAKES entities/events into a simpler
> data structure that gets sent to the Python REST server, making the REST
> call, and then converting the classifier output into the polarity property.
>
> Performance:
> The accuracy of this classifier is much better in my testing. I hope this
> also makes the path to improving performance easier, as it can potentially
> be just a change to the model string to have it grab a new model from
> modelhub.
>
> The speed is marginally slower if we do a 1-for-1 swap, but that's a
> little bit misleading, because we currently run 2 parsers to generate
> features for the default ML negation module. If we don't need those parsers,
> we can dramatically cut the processing time even with the neural
> negation module. I tested this with the Python code running on a machine
> with a 1070ti. The goal for these methods going forward, if we want to scale,
> should be to have the neural call do a few things in a single pass,
> especially if we are using large transformer models. But this proof of
> concept of a single task will hopefully make it easier for other folks to
> do that if they wish.
>
> FYI, another way of doing this is by using Python libraries like cassis
> and actually having Python functions be essentially UIMA AEs -- I think
> there will be a place for both approaches and I'm not trying to wall off
> work in that direction.
>
> Tim
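
For anyone trying to picture the round trip Tim describes, here is a minimal sketch using only the Python standard library. It is not the code in negation_rest.py or PolarityBertRestAnnotator.java. The JSON field names (`doc_text`, `entities`, `statuses`), the `Entity` class, the stub service, and the polarity convention (-1 for negated, 1 otherwise) are all assumptions made for illustration, and a keyword check stands in for the neural model. The stub's three methods follow the initialize / process / collection_process_complete lifecycle of the three endpoints.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Hypothetical stand-in for a cTAKES entity/event mention."""
    begin: int
    end: int
    polarity: int = 1  # assumed convention: 1 = not negated, -1 = negated


def build_request(doc_text: str, entities: List[Entity]) -> str:
    """Flatten entities into a JSON body: the text plus [begin, end] offsets."""
    return json.dumps(
        {"doc_text": doc_text, "entities": [[e.begin, e.end] for e in entities]}
    )


def apply_response(body: str, entities: List[Entity]) -> None:
    """Copy one returned status per entity back onto its polarity field."""
    statuses = json.loads(body)["statuses"]
    if len(statuses) != len(entities):
        raise ValueError("expected exactly one status per entity")
    for entity, status in zip(entities, statuses):
        entity.polarity = -1 if status == 1 else 1


class StubNegationService:
    """Stands in for the REST server; no network and no real model."""

    def __init__(self) -> None:
        self.model = None

    def initialize(self) -> None:
        # A real server would load the transformer model here.
        def keyword_model(text: str, begin: int) -> int:
            # Look for "no " in the current sentence, before the entity.
            prefix = text[:begin].rsplit(".", 1)[-1].lower()
            return int("no " in prefix)

        self.model = keyword_model

    def process(self, body: str) -> str:
        if self.model is None:
            raise RuntimeError("call initialize() before process()")
        request = json.loads(body)
        statuses = [
            self.model(request["doc_text"], begin)
            for begin, _end in request["entities"]
        ]
        return json.dumps({"statuses": statuses})

    def collection_process_complete(self) -> None:
        # A real server would release the model (and GPU memory) here.
        self.model = None
```

With the text "No fever. Cough present." and entities at offsets (3, 8) and (10, 15), calling `initialize()`, then `process(build_request(...))`, then `apply_response(...)` leaves the first entity with polarity -1 and the second with polarity 1. In a real deployment the `process` call would be an HTTP POST to /negation/process rather than a local method call.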