Re: Performance of the cleartk history module [EXTERNAL]

Finan, Sean Tue, 04 Jan 2022 07:47:34 -0800

Hi Peter,

I created a second engine that just applies text matching or regular
expressions to the discovered events.  It also uses covering section types,
formatted text and other things, but the text match might be the most
impactful item.


You are an accomplished developer, so the email scratch below is for the
benefit of others who search the archives.

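import org.apache.ctakes.typesystem.type.constants.CONST;
import org.apache.ctakes.typesystem.type.textsem.EventMention;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;

import java.util.Arrays;

public class LazyHistoryFinder extends JCasAnnotator_ImplBase {
  // Lowercase prefixes that mark an event as patient history.
  private static final String[] HISTORY = { "history of", "h/o", "h / o" };

  private static boolean isHistory( EventMention event ) {
    String text = event.getCoveredText().toLowerCase();
    return Arrays.stream( HISTORY ).anyMatch( text::startsWith );
  }

  @Override
  public void process( JCas jcas ) throws AnalysisEngineProcessException {
    // Flag every EventMention whose covered text starts with a history prefix.
    JCasUtil.select( jcas, EventMention.class )
            .stream()
            .filter( LazyHistoryFinder::isHistory )
            .forEach( e -> e.setHistoryOf( CONST.NE_HISTORY_OF_PRESENT ) );
  }
}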

It requires a stroll through the monstrous cas array and it certainly isn't 
sexy, but it gets the job done.  
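
For archive readers, a regular-expression variant of the same check might
look like the sketch below.  It is only an illustration under assumptions:
HistoryPrefixSketch, its method names and the WINDOW size are hypothetical and
not part of cTAKES, and it works on plain strings and offsets rather than on
EventMention so that it can be tried without a CAS.  The check of the text
just before the span is an extra assumption, added because the table in the
original question includes cases where the cue sits outside the concept span.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a cTAKES class: plain-string history cue checks.
final class HistoryPrefixSketch {
    // Matches "history of", "h/o", "h / o", optionally followed by a colon.
    private static final String CUE = "\\b(?:history\\s+of|h\\s*/\\s*o)\\b\\s*:?";

    // Cue at the very start of the covered text of an event.
    private static final Pattern LEADING =
            Pattern.compile("^\\s*" + CUE, Pattern.CASE_INSENSITIVE);

    // Cue immediately before the event span (assumption, see lead-in).
    private static final Pattern TRAILING =
            Pattern.compile(CUE + "\\s*$", Pattern.CASE_INSENSITIVE);

    // Assumed number of characters to inspect before the span.
    private static final int WINDOW = 20;

    private HistoryPrefixSketch() {
    }

    // True if the covered text itself begins with a history cue.
    static boolean startsWithHistory(String coveredText) {
        return LEADING.matcher(coveredText).find();
    }

    // True if the text directly before offset 'begin' ends with a history cue.
    static boolean precededByHistory(String docText, int begin) {
        int from = Math.max(0, begin - WINDOW);
        return TRAILING.matcher(docText.substring(from, begin)).find();
    }

    // Combines both checks for a span [begin, end) of the document text.
    static boolean isHistory(String docText, int begin, int end) {
        return startsWithHistory(docText.substring(begin, end))
                || precededByHistory(docText, begin);
    }
}
```

Inside an annotator, the arguments would presumably come from
jcas.getDocumentText(), event.getBegin() and event.getEnd(); whether a look
behind the span is appropriate for a given pipeline is a judgment call.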

Sean


________________________________________
From: Peter Abramowitsch <pabramowit...@gmail.com>
Sent: Monday, January 3, 2022 10:23 PM
To: dev@ctakes.apache.org
Subject: Re: Performance of the cleartk history module [EXTERNAL]

* External Email - Caution *


Thanks Sean

By "following engine", do you mean a second instance of the history engine
that uses only the event spans, or did you modify the current one to traverse
the event span within the context window?  I see you made some source
changes in that area and will check tomorrow.

Peter

On Mon, Jan 3, 2022 at 2:26 PM Finan, Sean <sean.fi...@childrens.harvard.edu>
wrote:

> Hi Peter,
>
> I have noticed this and just added a following engine that recognized text
> within event spans.  It is a lazy solution, but it fit my needs and
> available time.
>
> Sean
> ________________________________________
> From: Peter Abramowitsch <pabramowit...@gmail.com>
> Sent: Monday, January 3, 2022 5:03 PM
> To: dev@ctakes.apache.org
> Subject: Performance of the cleartk history module [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi All
>
> I've noticed that the HistoryCleartkAnalysisEngine misses many common forms
> of subject history, including the obvious "h/o" prefix.  Looking into the
> distribution, there's a model.jar and what appears to be a weights file
> containing trigger words:
> resources/org/apache/ctakes/assertion/models/history.txt
> where h, o, and / are all given their own weights.  But I'm not sure that
> they're actually used in this way; see below.  However, there's also a
> tiny file:
> /org/apache/ctakes/assertion/semantic_classes/history.txt
> which does contain a few entries, including "h/o".  I assume it is used for
> training, but it is never referred to anywhere.
>
> Here's the behavior I'm seeing:
>
> example input             | condition                                            | term found | history feature marked | range text
> --------------------------+------------------------------------------------------+------------+------------------------+-----------------------
> history of pregnancies    | "history of" included in the cu_term and prefterm    | yes        | no                     | history of pregnancies
> history of adenopathy     | "history of" not included in the cu_term or prefterm | yes        | yes                    | adenopathy
> H/O postpartum psychosis  | "h/o" not included in the prefterm or cu_term        | yes        | yes                    | postpartum psychosis
> H/O: postpartum psychosis | "h/o" not included in the prefterm or cu_term        | yes        | no                     | postpartum psychosis
> H/O pregnancies           | "h/o" included in the cu_term                        | yes        | no                     | h/o pregnancies
>
> You can see that it is quite perverse -  there is a pattern suggesting that
> if the concept definition occupies the history words, then they cannot be
> seen by the history annotation engine.
>
> Has anyone else noticed this - and have they done anything about it?
>
> Peter
>
