Two Questions about OverlapJcasTermAnnotator

2022-08-23 Thread Peter Abramowitsch
Hi Sean (or whoever has some historical knowledge),

I'm trying to improve the term annotators for speed and have noticed that
the overlap term annotator does not seem to pass even the most rudimentary
use cases suggested in the code comments:

// things like "blood, urine, sputum cultures" should pick up "blood culture" and "urine culture"

I'm happy to fix this, but my question is whether anyone can attest that it
ever worked, or has use cases showing that it works today.

The other question is about conventions in the term dictionary. When a
PREFTERM has symbols embedded in its text, like so:

*'electrocardiogram ; 24 hour'*
or so
*'us . doppler . cw'*
or so
*'angioscopies , microscopic'*

Do the symbols have any implied meaning or behavior somewhere in the
pipeline, or are they literally part of the text (which would almost never
occur in real notes)?


Re: Two Questions about OverlapJcasTermAnnotator [EXTERNAL]

2022-08-23 Thread Finan, Sean
Hi Peter,

The "blood, urine"... example did work when I originally tested it, but the
default settings (window size, etc.) may have changed since then.

Everything in preftext is a simple string literal. It is likely that certain
terms will never appear in raw text; the UMLS has some interesting synonym
sources.

Sean
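A toy sketch of the behavior described in that code comment may help when checking Sean's window-size hypothesis. It is written in Java for consistency with cTAKES, but it is not the OverlapJcasTermAnnotator implementation: the class `OverlapSketch`, the method `findOverlap`, the `maxSkips` limit (a hypothetical stand-in for the window setting Sean mentions), and the crude plural-stripping `normalize` helper are all assumptions made for illustration. A term matches if its tokens appear in order in the text, with at most `maxSkips` unrelated tokens skipped in between.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of "overlap" term matching; NOT the cTAKES code.
public class OverlapSketch {

    // Crude stand-in for lexical-variant normalization:
    // lowercase, then drop a trailing plural "s" (but not "ss").
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Returns the token span {start, end} of the first in-order match of
    // `term` in `text`, skipping at most `maxSkips` non-matching tokens in
    // total, or null if the term cannot be matched within that limit.
    static int[] findOverlap(List<String> text, List<String> term, int maxSkips) {
        for (int start = 0; start < text.size(); start++) {
            // A match must begin on the term's first token.
            if (!normalize(text.get(start)).equals(normalize(term.get(0)))) {
                continue;
            }
            int termIdx = 1;   // next term token we are looking for
            int skips = 0;     // unrelated text tokens skipped so far
            int pos = start + 1;
            while (termIdx < term.size() && pos < text.size() && skips <= maxSkips) {
                if (normalize(text.get(pos)).equals(normalize(term.get(termIdx)))) {
                    termIdx++;
                } else {
                    skips++;
                }
                pos++;
            }
            if (termIdx == term.size() && skips <= maxSkips) {
                return new int[] {start, pos - 1};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "blood, urine, sputum cultures", pre-split with punctuation kept as tokens.
        List<String> text = Arrays.asList("blood", ",", "urine", ",", "sputum", "cultures");
        List<List<String>> terms = Arrays.asList(
                Arrays.asList("blood", "culture"),
                Arrays.asList("urine", "culture"),
                Arrays.asList("sputum", "culture"));
        for (int maxSkips : new int[] {4, 1}) {
            for (List<String> term : terms) {
                int[] span = findOverlap(text, term, maxSkips);
                System.out.println("maxSkips=" + maxSkips + " " + String.join(" ", term) + " -> "
                        + (span == null ? "no match" : Arrays.toString(span)));
            }
        }
    }
}
```

Run as-is, it reports that with `maxSkips=4` all three terms match ("blood culture" spans tokens 0 to 5), while with `maxSkips=1` only the contiguous "sputum culture" does. That is the kind of regression a tightened default window could plausibly cause.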


From: Peter Abramowitsch 
Sent: Tuesday, August 23, 2022 6:00 PM
To: dev@ctakes.apache.org 
Subject: Two Questions about OverlapJcasTermAnnotator [EXTERNAL]



Re: Two Questions about OverlapJcasTermAnnotator [EXTERNAL]

2022-08-23 Thread Peter Abramowitsch
Thanks Sean.
Glad to know there wasn't any special behavior with prefterms that I hadn't
known about all these years.

Peter

On Tue, Aug 23, 2022 at 4:31 PM Finan, Sean wrote:
