Hi Vighnesh,

1.  Does ctakes depend upon exact word match?
        By default, yes.  The fast clinical pipeline uses 
"DefaultJCasTermAnnotator" or some such horribly named class.  There is also an 
"OverlapJCasTermAnnotator".  Equally horrible name, slightly different 
functionality.  Given: "Blood, urine test" the Default will identify "blood", 
"urine" and "urine test".  The overlap will identify "Blood", "urine", "urine 
test" and "blood test".  Obviously this requires all four terms to be in the 
dictionary.
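
To make the difference concrete, here is a toy sketch in plain Java.  It is not 
the cTAKES lookup code and uses no cTAKES classes.  ToyTermMatcher, exactMatches, 
overlapMatches and maxSkips are names made up for this illustration, and the 
"skip a few tokens" rule is only my assumption about how overlap-style matching 
can be modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: hypothetical names, no cTAKES classes involved.
final class ToyTermMatcher {

    // Exact matching: a term is found only if its words are consecutive tokens.
    static Set<String> exactMatches(List<String> tokens, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        for (String term : dictionary) {
            List<String> words = Arrays.asList(term.split(" "));
            for (int start = 0; start + words.size() <= tokens.size(); start++) {
                if (tokens.subList(start, start + words.size()).equals(words)) {
                    found.add(term);
                    break;
                }
            }
        }
        return found;
    }

    // Overlap-style matching: words must appear in order, but up to maxSkips
    // other tokens may sit between them.
    static Set<String> overlapMatches(List<String> tokens, Set<String> dictionary, int maxSkips) {
        Set<String> found = new LinkedHashSet<>();
        for (String term : dictionary) {
            List<String> words = Arrays.asList(term.split(" "));
            for (int start = 0; start < tokens.size(); start++) {
                if (matchesFrom(tokens, words, start, maxSkips)) {
                    found.add(term);
                    break;
                }
            }
        }
        return found;
    }

    // True if the term starts at 'start' and its remaining words follow in
    // order with at most maxSkips intervening tokens in total.
    private static boolean matchesFrom(List<String> tokens, List<String> words, int start, int maxSkips) {
        if (!tokens.get(start).equals(words.get(0))) {
            return false;
        }
        int pos = start;
        int skipped = 0;
        for (String word : words) {
            while (pos < tokens.size() && !tokens.get(pos).equals(word)) {
                pos++;
                skipped++;
            }
            if (pos == tokens.size() || skipped > maxSkips) {
                return false;
            }
            pos++;
        }
        return true;
    }

    public static void main(String[] args) {
        // "Blood, urine test", lowercased and tokenized by hand.
        List<String> tokens = Arrays.asList("blood", ",", "urine", "test");
        Set<String> dictionary = new LinkedHashSet<>(
                Arrays.asList("blood", "urine", "urine test", "blood test"));
        System.out.println(exactMatches(tokens, dictionary));      // [blood, urine, urine test]
        System.out.println(overlapMatches(tokens, dictionary, 2)); // [blood, urine, urine test, blood test]
    }
}
```

In this toy, "blood test" is only found when the comma and "urine" may be 
skipped, and only because "blood test" is itself a dictionary entry.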

2.  How to get all nouns in a document not covered by an IdentifiedAnnotation?

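// "NN" matches singular common nouns only; use a startsWith( "NN" ) test to 
// also catch NNS, NNP and NNPS.
JCasUtil.select( jcas, BaseToken.class ).stream()
   .filter( b -> "NN".equals( b.getPartOfSpeech() ) )
   .map( Annotation::getCoveredText )
   .forEach( System.out::println );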

Something like that should work.  Filtering by discovered IdentifiedAnnotations 
is another step.  Something like:

Collection<TextSpan> identifiedSpans 
   = JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream()
      .map( a -> new DefaultTextSpan( a, 0 ) )
      .collect( Collectors.toList() );

Predicate<BaseToken> overlapped = bt -> {
   TextSpan ts = new DefaultTextSpan( bt, 0 );
   return identifiedSpans.stream().anyMatch( s -> s.overlaps( ts ) );
};

Then add .filter( overlapped.negate() ) before the original .map( 
Annotation::getCoveredText ).  I am not debugging this email, so you may need 
to check my stream methods.
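
If it helps to see the whole chain in one piece, here is a sketch that runs 
without cTAKES or UIMA on the classpath.  Token and Span are hypothetical 
stand-ins for BaseToken and for the spans of IdentifiedAnnotations, and the 
offsets, tags and sample sentence are invented for the example.  Only the stream 
logic (the noun filter, anyMatch, Predicate.negate) is meant to carry over.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class UncoveredNounsSketch {

    // Hypothetical stand-in for a cTAKES BaseToken.
    static final class Token {
        final int begin;
        final int end;
        final String text;
        final String pos;

        Token(int begin, int end, String text, String pos) {
            this.begin = begin;
            this.end = end;
            this.text = text;
            this.pos = pos;
        }
    }

    // Hypothetical stand-in for the span of an IdentifiedAnnotation.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        // Half-open intervals [begin, end) overlap when each starts before the other ends.
        boolean overlaps(int otherBegin, int otherEnd) {
            return begin < otherEnd && otherBegin < end;
        }
    }

    // Returns the text of every "NN" token that no identified span overlaps.
    static List<String> uncoveredNouns(List<Token> tokens, List<Span> identifiedSpans) {
        Predicate<Token> overlapped =
                t -> identifiedSpans.stream().anyMatch(s -> s.overlaps(t.begin, t.end));
        return tokens.stream()
                .filter(t -> "NN".equals(t.pos))   // null-safe part-of-speech check
                .filter(overlapped.negate())       // keep only tokens outside every span
                .map(t -> t.text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sentence: "Patient has fever and rash"
        List<Token> tokens = Arrays.asList(
                new Token(0, 7, "Patient", "NN"),
                new Token(8, 11, "has", "VBZ"),
                new Token(12, 17, "fever", "NN"),
                new Token(18, 21, "and", "CC"),
                new Token(22, 26, "rash", "NN"));
        // Pretend only "fever" was identified.
        List<Span> identified = Arrays.asList(new Span(12, 17));
        System.out.println(uncoveredNouns(tokens, identified)); // [Patient, rash]
    }
}
```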

Sean


-----Original Message-----
From: Sparsh K [mailto:sparsh...@gmail.com] 
Sent: Thursday, January 12, 2017 7:31 AM
To: dev-...@ctakes.apache.org; dev@ctakes.apache.org
Subject: Question on ctakes

Hi

I am new to ctakes and have a few questions. Please guide me with your inputs.

1. When a clinical note is input to ctakes, it processes the text in 
multiple stages.
Let us take an example of a clinical note: SINGLE/PRETERM (35 WEEKS 5 
DAYS)/MALE/AGA.

Here the word "preterm" is not in the dictionary, although "preterm infant", 
"premature baby", etc. are. So ctakes does not identify that word as coveredText.

My question is: does ctakes processing depend mainly on an exact word match with 
the dictionary?  If so, and I give it a one-page clinical note describing a 
disease that does not contain words exactly matching the dictionary, then 
ctakes will not identify those words. Is that true?

2. Ctakes does POS tagging and performs named entity recognition on the noun terms. 
How can I pull out a list of the nouns that were not matched to a named 
disorder code at the named entity recognition level?


Regards
Vighnesh
