Thanks for the clarification, Sean.

On Thu, Jan 12, 2017 at 8:43 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Vighnesh,
>
> 1.  Does ctakes depend upon exact word match?
>         By default, yes.  The fast clinical pipeline uses
> "DefaultJCasTermAnnotator" or some such horribly named class.  There is
> also an "OverlapJCasTermAnnotator".  Equally horrible name, slightly
> different functionality.  Given "Blood, urine test", the Default will
> identify "blood", "urine" and "urine test".  The overlap will identify
> "Blood", "urine", "urine test" and "blood test".  Obviously this requires
> all four terms to be in the dictionary.
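A toy model of the difference described above, in plain Java. This is not the cTAKES lookup code. The dictionary contents, the lowercase tokenization, and the one-token "gap" rule in `overlapMatches` are assumptions chosen only to reproduce the "Blood, urine test" example.

```java
import java.util.*;

public class LookupSketch {
    // Assumed toy dictionary mirroring the example; a real dictionary is far larger.
    static final Set<String> DICTIONARY = Set.of("blood", "urine", "urine test", "blood test");

    // Contiguous ("exact") lookup: only an unbroken run of tokens can match an entry.
    public static Set<String> contiguousMatches(List<String> tokens) {
        Set<String> found = new LinkedHashSet<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int end = start + 1; end <= tokens.size(); end++) {
                String phrase = String.join(" ", tokens.subList(start, end));
                if (DICTIONARY.contains(phrase)) {
                    found.add(phrase);
                }
            }
        }
        return found;
    }

    // Overlap-style lookup (toy rule, an assumption): also lets a two-word entry
    // match when exactly one token sits between its words, e.g. "blood [urine] test".
    public static Set<String> overlapMatches(List<String> tokens) {
        Set<String> found = new LinkedHashSet<>(contiguousMatches(tokens));
        for (int i = 0; i + 2 < tokens.size(); i++) {
            String gapped = tokens.get(i) + " " + tokens.get(i + 2);
            if (DICTIONARY.contains(gapped)) {
                found.add(gapped);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // "Blood, urine test" after lowercasing and dropping punctuation.
        List<String> tokens = List.of("blood", "urine", "test");
        System.out.println(contiguousMatches(tokens)); // [blood, urine, urine test]
        System.out.println(overlapMatches(tokens));    // [blood, urine, urine test, blood test]
    }
}
```

The contiguous version also shows why a lone "preterm" is missed when the dictionary only holds "preterm infant": no unbroken token run equals a dictionary entry.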
>
> 2.  How to get all nouns in a document not covered by an
> IdentifiedAnnotation?
>
> JCasUtil.select( jcas, BaseToken.class ).stream()
>       // NN, NNS, NNP and NNPS are the Penn Treebank noun tags
>       .filter( b -> b.getPartOfSpeech() != null && b.getPartOfSpeech().startsWith( "NN" ) )
>       .map( Annotation::getCoveredText )
>       .forEach( System.out::println );
>
> Something like that should work.  Filtering by discovered
> IdentifiedAnnotations is another step.  Something like:
>
> Collection<TextSpan> identifiedSpans = JCasUtil.select( jcas, IdentifiedAnnotation.class )
>       .stream()
>       .map( a -> new DefaultTextSpan( a, 0 ) )
>       .collect( Collectors.toList() );
>
> Predicate<BaseToken> overlapped = bt -> {
>    TextSpan ts = new DefaultTextSpan( bt, 0 );
>    return identifiedSpans.stream().anyMatch( s -> s.overlaps( ts ) );
> };
>
> Then add .filter( overlapped.negate() ) before the original .map(
> Annotation::getCoveredText ).  I am not debugging this email, so you may
> need to check my stream methods.
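A self-contained sketch of the same two-step filtering in plain Java, so the logic can be run without a JCas. `Token` and `Span` are hypothetical stand-ins for cTAKES's BaseToken and TextSpan, and the half-open overlap test is an assumption, not the actual TextSpan implementation.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class UncoveredNouns {
    // Hypothetical stand-in for a BaseToken: character offsets, POS tag, covered text.
    record Token(int begin, int end, String pos, String text) {}

    // Hypothetical stand-in for a TextSpan; offsets are half-open [begin, end).
    record Span(int begin, int end) {
        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }
    }

    // Nouns (tags starting with "NN") that no identified span overlaps.
    static List<String> uncoveredNouns(List<Token> tokens, List<Span> identified) {
        Predicate<Token> overlapped = t -> {
            Span ts = new Span(t.begin(), t.end());
            return identified.stream().anyMatch(s -> s.overlaps(ts));
        };
        return tokens.stream()
                .filter(t -> t.pos() != null && t.pos().startsWith("NN"))
                .filter(overlapped.negate())
                .map(Token::text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "preterm infant with fever", assuming only "preterm infant" was identified.
        List<Token> tokens = List.of(
                new Token(0, 7, "JJ", "preterm"),
                new Token(8, 14, "NN", "infant"),
                new Token(15, 19, "IN", "with"),
                new Token(20, 25, "NN", "fever"));
        List<Span> identified = List.of(new Span(0, 14));
        System.out.println(uncoveredNouns(tokens, identified)); // [fever]
    }
}
```

Using `anyMatch` stops at the first overlapping span, and `negate()` turns the predicate into the "not overlapped" filter that a bare `!` cannot express on a `Predicate`.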
>
> Sean
>
>
> -----Original Message-----
> From: Sparsh K [mailto:sparsh...@gmail.com]
> Sent: Thursday, January 12, 2017 7:31 AM
> To: dev-...@ctakes.apache.org; dev@ctakes.apache.org
> Subject: Question on ctakes
>
> Hi
>
> I am new to cTAKES and have a few questions. Please guide me with your
> input.
>
> 1. When a clinical note is input to cTAKES, it processes the text in
> multiple stages. Take, for example, this clinical note: SINGLE/PRETERM
> (35 WEEKS 5 DAYS)/MALE/AGA.
>
> Here the word "preterm" is not in the dictionary, although "preterm
> infant", "premature baby", etc. are. So cTAKES does not identify that
> word as coveredText.
>
> My question is: does cTAKES processing mainly depend on an exact word
> match with the dictionary? If so, and I give it a one-page clinical note
> explaining a disease that contains no words exactly matching the
> dictionary, then cTAKES will not identify those words. Is that true?
>
> 2. cTAKES does POS tagging and performs named entity recognition on the
> noun terms. How can I pull out a list of the nouns that are not matched to
> a named disorder code at the named entity recognition level?
>
>
> Regards
> Vighnesh
>
