Hi Vighnesh,

1. Does ctakes depend upon exact word match?

By default, yes. The fast clinical pipeline uses "DefaultJCasTermAnnotator" or some such horribly named class. There is also an "OverlapJCasTermAnnotator". Equally horrible name, slightly different functionality. Given "Blood, urine test", the Default will identify "blood", "urine" and "urine test". The Overlap will identify "blood", "urine", "urine test" and "blood test", since it allows a term to match across intervening tokens. Obviously this requires all four terms to be in the dictionary.
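To make the difference concrete, here is a toy sketch in plain Java. It is not the cTAKES implementation and uses no cTAKES classes: the class MatchSketch, its methods and the maxSkips parameter are all made up for illustration, and the real annotators may tokenize and skip differently. It only shows why contiguous matching misses "blood test" while a skip-tolerant match finds it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in cTAKES.
public class MatchSketch {

    // Lowercase the text and split it into word and punctuation tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+|\\p{Punct}").matcher(text.toLowerCase());
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // "Default"-style lookup: only contiguous token runs are candidates.
    static Set<String> exactMatches(List<String> tokens, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int end = start + 1; end <= tokens.size(); end++) {
                String candidate = String.join(" ", tokens.subList(start, end));
                if (dictionary.contains(candidate)) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    // "Overlap"-style lookup: up to maxSkips tokens may sit between
    // consecutive words of a dictionary term (maxSkips is an assumption).
    static Set<String> overlapMatches(List<String> tokens, Set<String> dictionary, int maxSkips) {
        Set<String> found = new LinkedHashSet<>();
        for (String term : dictionary) {
            String[] words = term.split(" ");
            for (int start = 0; start < tokens.size(); start++) {
                if (matchesFrom(tokens, start, words, maxSkips)) {
                    found.add(term);
                    break;
                }
            }
        }
        return found;
    }

    // Greedy check: the term's first word must be at 'start'; each later word
    // must appear within maxSkips skipped tokens of the previous one.
    static boolean matchesFrom(List<String> tokens, int start, String[] words, int maxSkips) {
        if (!tokens.get(start).equals(words[0])) {
            return false;
        }
        int position = start;
        for (int w = 1; w < words.length; w++) {
            int next = -1;
            for (int i = position + 1; i < tokens.size() && i <= position + 1 + maxSkips; i++) {
                if (tokens.get(i).equals(words[w])) {
                    next = i;
                    break;
                }
            }
            if (next < 0) {
                return false;
            }
            position = next;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new LinkedHashSet<>(
                Arrays.asList("blood", "urine", "urine test", "blood test"));
        List<String> tokens = tokenize("Blood, urine test");
        System.out.println(exactMatches(tokens, dictionary));      // [blood, urine, urine test]
        System.out.println(overlapMatches(tokens, dictionary, 2)); // [blood, urine, urine test, blood test]
    }
}
```

In this sketch "blood test" is only found because two tokens (the comma and "urine") may be skipped between "blood" and "test"; with maxSkips set to 1 it drops out again.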
2. How to get all nouns in a document not covered by an IdentifiedAnnotation?

    JCasUtil.select( jcas, BaseToken.class ).stream()
            .filter( b -> "NN".equals( b.getPartOfSpeech() ) )
            .map( Annotation::getCoveredText )
            .forEach( System.out::println );

Something like that should work. Note that "NN" is only the Penn Treebank tag for singular or mass nouns; plural and proper nouns are tagged "NNS", "NNP" and "NNPS", so you may want to test whether the tag starts with "NN" instead.

Filtering by discovered IdentifiedAnnotations is another step. Something like:

    Collection<TextSpan> identifiedSpans
          = JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream()
                    .map( a -> new DefaultTextSpan( a, 0 ) )
                    .collect( Collectors.toList() );
    // true when the token overlaps any identified annotation
    Predicate<BaseToken> overlapped = bt -> {
       TextSpan ts = new DefaultTextSpan( bt, 0 );
       return identifiedSpans.stream().anyMatch( s -> s.overlaps( ts ) );
    };

Then add .filter( overlapped.negate() ) before the original .map( Annotation::getCoveredText ).

I am not debugging this email, so you may need to check my stream methods.

Sean

-----Original Message-----
From: Sparsh K [mailto:sparsh...@gmail.com]
Sent: Thursday, January 12, 2017 7:31 AM
To: dev-...@ctakes.apache.org; dev@ctakes.apache.org
Subject: Question on ctakes

Hi

I am new to ctakes, I have got few questions, Please guide me with your inputs.

1. When a clinical note is inputted to ctakes, it will process that text in multi stages. Let us take an eg of a clinical note :- SINGLE/PRETERM (35 WEEKS 5 DAYS)/MALE/AGA. Here the word "preterm" is not in dictionary, preterm infant, premature baby etc is there. So ctakes is not identifying that word as coveredText. My question is does ctakes processing mainly depends on exact word match with the dictionary. If so If i give one page of clinical note with explanation of disease and if it does not contain exact matching words with dictionary, then ctakes will not identify that word. Is it true?

2. Ctakes does POS tagging and does named entity recognition on the noun terms. How to pull out a list of nouns created which are not matched to a named disorder code at the named entity recognition level.

Regards
Vighnesh