Hi Sean,

I tried to use your solution and got a few compilation errors, some of which I fixed. I changed

JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream().map( a -> new DefaultTextSpan(a, 0) )

to

JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream().map( a -> new DefaultTextSpan(a.getBegin(), 0) )

I hope this is correct.

I could not work out what needs to go in place of BaseToken in the case below:

TextSpan ts = new DefaultTextSpan( BaseToken, 0 );

Thanks & Regards
Vighnesh

On Thu, Jan 12, 2017 at 10:12 PM, Sparsh K <sparsh...@gmail.com> wrote:

> Thanks for the clarification, Sean.
>
> On Thu, Jan 12, 2017 at 8:43 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
>> Hi Vighnesh,
>>
>> 1. Does ctakes depend upon exact word match?
>> By default, yes. The fast clinical pipeline uses
>> "DefaultJCasTermAnnotator" or some such horribly named class. There is
>> also an "OverlapJCasTermAnnotator". Equally horrible name, slightly
>> different functionality. Given: "Blood, urine test" the Default will
>> identify "blood", "urine" and "urine test". The overlap will identify
>> "Blood", "urine", "urine test" and "blood test". Obviously this requires
>> all four terms to be in the dictionary.
>>
>> 2. How to get all nouns in a document not covered by an
>> IdentifiedAnnotation?
>>
>> JCasUtil.select( jcas, BaseToken.class ).stream().filter( b ->
>> b.getPartOfSpeech().equals("NN") ).map( Annotation::getCoveredText()
>> ).forEach( System.out::println );
>>
>> Something like that should work. Filtering by discovered
>> IdentifiedAnnotations is another step.
>> Something like:
>>
>> Collection<TextSpan> identifiedSpans = JCasUtil.select( jcas,
>> IdentifiedAnnotation.class ).stream().map( a -> new DefaultTextSpan(a, 0)
>> ).collect( Collectors.toList() );
>>
>> Predicate<BaseToken> overlapped = bt -> {
>>   TextSpan ts = new DefaultTextSpan( BaseToken, 0 );
>>   return identifiedSpans.stream().filter( s -> s.overlaps(ts)
>>   ).findAny().exists();
>> }
>>
>> Then add .filter( !overlapped ) before the original .map(
>> Annotation::getCoveredText ). I am not debugging this email, so you may
>> need to check my stream methods.
>>
>> Sean
>>
>>
>> -----Original Message-----
>> From: Sparsh K [mailto:sparsh...@gmail.com]
>> Sent: Thursday, January 12, 2017 7:31 AM
>> To: dev-...@ctakes.apache.org; dev@ctakes.apache.org
>> Subject: Question on ctakes
>>
>> Hi
>>
>> I am new to ctakes, I have got few questions, Please guide me with your
>> inputs.
>>
>> 1. When a clinical note is inputted to ctakes, it will process that text
>> in multi stages.
>> Let us take an eg of a clinical note :- SINGLE/PRETERM (35 WEEKS 5
>> DAYS)/MALE/AGA.
>>
>> Here the word "preterm" is not in dictionary, preterm infant, premature
>> baby etc is there. So ctakes is not identifying that word as coveredText.
>>
>> My question is does ctakes processing mainly depends on exact word match
>> with the dictionary. If so If i give one page of clinical note with
>> explanation of disease and if it does not contain exact matching words with
>> dictionary, then ctakes will not identify that word. Is it true?
>>
>> 2. Ctakes does POS tagging and does named entity recognition on the noun
>> terms. How to pull out a list of nouns created which are not matched to a
>> named disorder code at the named entity recognition level.
>>
>>
>> Regards
>> Vighnesh
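
A note on the quoted snippets, which Sean says were not debugged. Several pieces are not valid Java regardless of the cTAKES version: a method reference takes no parentheses (Annotation::getCoveredText), Optional has isPresent() rather than exists(), and a Predicate is negated with overlapped.negate() rather than !overlapped. The BaseToken inside the predicate is a type name where a value is needed, so it presumably stands for the lambda parameter bt. If the two-int DefaultTextSpan constructor takes begin and end offsets (an assumption this thread does not confirm; check the class in your cTAKES version), then a.getBegin(), a.getEnd() is more likely what is wanted than a.getBegin(), 0.

The sketch below shows only the stream logic, so that it runs without cTAKES or UIMA on the classpath. Span, Token and uncoveredNouns are hypothetical stand-ins for TextSpan, BaseToken and the pipeline code; they are not cTAKES classes, and the offsets are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Stand-in types only: none of these are cTAKES or UIMA classes.
final class UncoveredNouns {

    /** Hypothetical stand-in for a text span with [begin, end) offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        /** True if the two half-open ranges share at least one character. */
        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }
    }

    /** Hypothetical stand-in for BaseToken: a span, its text and a POS tag. */
    static final class Token {
        final Span span;
        final String text;
        final String pos;

        Token(int begin, int end, String text, String pos) {
            this.span = new Span(begin, end);
            this.text = text;
            this.pos = pos;
        }
    }

    /** Returns the text of every "NN" token that overlaps no identified span. */
    static List<String> uncoveredNouns(List<Token> tokens, List<Span> identified) {
        // anyMatch replaces the filter(...).findAny().exists() chain in the email.
        Predicate<Token> overlapped =
                t -> identified.stream().anyMatch(s -> s.overlaps(t.span));
        return tokens.stream()
                .filter(t -> "NN".equals(t.pos))   // constant first, so a null tag is safe
                .filter(overlapped.negate())       // a Predicate cannot be negated with '!'
                .map(t -> t.text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Text: "urine test by nurse"; only "urine test" was identified.
        List<Token> tokens = Arrays.asList(
                new Token(0, 5, "urine", "NN"),
                new Token(6, 10, "test", "NN"),
                new Token(11, 13, "by", "IN"),
                new Token(14, 19, "nurse", "NN"));
        List<Span> identified = Arrays.asList(new Span(0, 10));

        System.out.println(uncoveredNouns(tokens, identified)); // prints [nurse]
    }
}
```

Matching on "NN" alone keeps only singular common nouns. The Penn Treebank tag set also has NNS, NNP and NNPS, so a startsWith("NN") test may be closer to "all nouns", depending on what is needed.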