Hi Vighnesh,

> I Have changed JCasUtil.select ....

a -> new DefaultTextSpan(a, 0) should not be changed. The DefaultTextSpan in core.cc.pretty.textspan should be used as it has the overlaps(..) convenience method. That and the similar class in lookup2 should be merged at some point ...

The constructor you are using:

   /**
    * @param annotation -
    * @param sentenceOffset begin span offset of the containing sentence
    */
   public DefaultTextSpan( final AnnotationFS annotation, final int sentenceOffset ) {
      this( annotation.getBegin() - sentenceOffset, annotation.getEnd() - sentenceOffset );
   }
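
For anyone who wants to try the noun-filtering idea without a cTAKES pipeline, here is a minimal, self-contained sketch. It is not the cTAKES API: Span and Token are hypothetical stand-ins for DefaultTextSpan and BaseToken, the overlap test assumes half-open [begin, end) offsets, and the sample offsets are made up. The stream calls (anyMatch, Predicate.negate) are plain java.util.stream and may be a workable substitute for the untested stream methods in the earlier email.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class UncoveredNouns {

    // Hypothetical stand-in for a text span; NOT the cTAKES DefaultTextSpan.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        // Assumption: spans are half-open [begin, end); two spans overlap
        // when each one starts before the other one ends.
        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }
    }

    // Hypothetical stand-in for a POS-tagged token; NOT the cTAKES BaseToken.
    static final class Token {
        final String text;
        final String pos;
        final Span span;

        Token(String text, String pos, int begin, int end) {
            this.text = text;
            this.pos = pos;
            this.span = new Span(begin, end);
        }
    }

    // Returns the text of every "NN" token that no identified span overlaps.
    static List<String> uncoveredNouns(List<Token> tokens, List<Span> identified) {
        Predicate<Token> overlapped =
            t -> identified.stream().anyMatch(s -> s.overlaps(t.span));
        return tokens.stream()
            .filter(t -> "NN".equals(t.pos))   // keep nouns only
            .filter(overlapped.negate())       // drop nouns already covered
            .map(t -> t.text)
            .collect(Collectors.toList());
    }

    // Made-up sample: "fever and cough today", with "fever" and "cough" identified.
    static List<String> demo() {
        List<Token> tokens = Arrays.asList(
            new Token("fever", "NN", 0, 5),
            new Token("and", "CC", 6, 9),
            new Token("cough", "NN", 10, 15),
            new Token("today", "NN", 16, 21));
        List<Span> identified = Arrays.asList(new Span(0, 5), new Span(10, 15));
        return uncoveredNouns(tokens, identified);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [today]
    }
}
```

In a real pipeline the two lists would come from JCasUtil.select over BaseToken and IdentifiedAnnotation, with each annotation wrapped in a DefaultTextSpan using a sentence offset of 0.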

> I could not make out what needs to be added in place of BaseToken ...

TextSpan ts = new DefaultTextSpan( BaseToken, 0 );
should be
TextSpan ts = new DefaultTextSpan( bt, 0 );

My apologies if you spent a lot of time debugging - I try to get the details into these emails but don't really have time to write and run everything myself.

It might help people on the devlist (or other forums) to state a little about your development experience / background if you need help.

I hope that this fixes everything,
Sean

-----Original Message-----
From: Sparsh K [mailto:sparsh...@gmail.com]
Sent: Sunday, January 15, 2017 12:26 PM
To: dev@ctakes.apache.org
Cc: dev-...@ctakes.apache.org
Subject: Re: Question on ctakes

Hi Sean,

I tried to use your solution. I got a few compilation errors and fixed a few of them.

I have changed

JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream().map( a -> new DefaultTextSpan(a, 0) )

to

JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream().map( a -> new DefaultTextSpan(*a.getBegin()*, 0) )

I hope this is correct.

I could not make out what needs to be added in place of BaseToken in the case below.

TextSpan ts = new DefaultTextSpan( BaseToken, 0 );

Thanks & Regards
Vighnesh

On Thu, Jan 12, 2017 at 10:12 PM, Sparsh K <sparsh...@gmail.com> wrote:

> Thanks for clarification sean.
>
> On Thu, Jan 12, 2017 at 8:43 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
>> Hi Vighnesh,
>>
>> 1. Does ctakes depend upon exact word match?
>> By default, yes. The fast clinical pipeline uses "DefaultJCasTermAnnotator" or some such horribly named class. There is also an "OverlapJCasTermAnnotator". Equally horrible name, slightly different functionality. Given: "Blood, urine test" the Default will identify "blood", "urine" and "urine test". The overlap will identify "Blood", "urine", "urine test" and "blood test". Obviously this requires all four terms to be in the dictionary.
>>
>> 2. How to get all nouns in a document not covered by an IdentifiedAnnotation?
>>
>> JCasUtil.select( jcas, BaseToken.class ).stream().filter( b -> b.getPartOfSpeech().equals("NN") ).map( Annotation::getCoveredText() ).forEach( System.out::println );
>>
>> Something like that should work. Filtering by discovered IdentifiedAnnotations is another step. Something like:
>>
>> Collection<TextSpan> identifiedSpans = JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream().map( a -> new DefaultTextSpan(a, 0) ).collect( Collectors.toList() );
>>
>> Predicate<BaseToken> overlapped = bt -> {
>>    TextSpan ts = new DefaultTextSpan( BaseToken, 0 );
>>    return identifiedSpans.stream().filter( s -> s.overlaps(ts) ).findAny().exists(); }
>>
>> Then add .filter( !overlapped ) before the original .map( Annotation::getCoveredText ). I am not debugging this email, so you may need to check my stream methods.
>>
>> Sean
>>
>>
>> -----Original Message-----
>> From: Sparsh K [mailto:sparsh...@gmail.com]
>> Sent: Thursday, January 12, 2017 7:31 AM
>> To: dev-...@ctakes.apache.org; dev@ctakes.apache.org
>> Subject: Question on ctakes
>>
>> Hi
>>
>> I am new to ctakes and have a few questions. Please guide me with your inputs.
>>
>> 1. When a clinical note is input to ctakes, it processes that text in multiple stages.
>> Let us take an example of a clinical note: SINGLE/PRETERM (35 WEEKS 5 DAYS)/MALE/AGA.
>>
>> Here the word "preterm" is not in the dictionary, but "preterm infant", "premature baby" etc. are. So ctakes is not identifying that word as coveredText.
>>
>> My question is: does ctakes processing mainly depend on an exact word match with the dictionary? If so, and I give it one page of clinical notes explaining a disease that does not contain words exactly matching the dictionary, then ctakes will not identify those words. Is that true?
>>
>> 2. Ctakes does POS tagging and does named entity recognition on the noun terms. How can I pull out a list of the nouns that are not matched to a named disorder code at the named entity recognition level?
>>
>>
>> Regards
>> Vighnesh