Thanks for the clarification, Sean.

On Thu, Jan 12, 2017 at 8:43 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> Hi Vighnesh,
>
> 1. Does ctakes depend upon exact word match?
> By default, yes. The fast clinical pipeline uses
> "DefaultJCasTermAnnotator" or some such horribly named class. There is
> also an "OverlapJCasTermAnnotator". Equally horrible name, slightly
> different functionality. Given: "Blood, urine test" the Default will
> identify "blood", "urine" and "urine test". The overlap will identify
> "Blood", "urine", "urine test" and "blood test". Obviously this requires
> all four terms to be in the dictionary.
>
> 2. How to get all nouns in a document not covered by an
> IdentifiedAnnotation?
>
> JCasUtil.select( jcas, BaseToken.class ).stream()
>       .filter( b -> "NN".equals( b.getPartOfSpeech() ) )
>       .map( Annotation::getCoveredText )
>       .forEach( System.out::println );
>
> Something like that should work. Filtering by discovered
> IdentifiedAnnotations is another step. Something like:
>
> Collection<TextSpan> identifiedSpans
>       = JCasUtil.select( jcas, IdentifiedAnnotation.class ).stream()
>       .map( a -> new DefaultTextSpan( a, 0 ) )
>       .collect( Collectors.toList() );
>
> Predicate<BaseToken> overlapped = bt -> {
>    TextSpan ts = new DefaultTextSpan( bt, 0 );
>    return identifiedSpans.stream().anyMatch( s -> s.overlaps( ts ) );
> };
>
> Then add .filter( overlapped.negate() ) before the original .map(
> Annotation::getCoveredText ). I am not debugging this email, so you may
> need to check my stream methods.
>
> Sean
>
>
> -----Original Message-----
> From: Sparsh K [mailto:sparsh...@gmail.com]
> Sent: Thursday, January 12, 2017 7:31 AM
> To: dev-...@ctakes.apache.org; dev@ctakes.apache.org
> Subject: Question on ctakes
>
> Hi
>
> I am new to ctakes and have a few questions. Please guide me with your
> inputs.
>
> 1. When a clinical note is input to ctakes, it processes that text
> in multiple stages.
> Let us take an example of a clinical note: SINGLE/PRETERM (35 WEEKS 5
> DAYS)/MALE/AGA.
>
> Here the word "preterm" is not in the dictionary, although "preterm
> infant", "premature baby" etc. are there. So ctakes is not identifying
> that word as coveredText.
>
> My question is: does ctakes processing mainly depend on an exact word
> match with the dictionary? If so, if I give one page of clinical note
> with an explanation of a disease and it does not contain words that
> exactly match the dictionary, then ctakes will not identify those words.
> Is that true?
>
> 2. Ctakes does POS tagging and does named entity recognition on the noun
> terms. How can I pull out a list of the nouns which were not matched to a
> named disorder code at the named entity recognition level?
>
>
> Regards
> Vighnesh
>
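
For anyone who wants to try the "nouns not covered by an identified span" filter from the quoted reply without setting up a cTAKES pipeline, here is a minimal plain-Java sketch of the same idea. The Span and Token classes are hypothetical stand-ins for IdentifiedAnnotation spans and BaseToken; they are not cTAKES or UIMA types, and the character offsets and POS tags are made up for illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class UncoveredNouns {

    // Hypothetical stand-in for the character span of an IdentifiedAnnotation.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        // Two half-open spans [begin, end) overlap if each starts before the other ends.
        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }
    }

    // Hypothetical stand-in for a BaseToken: span, covered text and POS tag.
    static final class Token {
        final Span span;
        final String text;
        final String pos;

        Token(int begin, int end, String text, String pos) {
            this.span = new Span(begin, end);
            this.text = text;
            this.pos = pos;
        }
    }

    // Returns the text of every "NN" token that no identified span overlaps.
    static List<String> uncoveredNouns(List<Token> tokens, List<Span> identified) {
        Predicate<Token> overlapped =
                t -> identified.stream().anyMatch(s -> s.overlaps(t.span));
        return tokens.stream()
                .filter(t -> "NN".equals(t.pos))   // null-safe POS check
                .filter(overlapped.negate())       // keep only uncovered tokens
                .map(t -> t.text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Tokens for the made-up text "blood urine test report".
        List<Token> tokens = List.of(
                new Token(0, 5, "blood", "NN"),
                new Token(6, 11, "urine", "NN"),
                new Token(12, 16, "test", "NN"),
                new Token(17, 23, "report", "NN"));
        // Pretend the dictionary matched "blood", "urine" and "urine test".
        List<Span> identified = List.of(
                new Span(0, 5), new Span(6, 11), new Span(6, 16));
        System.out.println(uncoveredNouns(tokens, identified));
    }
}
```

Note that in the Penn Treebank tagset "NN" covers only singular common nouns; plural and proper nouns are tagged NNS, NNP and NNPS, so checking that the tag starts with "NN" may be closer to "all nouns".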