First of all, thanks James and VJ for the prompt responses.
I just verified the same behavior with family history, history, subject,
uncertainty, and polarity. None of these features is assigned if the
annotation is on a new line.
Glad to hear the YTEX branch includes a sentence splitter that does not split
sentences on new lines. I think that will also solve the problem with the
sectionizer and its sporadic NPEs.
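
For anyone finding this thread in the archive later, here is a toy Java sketch of the failure mode as I understand it from James's and VJ's explanations. It is not cTAKES or OpenNLP code: the class, the regex-based splitter, and the hard-coded "did not" cue are all made up for illustration. The only point is that a negation check scoped to one sentence cannot see a cue that ends up in a different sentence once newlines are forced to be sentence breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model only: none of these names are cTAKES or OpenNLP APIs. */
final class NewlineSentenceDemo {

    /** Splits after sentence-ending punctuation; optionally also on every line break. */
    static List<String> splitSentences(String text, boolean breakOnNewline) {
        String pattern = breakOnNewline ? "(?<=[.!?])\\s+|\\R+" : "(?<=[.!?])\\s+";
        List<String> sentences = new ArrayList<>();
        for (String piece : text.split(pattern)) {
            // Normalize remaining whitespace (including embedded newlines) to single spaces.
            String s = piece.replaceAll("\\s+", " ").trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    /** Sentence-scoped negation: the cue must precede the term inside the same sentence. */
    static boolean isNegated(List<String> sentences, String term) {
        for (String sentence : sentences) {
            String lower = sentence.toLowerCase(Locale.ROOT);
            int termAt = lower.indexOf(term);
            if (termAt < 0) {
                continue; // term is not in this sentence
            }
            int cueAt = lower.indexOf("did not"); // hypothetical single cue
            return cueAt >= 0 && cueAt < termAt;
        }
        return false;
    }

    public static void main(String[] args) {
        String oneLine = "patient did not have diabetes";
        String twoLines = "patient did not have\ndiabetes";
        for (boolean breakOnNewline : new boolean[] {true, false}) {
            System.out.println("breakOnNewline=" + breakOnNewline
                + " oneLine=" + isNegated(splitSentences(oneLine, breakOnNewline), "diabetes")
                + " twoLines=" + isNegated(splitSentences(twoLines, breakOnNewline), "diabetes"));
        }
    }
}
```

With the newline break switched on, the two-line narrative becomes two "sentences" and the cue never reaches "diabetes"; with it switched off, both narratives behave the same way.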
Regards,
Paula
> Date: Wed, 15 Jan 2014 11:34:24 -0500
> Subject: Re: svn commit: r1551805 -
> /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> From: vnga...@gmail.com
> To: dev@ctakes.apache.org
>
> The issue is indeed the sentence splitter - negation is limited to words
> within the sentence, and if newlines are considered sentence boundaries, it
> doesn't work properly (splitting on newlines breaks many other things as
> well). The YTEX branch includes a sentence splitter that does not
> automatically split sentences on newlines.
>
> best,
>
> vj
>
>
> On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J.
> <masanz.ja...@mayo.edu> wrote:
>
> > Hi Paula,
> >
> > The sentence detector in 3.1.0 and 3.1.1 (and previous releases) assumes
> > sentences don't cross line boundaries.
> > OpenNLP is used to find sentence breaks, but then if newlines are found,
> > those are also set (within cTAKES, not OpenNLP) to be sentence breaks.
> >
> > (just FYI I haven't had a chance to look at the ytex branch, which the
> > subject commit is about)
> >
> > -- James
> >
> > -----Original Message-----
> > From: dev-return-2375-Masanz.James=mayo....@ctakes.apache.org [mailto:
> > dev-return-2375-Masanz.James=mayo....@ctakes.apache.org] On Behalf Of
> > digital paula
> > Sent: Tuesday, January 14, 2014 10:25 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: svn commit: r1551805 -
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> >
> >
> > Hello cTAKES Developer Community,
> > I'm a little behind on reading posts; this one is from last month. I
> > think this issue is already addressed in the current release? I'm still
> > running the previous release, 3.1.0.
> > I just noticed something interesting: negation is not applied when the
> > negated term is on a different line. When I removed all carriage returns
> > from the narratives, negation was picked up, as long as the text is
> > treated as one long string. To better explain what I mean, here are two
> > narrative comments:
> >
> > 1. patient did not have diabetes
> > 2. patient did not have
> > diabetes
> >
> > Number 1 above got negated but number 2 did not. This might be related to
> > the issue with the sectionizer. I noticed that when I treat the narrative
> > as one string, the sectionizer never crashes with the NPE. Of course, the
> > sectionizer serves no purpose if the narrative is one string, but this is
> > helping me pinpoint the problem.
> >
> > Regards,
> > Paula
> >
> >
> > > Date: Thu, 19 Dec 2013 11:04:57 -0500
> > > Subject: Re: FW: svn commit: r1551805 -
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > From: vnga...@gmail.com
> > > To: dev@ctakes.apache.org
> > >
> > > Hi Pei,
> > >
> > > I'm not sure if that would solve the problem: the change in the ytex
> > > branch causes newlines to be ignored (i.e. not treated as a token).
> > > trunk's sentence splitter splits sentences on newlines, so newlines
> > > would never be found in a sentence. However, if we had a reproducer we
> > > could check it fairly easily in the ytex branch.
> > >
> > > Best,
> > >
> > > VJ
> > >
> > >
> > > On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei
> > > <pei.c...@childrens.harvard.edu> wrote:
> > >
> > > > Vj,
> > > > Do you think this is what was causing the NPEs [1]?
> > > > If so, shall we make the same fix in trunk?
> > > > --Pei
> > > >
> > > > [1]
> > > >
> > http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
> > > >
> > > > -----Original Message-----
> > > > From: vjapa...@apache.org [mailto:vjapa...@apache.org]
> > > > Sent: Tuesday, December 17, 2013 9:15 PM
> > > > To: comm...@ctakes.apache.org
> > > > Subject: svn commit: r1551805 -
> > > >
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Author: vjapache
> > > > Date: Wed Dec 18 02:14:13 2013
> > > > New Revision: 1551805
> > > >
> > > > URL: http://svn.apache.org/r1551805
> > > > Log:
> > > > add support for sentences that contain newline tokens.
> > > >
> > > > Modified:
> > > >
> > > >
> > ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Modified:
> > > >
> > ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > URL:
> > > >
> > http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
> > > >
> > > >
> > ==============================================================================
> > > > ---
> > > >
> > ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > (original)
> > > > +++
> > ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > Wed Dec 18 02:14:13 2013
> > > > @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
> > > > import org.mitre.medfacts.i2b2.api.ApiConcept;
> > > > import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
> > > > import org.mitre.medfacts.zoner.LineAndTokenPosition;
> > > > -
> > > > import org.apache.ctakes.typesystem.type.syntax.BaseToken;
> > > > +import org.apache.ctakes.typesystem.type.syntax.NewlineToken;
> > > > import org.apache.ctakes.typesystem.type.textspan.Sentence;
> > > >
> > > > public class CharacterOffsetToLineTokenConverterCtakesImpl implements
> > > > CharacterOffsetToLineTokenConverter
> > > > @@ -78,11 +78,13 @@ public class CharacterOffsetToLineTokenC
> > > >      for (Annotation current : annotationIndex)
> > > >      {
> > > >        BaseToken bt = (BaseToken)current;
> > > > -      int begin = bt.getBegin();
> > > > -      int end = bt.getEnd();
> > > > -
> > > > -      tokenBeginEndTreeSet.add(begin);
> > > > -      tokenBeginEndTreeSet.add(end);
> > > > +      // filter out NewlineToken
> > > > +      if (!(bt instanceof NewlineToken)) {
> > > > +        int begin = bt.getBegin();
> > > > +        int end = bt.getEnd();
> > > > +        tokenBeginEndTreeSet.add(begin);
> > > > +        tokenBeginEndTreeSet.add(end);
> > > > +      }
> > > >      }
> > > >    }
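
A note for readers who do not have the source handy: the quoted diff collects token begin/end offsets into a sorted set and, after the change, skips newline tokens. The standalone Java sketch below mimics just that filtering step. The `Token` and `NewlineTok` classes and the `boundaries` method are hypothetical stand-ins I made up, not the cTAKES `BaseToken`/`NewlineToken` types or the actual converter, and the offsets are hand-computed for a sample string with a trailing space before the line break.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Standalone illustration; these classes are stand-ins, not cTAKES types. */
final class TokenBoundaryDemo {

    /** Hypothetical token with character offsets [begin, end). */
    static class Token {
        final int begin;
        final int end;

        Token(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical newline token, distinguished only by its type. */
    static class NewlineTok extends Token {
        NewlineTok(int begin, int end) {
            super(begin, end);
        }
    }

    /** Collects sorted, de-duplicated token boundaries, optionally skipping newline tokens. */
    static TreeSet<Integer> boundaries(List<Token> tokens, boolean skipNewlines) {
        TreeSet<Integer> offsets = new TreeSet<>();
        for (Token t : tokens) {
            if (skipNewlines && t instanceof NewlineTok) {
                continue; // same idea as the instanceof check in the diff
            }
            offsets.add(t.begin);
            offsets.add(t.end);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Offsets for "patient did not have \ndiabetes" (note the space before the newline).
        List<Token> tokens = Arrays.asList(
            new Token(0, 7),        // patient
            new Token(8, 11),       // did
            new Token(12, 15),      // not
            new Token(16, 20),      // have
            new NewlineTok(21, 22), // line break
            new Token(22, 30));     // diabetes
        System.out.println("with newline tokens:    " + boundaries(tokens, false));
        System.out.println("without newline tokens: " + boundaries(tokens, true));
    }
}
```

In this sample the newline token contributes a boundary at offset 21 that no word token shares, so filtering it changes the resulting set; when a newline directly touches its neighbors, the set can come out identical either way.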