CharacterOffsetToLineTokenConverterCtakesImpl.java

digital paula Wed, 15 Jan 2014 17:55:50 -0800
First of all thanks James and VJ for the prompt responses.
 
I just verified the same behavior with family history, history, subject, 
uncertainty, and polarity: these features are not assigned if the annotation 
is on a new line.
 
Glad to hear the YTEX branch includes a sentence splitter that does not split 
sentences on new lines.  I think that will also solve the problem with the 
sectionizer and its sporadic NPEs.
 
Regards,
Paula
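
For readers following the thread, the behavior described above can be illustrated with a minimal Java sketch. It does not use cTAKES or OpenNLP: `splitSentences`, `isNegated`, and the single "did not" cue are hypothetical stand-ins, assuming only what the thread states, namely that negation is scoped to the sentence and that the 3.1.x sentence detector treats every newline as a sentence break.

```java
import java.util.ArrayList;
import java.util.List;

public class NewlineNegationSketch {

    // Hypothetical stand-in for a sentence detector: always breaks on periods,
    // and optionally on newlines (the behavior the thread attributes to 3.1.x).
    static List<String> splitSentences(String text, boolean breakOnNewline) {
        String pattern = breakOnNewline ? "[.\\n]" : "[.]";
        List<String> sentences = new ArrayList<>();
        for (String part : text.split(pattern)) {
            // Collapse whitespace, including any newline kept inside a sentence.
            String s = part.replaceAll("\\s+", " ").trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Toy negation check (an assumption, not the cTAKES algorithm): the concept
    // counts as negated only if "did not" precedes it in the same sentence.
    static boolean isNegated(List<String> sentences, String concept) {
        for (String s : sentences) {
            int conceptAt = s.indexOf(concept);
            if (conceptAt >= 0) {
                int cueAt = s.indexOf("did not");
                return cueAt >= 0 && cueAt < conceptAt;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String oneLine = "patient did not have diabetes";
        String twoLines = "patient did not have\ndiabetes";
        System.out.println(isNegated(splitSentences(oneLine, true), "diabetes"));
        System.out.println(isNegated(splitSentences(twoLines, true), "diabetes"));
        System.out.println(isNegated(splitSentences(twoLines, false), "diabetes"));
    }
}
```

In this toy model the two-line narrative is negated only when newlines are not treated as sentence breaks, which matches the two examples in the quoted message below.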
 
> Date: Wed, 15 Jan 2014 11:34:24 -0500
> Subject: Re: svn commit: r1551805 - 
> /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> From: [email protected]
> To: [email protected]
> 
> The issue is indeed the sentence splitter - negation is limited to words
> within the sentence, and if newlines are considered sentence boundaries, it
> doesn't work properly (splitting on newlines breaks many other things as
> well).  The YTEX branch includes a sentence splitter that does not
> automatically split sentences on newlines.
> 
> best,
> 
> vj
> 
> 
> On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. 
> <[email protected]> wrote:
> 
> > Hi Paula,
> >
> > The sentence detector in 3.1.0 and 3.1.1 (and previous releases) assumes
> > sentences don't cross line boundaries.
> > OpenNLP is used to find sentence breaks, but then if newlines are found,
> > those are also set (within cTAKES, not OpenNLP) to be sentence breaks.
> >
> > (just FYI I haven't had a chance to look at the ytex branch, which the
> > subject commit is about)
> >
> > -- James
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of digital paula
> > Sent: Tuesday, January 14, 2014 10:25 PM
> > To: [email protected]
> > Subject: RE: svn commit: r1551805 -
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> >
> > Hello cTAKES Developer Community,
> > I'm a little behind on reading posts; this one is from last month.  I
> > think this issue may already be addressed in the current release?  I'm
> > still running the previous release, 3.1.0.
> > I just noticed something interesting: negation isn't applied when the
> > concept is on a different line than the negation.  After I removed all
> > carriage returns from the narratives, so that each is treated as one long
> > string, negation picked it up.  To better explain what I mean, here are
> > two narrative comments:
> >
> > 1.  patient did not have diabetes
> > 2. patient did not have
> > diabetes
> >
> > Number 1 above got negated but number 2 did not.  This might be related to
> > the issue with the sectionizer: I noticed that when I treat the narrative
> > as one string, the sectionizer never crashes with the NPE.  Of course, the
> > sectionizer serves no purpose if the narrative is one string, but this is
> > helping me pinpoint the problem.
> >
> > Regards,
> > Paula
> >
> >
> > > Date: Thu, 19 Dec 2013 11:04:57 -0500
> > > Subject: Re: FW: svn commit: r1551805 -
> > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > From: [email protected]
> > > To: [email protected]
> > >
> > > Hi Pei,
> > >
> > > I'm not sure if that would solve the problem: the change in the ytex
> > > branch causes newlines to be ignored (i.e. not treated as a token).
> > > trunk's sentence splitter splits sentences on newlines, so newlines would
> > > never be found in a sentence.  However, if we had a reproducer we could
> > > check it fairly easily in the ytex branch.
> > >
> > > Best,
> > >
> > > VJ
> > >
> > >
> > > On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei
> > > <[email protected]> wrote:
> > >
> > > > Vj,
> > > > Do you think this is what was causing the NPEs [1]?
> > > > If so, shall we make the same fix in trunk?
> > > > --Pei
> > > >
> > > > [1] http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
> > > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:[email protected]]
> > > > Sent: Tuesday, December 17, 2013 9:15 PM
> > > > To: [email protected]
> > > > Subject: svn commit: r1551805 -
> > > > /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Author: vjapache
> > > > Date: Wed Dec 18 02:14:13 2013
> > > > New Revision: 1551805
> > > >
> > > > URL: http://svn.apache.org/r1551805
> > > > Log:
> > > > add support for sentences that contain newline tokens.
> > > >
> > > > Modified:
> > > >     ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > >
> > > > Modified: ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
> > > > URL: http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
> > > > ==============================================================================
> > > > --- ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java (original)
> > > > +++ ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java Wed Dec 18 02:14:13 2013
> > > > @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
> > > >  import org.mitre.medfacts.i2b2.api.ApiConcept;
> > > >  import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
> > > >  import org.mitre.medfacts.zoner.LineAndTokenPosition;
> > > > -
> > > >  import org.apache.ctakes.typesystem.type.syntax.BaseToken;
> > > > +import org.apache.ctakes.typesystem.type.syntax.NewlineToken;
> > > >  import org.apache.ctakes.typesystem.type.textspan.Sentence;
> > > >
> > > >  public class CharacterOffsetToLineTokenConverterCtakesImpl implements CharacterOffsetToLineTokenConverter
> > > > @@ -78,11 +78,13 @@ public class CharacterOffsetToLineTokenC
> > > >           for (Annotation current : annotationIndex)
> > > >           {
> > > >                   BaseToken bt = (BaseToken)current;
> > > > -                 int begin = bt.getBegin();
> > > > -                 int end = bt.getEnd();
> > > > -
> > > > -                 tokenBeginEndTreeSet.add(begin);
> > > > -                 tokenBeginEndTreeSet.add(end);
> > > > +                 // filter out NewlineToken
> > > > +                 if (!(bt instanceof NewlineToken)) {
> > > > +                         int begin = bt.getBegin();
> > > > +                         int end = bt.getEnd();
> > > > +                         tokenBeginEndTreeSet.add(begin);
> > > > +                         tokenBeginEndTreeSet.add(end);
> > > > +                 }
> > > >           }
> > > >    }
> > > >
> > > >
> > > >
> > > >
> >
> >
> >
> >
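
The change in the quoted diff can be tried outside cTAKES with a small sketch. The `BaseToken` and `NewlineToken` classes here are simplified stand-ins for the cTAKES type system classes of the same names, assumed only to expose begin/end offsets, with `NewlineToken` a subtype of `BaseToken` as the `instanceof` check in the diff implies. `collectOffsets` is a hypothetical name for the loop shown in the diff.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class NewlineTokenFilterSketch {

    // Simplified stand-in for the cTAKES BaseToken (assumption: offsets only).
    static class BaseToken {
        private final int begin;
        private final int end;

        BaseToken(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        int getBegin() { return begin; }
        int getEnd() { return end; }
    }

    // Simplified stand-in for the cTAKES NewlineToken.
    static class NewlineToken extends BaseToken {
        NewlineToken(int begin, int end) {
            super(begin, end);
        }
    }

    // Mirrors the loop in the diff: record the begin and end offset of every
    // token except newline tokens.
    static TreeSet<Integer> collectOffsets(List<BaseToken> tokens) {
        TreeSet<Integer> tokenBeginEndTreeSet = new TreeSet<>();
        for (BaseToken bt : tokens) {
            // filter out NewlineToken
            if (!(bt instanceof NewlineToken)) {
                tokenBeginEndTreeSet.add(bt.getBegin());
                tokenBeginEndTreeSet.add(bt.getEnd());
            }
        }
        return tokenBeginEndTreeSet;
    }

    public static void main(String[] args) {
        // Offsets for "patient did not have \ndiabetes" (note the space before
        // the newline, so the newline's begin offset is shared with no word).
        List<BaseToken> tokens = Arrays.asList(
                new BaseToken(0, 7),      // patient
                new BaseToken(8, 11),     // did
                new BaseToken(12, 15),    // not
                new BaseToken(16, 20),    // have
                new NewlineToken(21, 22), // \n
                new BaseToken(22, 30));   // diabetes
        System.out.println(collectOffsets(tokens));
    }
}
```

With the filter in place, the newline's own begin offset (21 in this example) never enters the set, so only word-token boundaries remain.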