Hi Peter,

I believe I've encountered this too. I never got around to tracking it down to the root cause, and didn't have the civic-mindedness to report it as you have, so thanks! To shut it up I implemented a brutal brute-force workaround, enclosed for your possible amusement.
> But it occurred to me that in every other case, where the annotation
> doesn't begin on the first character and it doesn't throw an exception, it
> might cause downstream methods like doesSubsume to give the wrong result
> because the begin/end offsets are wrong.

One would think so, but interestingly enough, this does *not* seem to be the case. Everywhere I've checked (quite a few places, over the past few years), non-initial ContextAnnotation offsets look correct.

Workaround: a class that extends NegexAnnotator and adjusts the offsets at the end of the process() method.

    public class NegexAnnotator extends org.apache.ctakes.ytex.uima.annotators.NegexAnnotator {
        ...

        // Clamp ContextAnnotation offsets so they stay inside the document text.
        // UIMA spans are [begin, end): begin is inclusive and end is exclusive,
        // so a valid span satisfies 0 <= begin <= end <= text.length().
        private void adjustContextOffsets(JCas jCas) {
            String text = jCas.getDocumentText();
            if (text == null) return;
            Collection<ContextAnnotation> contexts = JCasUtil.select(jCas, ContextAnnotation.class);
            if (contexts == null || contexts.isEmpty()) return;

            // The observed bug: a context starting at char 0 gets begin == -1.
            contexts.stream()
                    .filter(c -> c.getBegin() < 0)
                    .peek(c -> logger.debug("adjusting begin=" + c.getBegin()))
                    .forEach(c -> c.setBegin(0));

            // Defensive only; don't know if this happens. Since end is exclusive,
            // end == text.length() is legal, so only clamp ends past the text.
            int docTextLen = text.length();
            contexts.stream()
                    .filter(c -> c.getEnd() > docTextLen)
                    .peek(c -> logger.debug("adjusting end=" + c.getEnd()))
                    .forEach(c -> c.setEnd(docTextLen));
        }
    }

On Sun, Aug 30, 2020 at 5:35 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:

> Hi,
> I was getting a StringIndexOutOfBoundsException in
> DependencyUtil.doesSubsume(annot1, annot2) with exactly this situation:
>
> *negex annotator*
> *the text begins "negative for <anything>"*
>
> If the chunk *negative for xyz* is preceded by anything else, even a
> space, the problem goes away. It also goes away when you choose another
> style of negation, "no headache" for instance.
>
> I've traced the problem back to some illegal entries in the JCas. You can
> see from the image below that the ContextAnnotation's begin offset is
> illegal.
>
> Clearly there's an off-by-one error, and this triggered the exception
> because in my example, the annotation is created right from the 0th char of
> my note text. But it occurred to me that in every other case, where the
> annotation doesn't begin on the first character and it doesn't throw an
> exception, it might cause downstream methods like doesSubsume to give the
> wrong result because the begin/end offsets are wrong.
>
> I'm not sure how to follow this up. But if anyone wants to tackle it...?
>
> This is from HistoryAttributeClassifier beginning at line 274
>
> [image: image.png]
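For anyone who wants to poke at the clamping rule outside a pipeline, here is a minimal plain-Java sketch with no UIMA dependency. The class OffsetClampSketch and its clamp method are hypothetical illustrations, not part of cTAKES or ytex. It assumes UIMA's convention that begin is inclusive and end is exclusive, and it only mirrors the workaround's clamping; it is not a root-cause fix for the off-by-one. It also shows why a begin of -1 is fatal: any code that slices the document text with that offset, e.g. via String.substring, throws.

```java
import java.util.Arrays;

public class OffsetClampSketch {

    /**
     * Hypothetical helper: clamp a [begin, end) span into [0, textLen].
     * If end ends up before begin, the span collapses to an empty span at begin.
     */
    static int[] clamp(int begin, int end, int textLen) {
        int b = Math.max(0, Math.min(begin, textLen));
        int e = Math.max(b, Math.min(end, textLen));
        return new int[] {b, e};
    }

    public static void main(String[] args) {
        String text = "negative for headache";
        int badBegin = -1; // the kind of offset reported for a context starting at char 0
        int end = 12;      // exclusive end of "negative for"

        try {
            text.substring(badBegin, end);
        } catch (IndexOutOfBoundsException ex) {
            // String.substring rejects a negative begin index.
            System.out.println("substring threw: " + ex.getClass().getSimpleName());
        }

        int[] fixed = clamp(badBegin, end, text.length());
        System.out.println(Arrays.toString(fixed) + " -> \"" + text.substring(fixed[0], fixed[1]) + "\"");
    }
}
```

Clamping to the text bounds (rather than shifting both offsets by one) matches the brute-force approach above: it keeps downstream code from crashing without assuming where the off-by-one originates.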