Hi Peter,

I believe I've encountered this too; I never got around to tracking it down
to the root cause, and didn't have the civic-mindedness to report it as you
have.  Thanks!

To shut it up I implemented a brutal brute-force workaround, enclosed for
your possible amusement.

> But it occurred to me that in every other case, where the annotation
> doesn't begin on the first character and it doesn't throw an exception, it
> might cause downstream methods like doesSubsume to give the wrong result
> because the begin/end offsets are wrong.


One would think so, but interestingly enough, this does *not* seem to be
the case.  Everywhere I've checked (quite a few places, over the past few
years), non-initial ContextAnnotation offsets look correct.

Workaround: a class that extends NegexAnnotator and adjusts the offsets at
the end of the process() method.

public class NegexAnnotator extends
        org.apache.ctakes.ytex.uima.annotators.NegexAnnotator {
    ...

    private void adjustContextOffsets(JCas jCas) {
        String text = jCas.getDocumentText();
        if (text == null) return;

        Collection<ContextAnnotation> contexts =
                JCasUtil.select(jCas, ContextAnnotation.class);
        if (contexts == null || contexts.isEmpty()) return;

        // Clamp negative begin offsets to the start of the document.
        contexts.stream()
                .filter(c -> c.getBegin() < 0)
                .peek(c -> logger.debug("adjusting begin=" + c.getBegin()))
                .forEach(c -> c.setBegin(0));

        // don't know if this happens
        // UIMA end offsets are exclusive, so end == docTextLen is still legal;
        // only clamp offsets that run past the end of the text.
        int docTextLen = text.length();
        contexts.stream()
                .filter(c -> c.getEnd() > docTextLen)
                .peek(c -> logger.debug("adjusting end=" + c.getEnd()))
                .forEach(c -> c.setEnd(docTextLen));
    }
}
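
For anyone who wants to see the failure mode without a cTAKES pipeline, here is a
minimal standalone sketch of the same clamping idea.  The Span class is a
hypothetical stand-in for an annotation (begin inclusive, end exclusive), and
the begin value of -1 is only an assumption about what the bad offset looks
like; it is not taken from the actual JCas contents.

```java
public class OffsetClampSketch {

    /** Hypothetical stand-in for an annotation: begin inclusive, end exclusive. */
    static final class Span {
        int begin;
        int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Clamp a span into [0, text.length()] so substring(begin, end) cannot throw. */
    static void clamp(Span s, String text) {
        int len = text.length();
        s.begin = Math.max(0, Math.min(s.begin, len));
        // Keep end >= begin as well as end <= len.
        s.end = Math.max(s.begin, Math.min(s.end, len));
    }

    /** Same idea as an annotation's covered text: a plain substring call. */
    static String coveredText(Span s, String text) {
        return text.substring(s.begin, s.end);
    }

    public static void main(String[] args) {
        String text = "negative for headache";

        // Assumed shape of the bad annotation: begin is one short of 0.
        Span bad = new Span(-1, 8);
        try {
            coveredText(bad, text);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unclamped span failed: " + e.getMessage());
        }

        clamp(bad, text);
        System.out.println(coveredText(bad, text)); // prints "negative"
    }
}
```

Note that this only silences the exception, just like the workaround above; it
does nothing about whatever produces the bad offset in the first place.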




On Sun, Aug 30, 2020 at 5:35 PM Peter Abramowitsch <pabramowit...@gmail.com>
wrote:

> Hi,
> I was getting a StringIndexOutOfBoundsException in
> DependencyUtil.doesSubsume(annot1, annot2)  with exactly this situation:
>
> *negex annotator*
> *the text begins  "negative for <anything>"*
>
> If the chunk *negative for xyz* is preceded by anything else, even a
> space, the problem goes away.  It also goes away when you choose another
> style of negation, "no headache", for instance.
>
> I've traced the problem back to some illegal entries in the JCas.  You can
> see from the image below that the ContextAnnotation's begin offset is
> illegal.
>
> Clearly there's an off-by-one error and this triggered the exception
> because in my example, the Annotation is created right from the 0th char of
> my note text.  But it occurred to me that in every other case, where the
> annotation doesn't begin on the first character and it doesn't throw an
> exception, it might cause  downstream methods like doesSubsume to give the
> wrong result because the begin/end offsets are wrong.
>
> I'm not sure how to follow this up.  But if anyone wants to tackle it....?
>
> This is from HistoryAttributeClassifier beginning at line 274
>
> [image: image.png]
>
>
>
>
