Just to follow up... I opened these three issues:
https://issues.apache.org/jira/browse/LUCENE-1441 (fixed in 2.9)
https://issues.apache.org/jira/browse/LUCENE-1442 (fixed in 2.9)
https://issues.apache.org/jira/browse/LUCENE-1448 (still iterating)
Mike
Christian Reuschling wrote:
Hi Guy
Thanks for raising these!
For the 1st issue (KeywordTokenizer fails to set start/end offset on
its token), I think we should add your two lines to fix it. I'll open an
issue for this.
The 2nd issue (if same field name has more than one NOT_ANALYZED
instance in a doc then the offsets are double counted
Hi Guys,
I'm currently seeing a bug: wrong term offset values for fields analyzed
with KeywordAnalyzer (and also for unanalyzed fields, where I assume that
the code may be the same).
The offset of a field seems to be incremented by the entry length of the
previously analyzed field.
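To make the symptom concrete, here is a small sketch in plain Java. It does not use Lucene at all. The names (OffsetDrift, Span, offsets, advancesPerValue) are hypothetical, and it assumes each field instance is indexed as one single token whose offsets are taken relative to a running base offset. It only models the behavior described above; it is not the actual indexing code.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDrift {

    /** Start/end character offsets of one single-token field instance. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /**
     * Hypothetical model, not Lucene code: every value is one token, and
     * after each value the running base offset is advanced by
     * advancesPerValue * value.length().
     * advancesPerValue = 1 models the assumed intended behavior,
     * advancesPerValue = 2 models the length being counted twice.
     */
    static List<Span> offsets(String[] values, int advancesPerValue) {
        List<Span> spans = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            spans.add(new Span(base, base + value.length()));
            base += advancesPerValue * value.length();
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] values = {"abc", "de"};
        System.out.println("counted once:  " + offsets(values, 1));
        System.out.println("counted twice: " + offsets(values, 2));
    }
}
```

In this model, with the values "abc" and "de", the second value starts at offset 3 when the length is counted once and at offset 6 when it is counted twice, so it is shifted by the length of the previous value.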
I had a look into