Re: term offsets wrong depending on analyzer

2008-11-11 Michael McCandless
Just to follow up... I opened these three issues:

  https://issues.apache.org/jira/browse/LUCENE-1441 (fixed in 2.9)
  https://issues.apache.org/jira/browse/LUCENE-1442 (fixed in 2.9)
  https://issues.apache.org/jira/browse/LUCENE-1448 (still iterating)

Mike

Christian Reuschling wrote: Hi Guy

Re: term offsets wrong depending on analyzer

2008-11-07 Michael McCandless
Thanks for raising these! For the 1st issue (KeywordTokenizer fails to set the start/end offset on its token), I think we should add your two lines to fix it. I'll open an issue for this. The 2nd issue (if the same field name has more than one NOT_ANALYZED instance in a doc, then the offsets are double counted
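
To illustrate the 1st issue, here is a minimal sketch of the offsets one would expect from a keyword-style tokenizer, which emits the whole field value as a single token. It is plain Java and deliberately does not call Lucene; SimpleToken and keywordToken are hypothetical names made up for this illustration, not the actual patch or Lucene's API.

```java
// Hypothetical stand-in types for illustration only; this is NOT Lucene's API.
class KeywordOffsetSketch {

    /** A minimal token: its text plus character offsets into the field value. */
    static final class SimpleToken {
        final String text;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive

        SimpleToken(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Emits the whole value as one token. The point of the sketch: the
     * offsets should span the full value (0 .. length) rather than being
     * left unset.
     */
    static SimpleToken keywordToken(String value) {
        return new SimpleToken(value, 0, value.length());
    }

    public static void main(String[] args) {
        SimpleToken t = keywordToken("New York");
        System.out.println(t.text + " [" + t.startOffset + "," + t.endOffset + ")");
    }
}
```

The sketch only covers the single-token case; it makes no assumption about how offsets should accumulate across several instances of the same field, which is the subject of the 2nd issue.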

term offsets wrong depending on analyzer

2008-11-07 Christian Reuschling
Hi Guys, I am currently seeing wrong term offset values for fields analyzed with KeywordAnalyzer (and also for unanalyzed fields, where I assume the code may be the same). The offset of a field seems to be incremented by the entry length of the previously analyzed field. I had a look into