Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs
-------------------------------------------------------------------
Key: LUCENE-3717
URL: https://issues.apache.org/jira/browse/LUCENE-3717
Project: Lucene - Java
Issue Type: Task
Reporter: Robert Muir
Fix For: 3.6, 4.0
Attachments: LUCENE-3717.patch
Many broken-offset issues have been fixed recently, but it would be nice to
improve test coverage and verify that offsets are correct across the board
(especially with charfilters).
In BaseTokenStreamTestCase.checkRandomData, we can sometimes pass the analyzer
a reader wrapped in a "MockCharFilter" (the one in the patch sometimes doubles
characters). If the analyzer does not call correctOffset() or does incorrect
"offset math" (LUCENE-3642, etc.), then eventually this will produce invalid
offsets and the test will fail.
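
To illustrate why a character-doubling filter exposes these bugs, here is a
minimal, self-contained sketch. It is not the MockCharFilter from the patch and
does not use any Lucene classes: DoublingFilterSketch, tokenize() and
offsetsValid() are hypothetical names invented for this example, and the
correctOffset() here only imitates the idea of mapping an offset in the
filtered text back to the original text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a char filter that doubles one character. */
final class DoublingFilterSketch {
    private final String filtered;
    // original[i] is the offset in the original text for filtered offset i;
    // the last entry maps the end of the filtered text to the end of the input.
    private final int[] original;

    DoublingFilterSketch(String input, char doubled) {
        StringBuilder sb = new StringBuilder();
        List<Integer> map = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            sb.append(c);
            map.add(i);
            if (c == doubled) {      // emit the character a second time
                sb.append(c);
                map.add(i);          // both copies map to the same original offset
            }
        }
        map.add(input.length());
        filtered = sb.toString();
        original = map.stream().mapToInt(Integer::intValue).toArray();
    }

    String filteredText() {
        return filtered;
    }

    /** Maps an offset in the filtered text back to the original text. */
    int correctOffset(int filteredOffset) {
        return original[filteredOffset];
    }

    /** Splits the filtered text on spaces and returns {start, end} pairs. */
    List<int[]> tokenize(boolean correct) {
        List<int[]> tokens = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= filtered.length(); i++) {
            boolean boundary = i == filtered.length() || filtered.charAt(i) == ' ';
            if (!boundary && start < 0) {
                start = i;
            } else if (boundary && start >= 0) {
                tokens.add(correct
                        ? new int[] {correctOffset(start), correctOffset(i)}
                        : new int[] {start, i});   // buggy: raw filtered offsets
                start = -1;
            }
        }
        return tokens;
    }

    /** Offsets must be ordered and must not run past the original text. */
    static boolean offsetsValid(List<int[]> tokens, int originalLength) {
        int lastStart = 0;
        for (int[] t : tokens) {
            if (t[0] < lastStart || t[1] < t[0] || t[1] > originalLength) {
                return false;
            }
            lastStart = t[0];
        }
        return true;
    }
}
```

With the input "aab ca" and 'a' doubled, the filtered text is longer than the
original, so a tokenizer that reports raw offsets eventually emits an end
offset past the end of the original text, while one that maps offsets back
stays within bounds. That is the kind of failure the randomized check is meant
to surface.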
Other than test bugs, this found 2 real bugs: ICUTokenizer did not call
correctOffset() in its end(), and ThaiWordFilter did incorrect offset math.