Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs
-------------------------------------------------------------------

                 Key: LUCENE-3717
                 URL: https://issues.apache.org/jira/browse/LUCENE-3717
             Project: Lucene - Java
          Issue Type: Task
            Reporter: Robert Muir
             Fix For: 3.6, 4.0
         Attachments: LUCENE-3717.patch

Many broken-offset bugs have been fixed recently, but it would be nice to
improve the test coverage and verify that offsets are correct across the
board (especially with charfilters).

In BaseTokenStreamTestCase.checkRandomData, we can sometimes pass the
analyzer a reader wrapped in a "MockCharFilter" (the one in the patch
sometimes doubles characters). If the analyzer does not call
correctOffset() or does incorrect "offset math" (LUCENE-3642, etc.), then
eventually this will produce invalid offsets and the test will fail.
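
To make the idea concrete, here is a minimal sketch in plain Java with no
Lucene dependency. It is not the MockCharFilter from the patch: the class
name DoublingFilterSketch, the "double every Nth character" rule, and the
lookup-table approach are all assumptions chosen for illustration. It shows
why offsets measured against the filtered text must be mapped back before
they are reported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the idea behind MockCharFilter; NOT the class from the patch.
class DoublingFilterSketch {
    private final String filtered;
    // inputOffset[i] is the offset in the original text of filtered char i.
    // One extra trailing entry maps the end of the filtered text to the end of the input.
    private final int[] inputOffset;

    // Doubles every input character whose index is a multiple of 'every' (assumed rule).
    DoublingFilterSketch(String input, int every) {
        StringBuilder sb = new StringBuilder();
        List<Integer> map = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            int copies = (i % every == 0) ? 2 : 1;
            for (int c = 0; c < copies; c++) {
                sb.append(input.charAt(i));
                map.add(i);
            }
        }
        map.add(input.length());
        filtered = sb.toString();
        inputOffset = new int[map.size()];
        for (int i = 0; i < inputOffset.length; i++) {
            inputOffset[i] = map.get(i);
        }
    }

    String filtered() {
        return filtered;
    }

    // Maps an offset in the filtered text back to an offset in the original text.
    int correctOffset(int filteredOffset) {
        return inputOffset[filteredOffset];
    }

    public static void main(String[] args) {
        String input = "ab cd";
        DoublingFilterSketch f = new DoublingFilterSketch(input, 2);
        // The filtered text is "aab  cdd"; the token "cdd" occupies [5, 8) there.
        int start = f.correctOffset(5);
        int end = f.correctOffset(8);
        System.out.println(input.substring(start, end)); // prints "cd"
    }
}
```

In this sketch, a tokenizer that reported the raw offsets [5, 8) would give
an end offset past the end of the 5-character input, which is the kind of
inconsistency a randomized offsets check can catch.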

Apart from test bugs, this found 2 real bugs: ICUTokenizer did not call
correctOffset() in its end(), and ThaiWordFilter did incorrect offset math.
