[ https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233996#comment-13233996 ]

Robert Muir commented on LUCENE-3894:
-------------------------------------

I think we have bugs in some tokenizers. They are currently very hard to reproduce, 
and we get no random seed :(

I think the issue is maxWordLength=20. That is not long enough to catch bugs in 
tokenizers; for example, we should exceed whatever buffer size they use.

So I think we need to refactor this logic so that the multithreaded tests take 
maxWordLength, and ensure
this parameter is always respected.

This way, tokenizer tests can bump this up to something like 
CharTokenizer.IO_BUFFER_SIZE*2, or whatever makes sense for them, to ensure we 
really exercise them.
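
Roughly what I have in mind (a minimal sketch, assuming checkRandomData grows a 
maxWordLength parameter as proposed; the test class name, tokenizer choice, and 
package locations here are the 3.x ones and only illustrative, they may differ 
on trunk):

  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.BaseTokenStreamTestCase;
  import org.apache.lucene.analysis.WhitespaceTokenizer;

  public class TestMyTokenizer extends BaseTokenStreamTestCase {
    public void testRandomHugeStrings() throws Exception {
      Analyzer a = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
          return new TokenStreamComponents(new WhitespaceTokenizer(TEST_VERSION_CURRENT, reader));
        }
      };
      // pass a maxWordLength well beyond the tokenizer's internal buffer
      // (e.g. CharTokenizer.IO_BUFFER_SIZE*2) so buffer-boundary bugs surface
      checkRandomData(random, a, 10000, 8192);
    }
  }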

I don't like the fact that only my stupid trivial test (testHugeDoc) found the 
IO-311 bug; what if we didn't have that silly test?

I'll add a patch.
                
> Make BaseTokenStreamTestCase a bit more evil
> --------------------------------------------
>
>                 Key: LUCENE-3894
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3894
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3894.patch, LUCENE-3894.patch, LUCENE-3894.patch
>
>
> Throw an exception from the Reader while tokenizing, stop after not consuming 
> all tokens, sometimes spoon-feed chars from the reader...

