All,
I realize that one should normally consume all tokens from a stream, but I'd
like to wrap a client's Analyzer with LimitTokenCountAnalyzer using
consumeAllTokens=false. For the analyzers I've used so far, this has caused no
problems.
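For concreteness, here is roughly what the wrapping looks like (a minimal
sketch; the StandardAnalyzer and the limit of 10 are just placeholders for
whatever the client supplies):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;

    public class LimitWrapSketch {
      public static void main(String[] args) {
        // standing in for whatever Analyzer the client supplies
        Analyzer client = new StandardAnalyzer(Version.LUCENE_45);
        // consumeAllTokens=false: stop pulling from the underlying
        // stream once 10 tokens have been produced
        Analyzer limited = new LimitTokenCountAnalyzer(client, 10, false);
      }
    }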
When I use MockTokenizer, however, I run into this assertion error: "end()
called before incrementToken()". The comment in MockTokenizer reads:
// some tokenizers, such as limiting tokenizers, call end() before
// incrementToken() returns false.
// these tests should disable this check (in general you should
// consume the entire stream)
Disabling these assertions gives me pause, as does departing from the
documented TokenStream workflow
(http://lucene.apache.org/core/4_5_1/core/index.html). I take the warnings to
mean that there are Analyzers and use cases that will fail unless the stream
is entirely consumed.
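To be explicit about the workaround I'm wary of, here is a sketch of the test
side (MockTokenizer comes from lucene-test-framework; the input string is just
a placeholder):

    import java.io.StringReader;
    import org.apache.lucene.analysis.MockTokenizer;

    public class DisableChecksSketch {
      public static void main(String[] args) {
        MockTokenizer tok = new MockTokenizer(
            new StringReader("one two three four"),
            MockTokenizer.WHITESPACE, false);
        // turns off MockTokenizer's consumed-stream assertions, per the
        // comment quoted above; this is exactly the step I'm unsure about
        tok.setEnableChecks(false);
      }
    }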
Is there a safe way to wrap a client Analyzer and read only the first x
tokens? Or should I let the client decide whether or not to consume the
entire stream?
Thank you!
Best,
Tim