OK, let's help improve it: I think these have likely always been confusing. Before the rename they were both called reset, reset() and reset(Reader), even though they are unrelated. I thought the rename would help with this :)
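
To make the distinction between reset() and the renamed setReader(Reader) concrete, here is a rough sketch in plain Java. The classes in it (MiniTokenStream, MiniTokenizer, WhitespaceMiniTokenizer, MiniStopFilter, ResetVsSetReaderDemo) are hypothetical, simplified stand-ins made up for illustration. They are not the Lucene classes and do not use Lucene's attribute-based API. They only model the contract: the consumer always calls reset() before consuming, and setReader(Reader) does nothing except swap the input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a token stream; NOT Lucene's TokenStream.
abstract class MiniTokenStream {
    /** Returns the next token, or null when the input is exhausted. */
    abstract String next();

    /** Clears per-run mutable state so the instance can be reused. */
    void reset() {}
}

// Hypothetical stand-in for a tokenizer: the only kind of stream with a Reader.
abstract class MiniTokenizer extends MiniTokenStream {
    protected Reader input;

    /** Replaces the Reader to be processed; deliberately does no reset()-like work. */
    void setReader(Reader reader) {
        this.input = reader;
    }
}

class WhitespaceMiniTokenizer extends MiniTokenizer {
    @Override
    String next() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) break; // end of the current token
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() == 0 ? null : sb.toString();
    }
}

// A filter that keeps state across tokens, so it must clear it in reset().
class MiniStopFilter extends MiniTokenStream {
    private final MiniTokenStream in;
    private final Set<String> stopWords;
    private int skippedTokens; // mutable per-run state

    MiniStopFilter(MiniTokenStream in, Set<String> stopWords) {
        this.in = in;
        this.stopWords = stopWords;
    }

    @Override
    String next() {
        String t;
        while ((t = in.next()) != null) {
            if (!stopWords.contains(t)) return t;
            skippedTokens++;
        }
        return null;
    }

    @Override
    void reset() {
        in.reset();        // propagate down the chain
        skippedTokens = 0; // this is where the state goes back to 0
    }

    int skippedTokens() {
        return skippedTokens;
    }
}

class ResetVsSetReaderDemo {
    /** The consumer side: reset() is always called before pulling tokens. */
    static List<String> consume(MiniTokenStream ts) {
        ts.reset();
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        WhitespaceMiniTokenizer tok = new WhitespaceMiniTokenizer();
        MiniStopFilter filter =
            new MiniStopFilter(tok, new HashSet<>(Arrays.asList("the", "a")));

        tok.setReader(new StringReader("the quick fox")); // new input
        System.out.println(consume(filter) + " skipped=" + filter.skippedTokens());

        tok.setReader(new StringReader("a lazy dog"));    // reuse with another input
        System.out.println(consume(filter) + " skipped=" + filter.skippedTokens());
    }
}
```

In this sketch the skipped count is 1 after each run rather than accumulating to 2, because consume() calls reset() every time. Nothing here rewinds a Reader: reusing the chain for new text always goes through setReader(Reader) first.
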
Does the TokenStream workflow described here help?
http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/analysis/TokenStream.html

Basically, reset() is mandatory: the consumer must call it. It just means 'reset any mutable state so you can be reused for processing again'. It exists on every TokenStream: Tokenizers, TokenFilters, or even some direct descendant you write that parses byte arrays, or whatever. So if you are keeping some state across tokens (like StopFilter's #skippedTokens), this is where you would set it back to 0.

setReader(Reader) exists only on Tokenizer. It means 'replace the Reader with a different one to be processed'. The fact that CharTokenizer is doing 'reset()-like stuff' in there is bogus IMO, but I don't think it will cause any bugs. Don't emulate it :)

On Wed, Aug 29, 2012 at 3:29 PM, Benson Margulies <ben...@basistech.com> wrote:
> I've read the javadoc through a few times, but I confess that I'm still
> feeling dense.
>
> Are all tokenizers responsible for implementing some way of retaining the
> contents of their reader, so that a call to reset without a call to
> setReader rewinds? I note that CharTokenizer doesn't implement #reset,
> which leads me to suspect that I'm not responsible for the rewind behavior.

--
lucidworks.com