Re: reset versus setReader on TokenStream

Robert Muir Wed, 29 Aug 2012 12:38:29 -0700

OK, let's help improve it: I think these have likely always been confusing.

Before, they were both called reset: reset() and reset(Reader), even though
they are unrelated. I thought the rename would help with this :)

Does the TokenStream workflow described here help?
http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/analysis/TokenStream.html
Basically, reset() is a mandatory call the consumer must make. It just
means 'reset any mutable state so you can be reused for processing
again'.
This applies to any TokenStream: Tokenizers, TokenFilters, or
even some direct descendant you make that parses byte arrays, or
whatever.
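The consumer side of that workflow can be sketched as follows. This is a minimal, stdlib-only model, not Lucene code: `SimpleTokenStream` and `WhitespaceStream` are hypothetical stand-ins that borrow the method names from Lucene's TokenStream (`reset()`, `incrementToken()`, `end()`, `close()`) so the call order can run without the Lucene jar. The `term()` getter is also an assumption; real Lucene consumers read terms through attributes such as CharTermAttribute, and the real methods declare IOException.

```java
import java.util.ArrayList;
import java.util.List;

public class ConsumerWorkflowSketch {

    // Hypothetical stand-in for Lucene's TokenStream (NOT the real class).
    // Real Lucene methods also declare IOException; omitted here for brevity.
    abstract static class SimpleTokenStream {
        // Consumer must call this before the first incrementToken().
        void reset() {}
        // Advances to the next token; false means the stream is exhausted.
        abstract boolean incrementToken();
        // Called once after incrementToken() has returned false.
        void end() {}
        // Releases resources held by the stream.
        void close() {}
        // Simplification: real Lucene exposes the term through attributes.
        abstract String term();
    }

    // Hypothetical stream that splits a fixed string on whitespace.
    static class WhitespaceStream extends SimpleTokenStream {
        private final String[] parts;
        private int pos;  // only meaningful after reset()

        WhitespaceStream(String text) { this.parts = text.trim().split("\\s+"); }

        @Override void reset() { pos = -1; }  // clear mutable per-pass state
        @Override boolean incrementToken() { return ++pos < parts.length; }
        @Override String term() { return parts[pos]; }
    }

    // The consumer workflow: reset(), incrementToken() until false, end(), close().
    static List<String> consumeAll(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(new WhitespaceStream("hello token stream")));
        // prints [hello, token, stream]
    }
}
```

Because `pos` is only initialized in reset(), skipping that call would silently drop the first token, which is one way to see why the consumer is required to make it.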

This means that if you are keeping some state across tokens (like
StopFilter's #skippedTokens), reset() is where you would set it back
to 0.
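As a rough illustration, here is a hypothetical filter that carries a counter across tokens and zeroes it in reset(). It uses stdlib-only stand-ins, not Lucene classes; the counter is named `skippedTokens` only to echo the StopFilter example above, and the filtering logic is an assumption, not StopFilter's actual source. It also forwards reset() to the wrapped stream, as Lucene's TokenFilter does, before clearing its own state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StatefulFilterSketch {

    // Hypothetical minimal stand-in for a TokenStream (not Lucene's class).
    abstract static class SimpleTokenStream {
        void reset() {}
        abstract boolean incrementToken();
        abstract String term();
    }

    // Hypothetical source stream over a fixed array of terms.
    static class ArrayStream extends SimpleTokenStream {
        private final String[] terms;
        private int pos = -1;
        ArrayStream(String... terms) { this.terms = terms; }
        @Override void reset() { pos = -1; }
        @Override boolean incrementToken() { return ++pos < terms.length; }
        @Override String term() { return terms[pos]; }
    }

    // Hypothetical stop filter: drops stop words and counts how many it skipped.
    static class CountingStopFilter extends SimpleTokenStream {
        private final SimpleTokenStream input;
        private final Set<String> stopWords;
        private int skippedTokens;  // mutable state kept across tokens

        CountingStopFilter(SimpleTokenStream input, Set<String> stopWords) {
            this.input = input;
            this.stopWords = stopWords;
        }

        @Override void reset() {
            input.reset();      // forward to the wrapped stream first
            skippedTokens = 0;  // then clear this filter's own state
        }

        @Override boolean incrementToken() {
            while (input.incrementToken()) {
                if (!stopWords.contains(input.term())) {
                    return true;
                }
                skippedTokens++;
            }
            return false;
        }

        @Override String term() { return input.term(); }

        int skippedTokens() { return skippedTokens; }
    }

    // Runs one full pass: reset, then drain the stream.
    static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(ts.term());
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "a"));
        CountingStopFilter f =
            new CountingStopFilter(new ArrayStream("the", "cat", "a", "hat"), stops);
        System.out.println(drain(f) + " skipped=" + f.skippedTokens());  // [cat, hat] skipped=2
        System.out.println(drain(f) + " skipped=" + f.skippedTokens());  // [cat, hat] skipped=2
    }
}
```

If reset() did not zero the counter, the second pass would report 4; if it did not forward to `input`, the second pass would produce no tokens at all.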

setReader(Reader) exists only on Tokenizer; it means 'replace the Reader
with a different one to be processed'.
The fact that CharTokenizer does 'reset()-like stuff' in there is
bogus IMO, but I don't think it will cause any bugs. Don't emulate it
:)
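The split between the two calls can be sketched with a toy tokenizer. This is a hypothetical, stdlib-only model rather than Lucene's Tokenizer or CharTokenizer: `setReader()` only swaps the source, `reset()` only clears per-pass state (here a hypothetical `position` counter), and nothing retains the old Reader's contents. Under these assumptions, calling reset() again without setReader() does not rewind; the already-consumed Reader simply yields no more tokens.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SetReaderSketch {

    // Hypothetical whitespace tokenizer modeled on the contract described above;
    // it is NOT Lucene's Tokenizer or CharTokenizer.
    static class SimpleWhitespaceTokenizer {
        private Reader input;
        private final StringBuilder term = new StringBuilder();
        private int position;  // per-pass state: tokens emitted since reset()

        SimpleWhitespaceTokenizer(Reader input) { this.input = input; }

        // Only swaps the source; clearing state is left to reset().
        void setReader(Reader input) { this.input = input; }

        // Clears mutable state. It does NOT rewind or re-read the old Reader.
        void reset() {
            term.setLength(0);
            position = 0;
        }

        boolean incrementToken() throws IOException {
            term.setLength(0);
            int c;
            while ((c = input.read()) != -1) {
                if (!Character.isWhitespace(c)) {
                    term.append((char) c);
                } else if (term.length() > 0) {
                    break;  // whitespace ends the current token
                }
            }
            if (term.length() == 0) {
                return false;  // end of input with nothing buffered
            }
            position++;
            return true;
        }

        String term() { return term.toString(); }

        int position() { return position; }
    }

    // One consumer pass: reset(), then incrementToken() until false.
    static List<String> drain(SimpleWhitespaceTokenizer tok) {
        List<String> out = new ArrayList<>();
        try {
            tok.reset();
            while (tok.incrementToken()) {
                out.add(tok.term());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleWhitespaceTokenizer tok = new SimpleWhitespaceTokenizer(new StringReader("first doc"));
        System.out.println(drain(tok));  // [first, doc]
        System.out.println(drain(tok));  // [] : reset() alone does not rewind
        tok.setReader(new StringReader("second doc"));
        System.out.println(drain(tok));  // [second, doc]
    }
}
```

To reuse the tokenizer for a new document, the sketch calls setReader() first and then lets the consumer call reset(), so state clearing stays in one place instead of being folded into setReader().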

On Wed, Aug 29, 2012 at 3:29 PM, Benson Margulies <ben...@basistech.com> wrote:
> I've read the javadoc through a few times, but I confess that I'm still
> feeling dense.
>
> Are all tokenizers responsible for implementing some way of retaining the
> contents of their reader, so that a call to reset without a call to
> setReader rewinds? I note that CharTokenizer doesn't implement #reset,
> which leads me to suspect that I'm not responsible for the rewind behavior.

-- 
lucidworks.com
