For the record, for other people who implement tokenizers: say that your tokenizer has a constructor like this:

    public MyTokenizer(Reader reader, ....) {
        super(reader);
        myWrappedInputDevice = new MyWrappedInputDevice(reader);
    }

Not a good idea. Tokenizer carefully manages the data flow from the constructor arg to the 'input' field. The correct form is:

    public MyTokenizer(Reader reader, ....) {
        super(reader);
        myWrappedInputDevice = new MyWrappedInputDevice(this.input);
    }

On Tue, Jan 7, 2014 at 2:59 PM, Robert Muir <rcm...@gmail.com> wrote:
> See Tokenizer.java for the state machine logic. In general you should
> not have to do anything if the tokenizer is well-behaved (e.g. close
> calls super.close() and so on).
>
> On Tue, Jan 7, 2014 at 2:50 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> > In 4.6.0,
> > org.apache.lucene.analysis.BaseTokenStreamTestCase#checkResetException
> > fails if incrementToken fails to throw if there's a missing reset.
> >
> > How am I supposed to organize this in a Tokenizer? A quick look at
> > CharTokenizer did not reveal any code for the purpose.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
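
P.S. To illustrate the constructor point at the top of this message, here is a toy model of why going around the 'input' field defeats the missing-reset check. Everything in it (ToyTokenizer, LeakyTokenizer, GuardedTokenizer, ResetGuardDemo, nextChar) is a hypothetical stand-in written for this sketch, not Lucene code, and the guard-reader mechanism is only my assumption about the general shape of the state machine; see Tokenizer.java for the real logic.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for a tokenizer base class; NOT Lucene's Tokenizer.
abstract class ToyTokenizer {
    // Guard reader: any read before reset() fails loudly.
    private static final Reader NOT_RESET = new Reader() {
        @Override public int read(char[] buf, int off, int len) {
            throw new IllegalStateException("reset() was not called");
        }
        @Override public void close() {}
    };

    protected Reader input = NOT_RESET; // what subclasses should read from
    private final Reader pending;       // constructor arg, held until reset()

    protected ToyTokenizer(Reader reader) { this.pending = reader; }

    // Publishes the real reader through the 'input' field.
    public void reset() throws IOException { input = pending; }

    public abstract int nextChar() throws IOException;
}

// Captures the constructor argument directly, so the guard is bypassed.
class LeakyTokenizer extends ToyTokenizer {
    private final Reader captured;
    LeakyTokenizer(Reader reader) { super(reader); this.captured = reader; }
    @Override public int nextChar() throws IOException { return captured.read(); }
}

// Reads only through the base-class field, so the guard applies.
class GuardedTokenizer extends ToyTokenizer {
    GuardedTokenizer(Reader reader) { super(reader); }
    @Override public int nextChar() throws IOException { return input.read(); }
}

class ResetGuardDemo {
    // Returns true if reading without reset() is rejected.
    static boolean throwsWithoutReset(ToyTokenizer t) throws IOException {
        try {
            t.nextChar();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(throwsWithoutReset(new LeakyTokenizer(new StringReader("abc"))));   // false
        System.out.println(throwsWithoutReset(new GuardedTokenizer(new StringReader("abc")))); // true
    }
}
```

In this toy, GuardedTokenizer re-reads the field on every call, so it always sees whatever the base class currently exposes. A wrapper that is built once around the field would only see the value the field had at that moment, so when to build it depends on when the base class publishes the live reader; that timing is a property of this sketch and should be checked against Tokenizer.java for the Lucene version in use.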