Re: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Robert Muir
On Tue, Nov 20, 2012 at 6:26 AM, Carsten Schnober wrote:
> Thanks, Uwe!
> I think what changed in comparison to Lucene 3.6 is that reset() is
> called upon initialization, too, instead of after processing the first
> document only, right?

There is no such change: this step was always mandator

Re: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Carsten Schnober
On 20.11.2012 at 10:22, Uwe Schindler wrote:

Hi,

> The createComponents() method of Analyzers is only called *once* for each
> thread and the TokenStream is *reused* for later documents. The Analyzer will
> call the final method Tokenizer#setReader() to notify the Tokenizer of a new
> Reader (t
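The reuse contract described in the quoted text can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `SketchTokenizer`, `tokenizeAll` and the `clearStateOnReset` flag are hypothetical names invented for illustration, not Lucene classes, and the model merely imitates the roles of `createComponents()`, `Tokenizer#setReader()` and `reset()`. It shows one plausible way a reused tokenizer can handle only the first document: per-document state that is never cleared in `reset()`. Whether that is the actual cause of the problem reported in this thread cannot be determined from the excerpts.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the reuse contract; these are NOT Lucene classes.
final class TokenStreamReuseSketch {

    /** Hypothetical whitespace tokenizer: built once, then fed a new Reader per document. */
    static final class SketchTokenizer {
        private final boolean clearStateOnReset; // false models a tokenizer with a broken reset()
        private Reader input;
        private boolean exhausted;               // per-document state

        SketchTokenizer(boolean clearStateOnReset) {
            this.clearStateOnReset = clearStateOnReset;
        }

        // Plays the role of Tokenizer#setReader(): it only swaps the input.
        void setReader(Reader reader) {
            this.input = reader;
        }

        // Plays the role of reset(): per-document state must be cleared here.
        void reset() {
            if (clearStateOnReset) {
                exhausted = false;
            }
        }

        // Returns the next whitespace-separated token, or null when the input is used up.
        String next() {
            if (exhausted) {
                return null;
            }
            StringBuilder token = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (token.length() > 0) {
                            return token.toString();
                        }
                    } else {
                        token.append((char) c);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            exhausted = true;
            return token.length() > 0 ? token.toString() : null;
        }
    }

    /** Tokenizes every document with ONE tokenizer instance, as a reusing analyzer would. */
    static List<List<String>> tokenizeAll(List<String> docs, boolean clearStateOnReset) {
        SketchTokenizer tokenizer = new SketchTokenizer(clearStateOnReset); // created once
        List<List<String>> result = new ArrayList<>();
        for (String doc : docs) {
            tokenizer.setReader(new StringReader(doc)); // new input for the reused instance
            tokenizer.reset();                          // called before consuming each document
            List<String> tokens = new ArrayList<>();
            for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
                tokens.add(t);
            }
            result.add(tokens);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("first document", "second document");
        System.out.println(tokenizeAll(docs, true));  // [[first, document], [second, document]]
        System.out.println(tokenizeAll(docs, false)); // [[first, document], []]
    }
}
```

In this model the second document yields no tokens when `reset()` leaves the `exhausted` flag set, because the single instance still believes its input is used up. Clearing all per-document state in `reset()` restores correct behaviour for every document.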

RE: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Uwe Schindler
er
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Carsten Schnober [mailto:schno...@ids-mannheim.de]
> Sent: Tuesday, November 20, 2012 10:15 AM
> To: java-user@lucene.apache.org
> Subject: Re: TokenStr

Re: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Carsten Schnober
On 19.11.2012 at 17:44, Carsten Schnober wrote:

Hi,

> However, after switching to Lucene 4 and TokenStreamComponents, I'm
> getting a strange behaviour: only the first document in the collection
> is tokenized properly. The others do appear in the index, but
> un-tokenized, although I have tried n

Re: TokenStreamComponents in Lucene 4.0

2012-11-19 Thread Carsten Schnober
On 19.11.2012 at 17:44, Carsten Schnober wrote:

Hi again, just a little update:

> However, after switching to Lucene 4 and TokenStreamComponents, I'm
> getting a strange behaviour: only the first document in the collection
> is tokenized properly. The others do appear in the index, but
> un-tokeni