In addition, in your first field you are using a StringReader to feed in the data, which can only be consumed once. This has nothing to do with TokenStream reuse.
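The single-consumption point can be seen with a plain java.io.StringReader, independent of Lucene: once a consumer has drained it, further reads return -1 unless something explicitly resets it (which a tokenizer will not do for you). A minimal illustration:

```java
import java.io.IOException;
import java.io.StringReader;

public class StringReaderOnce {
    public static void main(String[] args) throws IOException {
        StringReader r = new StringReader("aaa bbb ccc");

        // First pass drains the reader to end-of-stream.
        StringBuilder firstPass = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            firstPass.append((char) c);
        }
        System.out.println("first pass:  " + firstPass); // aaa bbb ccc

        // Further reads keep returning -1: the data is gone for any
        // consumer that does not reset() the reader itself.
        System.out.println("second read: " + r.read()); // -1
    }
}
```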
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Wednesday, February 27, 2013 8:03 PM
> To: 'java-user@lucene.apache.org'
> Subject: RE: Confusion with Analyzer.tokenStream() re-use in 4.1
>
> The problem here is that the TokenStream is instantiated in the same thread
> from two different code paths and consumed later. If you add fields, the
> indexer fetches the reused TokenStreams one after another and consumes
> each one directly after getting it. It will not interleave this. In your
> case, the second field is instantiated using a TokenStream which is already
> initialized. Unfortunately, if you ask the analyzer for another TokenStream
> later, the already opened one becomes invalid (the second field).
>
> Don't use new Field(name, TokenStream) with TokenStreams obtained from
> Analyzers, because they are only "valid" for a very short time. If you need
> to do this, use a second Analyzer instance. If you add fields with a String
> value, the TokenStream is created on the fly and is consumed by the
> DocumentsWriter directly after getting it.
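The reuse behaviour described above can be sketched with a toy stand-in for the analyzer. This is NOT the real Lucene API — class and method names here are invented for illustration — but it models the contract: each call to tokenStream() re-initializes and returns the SAME cached instance, so a stream obtained earlier silently starts reading the new input.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

/** Toy model of an analyzer that reuses one tokenizer per thread.
 *  Illustrative only; not the real Lucene API. */
class ReusingAnalyzer {
    static class TokenStream {
        private Iterator<String> tokens = Collections.emptyIterator();
        void setInput(String text) {
            tokens = Arrays.asList(text.split("\\s+")).iterator();
        }
        String next() {
            return tokens.hasNext() ? tokens.next() : null;
        }
    }

    private final TokenStream cached = new TokenStream();

    /** Always returns the SAME cached instance, re-pointed at the new
     *  input -- exactly why a previously returned stream goes invalid. */
    TokenStream tokenStream(String text) {
        cached.setInput(text);
        return cached;
    }

    public static void main(String[] args) {
        ReusingAnalyzer a = new ReusingAnalyzer();

        TokenStream first = a.tokenStream("aaa bbb ccc");
        TokenStream second = a.tokenStream("xxx zzz");

        // Both variables point at the same object; requesting the second
        // stream redirected the first one to the new input.
        System.out.println(first == second); // true
        System.out.println(first.next());    // xxx, not aaa

        // Safe pattern: consume each stream fully before asking for another.
        TokenStream ts = a.tokenStream("aaa bbb ccc");
        StringBuilder sb = new StringBuilder();
        for (String t = ts.next(); t != null; t = ts.next()) {
            sb.append(t).append('|');
        }
        System.out.println(sb); // aaa|bbb|ccc|
    }
}
```

This is why the advice above says to either consume the stream immediately or use a second Analyzer instance: two live streams from one reusing analyzer are never valid at the same time.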
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Konstantyn Smirnov [mailto:inject...@yahoo.com]
> > Sent: Wednesday, February 27, 2013 6:25 PM
> > To: java-user@lucene.apache.org
> > Subject: Confusion with Analyzer.tokenStream() re-use in 4.1
> >
> > Dear all,
> >
> > I'm using the following test code:
> >
> > Document doc = new Document()
> > Analyzer a = new SimpleAnalyzer( Version.LUCENE_41 )
> >
> > TokenStream inputTS = a.tokenStream( 'name1', new StringReader( 'aaa bbb ccc' ) )
> > Field f = new TextField( 'name1', inputTS )
> > doc.add f
> >
> > TokenStream ts = doc.getField( 'name1' ).tokenStreamValue()
> > ts.reset()
> >
> > String sb = ''
> > while( ts.incrementToken() ) sb += ts.getAttribute( CharTermAttribute ) + '|'
> > assert 'aaa|bbb|ccc|' == sb
> >
> > inputTS = a.tokenStream( 'name2', new StringReader( 'xxx zzz' ) )
> > f = new TextField( 'name2', inputTS )
> > doc.add f
> >
> > ts = doc.getField( 'name2' ).tokenStreamValue()
> > ts.reset()
> >
> > sb = ''
> > while( ts.incrementToken() ) sb += ts.getAttribute( CharTermAttribute ) + '|'
> > assert 'xxx|zzz|' == sb // << FAILS! -> sb == '' and ts.incrementToken() == false
> >
> > The first added field lets me read its tokenStreamValue() tokens; all
> > subsequent calls return nothing, unless I re-instantiate the analyzer.
> >
> > Another strange thing is that just before adding a new field to the
> > document, the tokenStream is filled.
> >
> > What am I doing wrong?
> >
> > TIA
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Confusion-with-Analyzer-tokenStream-re-use-in-4-1-tp4043427.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org