Hi all, I'm migrating from Lucene 3.6.1 to 4.3.1, and there seems to be a major change in how analyzers work. Consider the code example below, which is almost copied from http://lucene.apache.org/core/4_3_1/core/index.html:
    @Test
    public void testAnalysis() throws IOException {
        final String[] texts = {"demo", "TokenStream", "API"};
        CustomAnalyzer analyzer = new CustomAnalyzer(IndexLocale.ENGLISH, false);
        for (String text : texts) {
            TokenStream stream = analyzer.tokenStream("field", new StringReader(text));
            CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
            try {
                stream.reset();
                while (stream.incrementToken()) {
                    System.out.println("Token : " + termAtt.toString());
                }
                stream.end();
            } finally {
                stream.close();
            }
        }
    }

In 3.6.1 the output is:

    Token : demo
    Token : Tokenstream
    Token : API

while in 4.3.1 it is only:

    Token : demo

This seems to happen because of the ReuseStrategy that is now built into Analyzer.tokenStream(): it caches the components created on the first call (the one that produced "demo") and reuses them for every later call.

CustomAnalyzer is a custom analyzer :) and its implementation is irrelevant to the question, apart from the fact that in 3.6.1 it overrides tokenStream() while in 4.3.1 it overrides createComponents(). I suspect the same happens with Lucene's own analyzers too.

The question is: do I need to change something in my logic to make it behave as in 3.6.1? The only way I have found to get the same output is to create a new CustomAnalyzer instance before each call to tokenStream().

--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-4-0-tokenstream-logic-tp4077203.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
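Since CustomAnalyzer's code isn't shown, this is only a guess, but the symptom (first call works, every later call on the same analyzer yields nothing) matches a custom Tokenizer that keeps per-stream state, such as a "done" flag, and does not clear it in reset(). In 4.x the analyzer hands back the same TokenStreamComponents on the next tokenStream() call, passes the new Reader in via setReader(), and relies on the consumer calling reset() to reinitialize state. Below is a minimal sketch against the Lucene 4.3 API (it assumes lucene-core 4.3.x on the classpath; ReuseDemo, WholeInputTokenizer and DemoAnalyzer are hypothetical names, not Lucene classes). With the reset() override in place, one analyzer instance produces a token for every input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ReuseDemo {

    // Hypothetical tokenizer: emits the whole input as a single token.
    static final class WholeInputTokenizer extends Tokenizer {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private boolean done = false; // per-stream state

        WholeInputTokenizer(Reader input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (done) {
                return false;
            }
            clearAttributes();
            done = true;
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = input.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            if (sb.length() == 0) {
                return false;
            }
            termAtt.setEmpty().append(sb);
            return true;
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            // Reused components keep this object alive between tokenStream() calls;
            // without this line every call after the first returns no tokens.
            done = false;
        }
    }

    // Hypothetical analyzer using the 4.x createComponents(String, Reader) hook.
    static final class DemoAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            return new TokenStreamComponents(new WholeInputTokenizer(reader));
        }
    }

    // Same consume loop as in the test above, collecting terms into a list.
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream("field", new StringReader(text));
        CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
        try {
            stream.reset();
            while (stream.incrementToken()) {
                tokens.add(termAtt.toString());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new DemoAnalyzer(); // one instance, reused for all inputs
        for (String text : new String[] {"demo", "TokenStream", "API"}) {
            for (String token : analyze(analyzer, text)) {
                System.out.println("Token : " + token);
            }
        }
    }
}
```

If the custom tokenizer or filters hold any other state (buffers, positions, counters), the same rule applies: clear it in reset(), and create a new analyzer per call only as a last resort.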