Lucene 4.0 tokenstream logic

zzT Thu, 11 Jul 2013 00:32:03 -0700

Hi all, 

I'm migrating from Lucene 3.6.1 to 4.3.1 and there seems to be a major
change in how analyzers work.
Consider the code example below (adapted almost verbatim from
http://lucene.apache.org/core/4_3_1/core/index.html):


@Test
public void testAnalysis() throws IOException {
    final String[] texts = {"demo", "TokenStream", "API"};
    CustomAnalyzer analyzer = new CustomAnalyzer(IndexLocale.ENGLISH, false);

    for (String text : texts) {
        TokenStream stream = analyzer.tokenStream("field", new StringReader(text));
        CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);

        try {
            stream.reset();
            while (stream.incrementToken()) {
                System.out.println("Token : " + termAtt.toString());
            }
            stream.end();
        } finally {
            stream.close();
        }
    }
}

The output in 3.6.1 is:
Token : demo
Token : TokenStream
Token : API

while in 4.3.1 it is:
Token : demo

This seems to happen because of the ReuseStrategy that is now built into
Analyzer.tokenStream(): the components created for the first text ("demo")
are cached and reused for the texts that follow.

CustomAnalyzer is a custom analyzer :) and its implementation is irrelevant
to the question (apart from the fact that in 3.6.1 it overrides
tokenStream() while in 4.3.1 it overrides createComponents()). I'm pretty
sure the same is happening with Lucene's analyzers too.

The question is: do I need to change something in my logic to make it work
as in 3.6.1? The only way I have found to get the same output is to create a
new CustomAnalyzer before each call to tokenStream().
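
For readers who want to see how component reuse could produce this symptom, here is a minimal sketch in plain Java. It is a simplified model and not the Lucene API: ReuseSketch, OneTokenTokenizer and CachingAnalyzer are hypothetical names, the cache is a single field rather than Lucene's per-thread storage, and the idea that the tokenizer inside CustomAnalyzer keeps state that reset() does not clear is only an assumption, since its source is not shown in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for analyzer component reuse; NOT the Lucene API.
class ReuseSketch {

    /** Hypothetical tokenizer that emits its whole input as a single token. */
    static class OneTokenTokenizer {
        private final boolean clearsStateOnReset;
        private String input;
        private boolean done;   // per-stream state that reset() should clear
        String term;

        OneTokenTokenizer(boolean clearsStateOnReset) {
            this.clearsStateOnReset = clearsStateOnReset;
        }

        // Called by the analyzer each time the cached instance gets new input.
        void setReader(String input) {
            this.input = input;
        }

        // A well-behaved tokenizer clears its per-stream state here.
        void reset() {
            if (clearsStateOnReset) {
                done = false;
            }
        }

        boolean incrementToken() {
            if (done) {
                return false;
            }
            term = input;
            done = true;
            return true;
        }
    }

    /** Hypothetical analyzer that builds its tokenizer once, then reuses it. */
    static class CachingAnalyzer {
        private final boolean clearsStateOnReset;
        private OneTokenTokenizer cached;

        CachingAnalyzer(boolean clearsStateOnReset) {
            this.clearsStateOnReset = clearsStateOnReset;
        }

        OneTokenTokenizer tokenStream(String text) {
            if (cached == null) {
                cached = new OneTokenTokenizer(clearsStateOnReset);
            }
            cached.setReader(text);   // same instance, new input
            return cached;
        }
    }

    // Same consume loop as in the test above: reset(), then incrementToken().
    static List<String> analyze(boolean clearsStateOnReset, String... texts) {
        CachingAnalyzer analyzer = new CachingAnalyzer(clearsStateOnReset);
        List<String> tokens = new ArrayList<>();
        for (String text : texts) {
            OneTokenTokenizer stream = analyzer.tokenStream(text);
            stream.reset();
            while (stream.incrementToken()) {
                tokens.add(stream.term);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Stale "done" flag survives reuse: prints [demo]
        System.out.println(analyze(false, "demo", "TokenStream", "API"));
        // State cleared in reset(): prints [demo, TokenStream, API]
        System.out.println(analyze(true, "demo", "TokenStream", "API"));
    }
}
```

In this model, creating a new analyzer for every text also hides the problem, because a fresh tokenizer is built each time; that would be consistent with the workaround described above. If the real tokenizer or filters behave like the first case, clearing their state in reset() may be enough to restore the 3.6.1 output while keeping reuse.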


