Re: Implicit Stopping in StandardTokenizer??

Erik Hatcher Mon, 20 Jun 2005 11:54:16 -0700


On Jun 20, 2005, at 11:48 AM, Max Pfingsthorn wrote:

import java.io.Reader;


import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;

public class SimpleStandardAnalyzer extends Analyzer {
  public SimpleStandardAnalyzer() {
  }

  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(
      new StandardFilter(
        new StandardTokenizer(reader)
      )
    );
  }
}

That looks fine. Stop words will not (well, "should not" it appearsto you!) be removed from that analyzer.

I tried this too, but still the same effect. "This" and "is", etc,get filtered out even with no stopwords set. Any ideas?

The only idea I have is that you're running StandardAnalyzer and notthis custom one by something amiss in your indirect configuration.

For example, I typed in "this is" to the AnalyzerDemo that ships withthe Lucene in Action source code (get it from http://www.lucenebook.com) after modifying AnalyzerDemo to construct aStandardAnalyzer like this:


    new StandardAnalyzer(new String[] {})

And got this output:

$ ant AnalyzerDemo

AnalyzerDemo:
    [input] String to analyze: [This string will be analyzed.]
this is
     [echo] Running lia.analysis.AnalyzerDemo...
     [java] Analyzing "this is"
     [java]   WhitespaceAnalyzer:
     [java]     [this] [is]

     [java]   SimpleAnalyzer:
     [java]     [this] [is]

     [java]   StopAnalyzer:
     [java]

     [java]   StandardAnalyzer:
     [java]     [this] [is]

     [java]   SnowballAnalyzer:
     [java]     [this] [is]

     [java]   SnowballAnalyzer:
     [java]     [this] [is]

     [java]   SnowballAnalyzer:
     [java]     [thi] [i]

As you can see [this] and [is] made it fine through StandardAnalyzer.

    Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Implicit Stopping in StandardTokenizer??

Reply via email to