On Jun 20, 2005, at 11:48 AM, Max Pfingsthorn wrote:
import java.io.Reader;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
public class SimpleStandardAnalyzer extends Analyzer {
public SimpleStandardAnalyzer() {
}
public TokenStream tokenStream(String fieldName, Reader reader) {
return new LowerCaseFilter(
new StandardFilter(
new StandardTokenizer(reader)
)
);
}
}
That looks fine. Stop words will not (well, "should not" it appears
to you!) be removed from that analyzer.
I tried this too, but still the same effect. "This" and "is", etc,
get filtered out even with no stopwords set. Any ideas?
The only idea I have is that you're running StandardAnalyzer and not
this custom one by something amiss in your indirect configuration.
For example, I typed in "this is" to the AnalyzerDemo that ships with
the Lucene in Action source code (get it from http://
www.lucenebook.com) after modifying AnalyzerDemo to construct a
StandardAnalyzer like this:
new StandardAnalyzer(new String[] {})
And got this output:
$ ant AnalyzerDemo
AnalyzerDemo:
[input] String to analyze: [This string will be analyzed.]
this is
[echo] Running lia.analysis.AnalyzerDemo...
[java] Analyzing "this is"
[java] WhitespaceAnalyzer:
[java] [this] [is]
[java] SimpleAnalyzer:
[java] [this] [is]
[java] StopAnalyzer:
[java]
[java] StandardAnalyzer:
[java] [this] [is]
[java] SnowballAnalyzer:
[java] [this] [is]
[java] SnowballAnalyzer:
[java] [this] [is]
[java] SnowballAnalyzer:
[java] [thi] [i]
As you can see [this] and [is] made it fine through StandardAnalyzer.
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]