On May 20, 2005, at 3:55 AM, Möckl Susanne wrote:
In my index are (for example) 4 documents which contains the word simone.
The problem is that lucene does not find all documents by some inputs.
To explain here some examples and what they return:
input: simone returns: 4 hits input: simon returns: 4 hits input: simo returns: 4 hits input: sim returns: 0 hits input: sim* returns: 4 hits input: simo* returns: 4 hits input: simon* returns: 0 hits input: simon? returns: 0 hits
The returns are not correct. Why does lucene find all results when writing simo without wildcard but no result when writing simon with wildcard.
What is lucene doing with the input? (I´m using GermanAnalyzer)
Have I made something wrong or is this a problem from lucene?
Thanks for specifying which Analyzer you're using. That is the key to this issue. GermanAnalyzer analyzes "simone", "simon", and "simo" to "simo", and "sim" is unchanged. So the first four queries you provided are explained in this manner (you're using GermanAnalyzer with QueryParser also, I presume).
QueryParser does not analyze wildcard queries and remember that "simo" is what was indexed, so "sim*" and "simo*" work. "simon*" and "simon?" do not because nothing was indexed with the prefix "simon".
Make sense?
So what's the solution? You'll need to tinker with the analyzer decision until you find one that works for you. The AnalyzerDemo program that comes with the source code to Lucene in Action will help. Download the source code from www.lucenebook.com, unzip it, and run "ant AnalyzerDemo". To help in my reply, I adjusted the AnalyzerDemo code locally adding the last items in each of the arrays like this:
private static final String[] examples = { "The quick brown fox jumped over the lazy dogs", "XY&Z Corporation - [EMAIL PROTECTED]", "simone simon simo sim" };
private static final Analyzer[] analyzers = new Analyzer[]{ new WhitespaceAnalyzer(), new SimpleAnalyzer(), new StopAnalyzer(), new StandardAnalyzer(), new GermanAnalyzer() };
and got this relevant output:
Analyzing "simone simon simo sim" WhitespaceAnalyzer: [simone] [simon] [simo] [sim]
SimpleAnalyzer: [simone] [simon] [simo] [sim]
StopAnalyzer: [simone] [simon] [simo] [sim]
StandardAnalyzer: [simone] [simon] [simo] [sim]
GermanAnalyzer: [simo] [simo] [simo] [sim]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]