The problem ist he following:
The docFreq of the term "lucéne" is 2, all other terms have 1 (because 
StandardAnalyzer lowercases everything). What happens is, that terms with lower 
docFreq get a higher score in TermQuery. This score overweighs the boosting 
done by FuzzyQuery (because you index is so small).

If you raise the minSimilarity a little bit, your query matches less terms and 
the rewritten BooleanQuery contains less clauses. At some point the score 
overweigh of the less frequent terms is no longer relevant for the final score. 

By the way, you can always look at the explain() results which informs you 
about the scoring done.

The fix is (applies only to trunk, see issue 
https://issues.apache.org/jira/browse/LUCENE-124) to ignore scoring of the 
TermQueries generated by Fuzzy and only look at the edit distance (implemented 
by another MTQ.RewriteMode), that can be set with FuzzyQuery.setRewriteMode().

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: stefcl [mailto:stefatw...@gmail.com]
> Sent: Tuesday, February 16, 2010 10:11 AM
> To: java-user@lucene.apache.org
> Subject: Re: Strange Fuzzyquery results scoring when using a low
> minimal distance
> 
> 
> Thanksa lot,
> But I still don't understand why raising a little bit the min
> similarity
> change the ordering...
> 
> 
> 
> markharw00d wrote:
> >
> > This could be down to IDF ie "Lucane" is ranked higher because it is
> rarer
> > despite having worse edit distance.
> > This is arguably a bug.
> > See http://issues.apache.org/jira/browse/LUCENE-329 which discusses
> this.
> > You could try subclass QueryParser and override newFuzzyQuery to
> return
> > FuzzyLikeThisQuery (found in "contrib/queries")
> >
> > Cheers
> > Mark
> >
> >
> >
> > ----- Original Message ----
> > From: stefcl <stefatw...@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Mon, 15 February, 2010 14:13:52
> > Subject: Strange Fuzzyquery results scoring when using a low minimal
> > distance
> >
> >
> > Hello,
> >
> > I'm using Lucene v3.
> > Please consider the following spellings
> >
> > Lucene
> > Lucéne
> > lucéne
> > Lucane
> > Lucen
> >
> > When searching for "lucéne" among those words using a FuzzyQuery
> (with 0.5
> > edit distance), results show :
> >
> > 1. Lucene 1.0259752
> > 2. Lucane 1.0259752
> > 3. Lucéne 0.95660806
> > 4. lucéne 0.95660806
> > 5. Lucen 0.30779266
> >
> > #4 is an exact match, why does it receive a lower score than "Lucane"
> > which
> > contains one incorrect letter?
> >
> > Also, if you raise min similarity a bit higher (0.6 of above),
> everything
> > becomes normal :
> >
> > 1. Lucéne 1.0438477
> > 2. lucéne 1.0438477
> > 3. Lucene 0.97959816
> > 4. Lucane 0.97959816
> >
> >
> > Any idea?
> > Thanks in advance...
> >
> >
> > The code I use :
> >
> >    /**
> >      * @param args the command line arguments
> >      */
> >     public static void main(String[] args) throws IOException,
> > ParseException
> >     {
> >
> >         StandardAnalyzer analyzer = new
> > StandardAnalyzer(Version.LUCENE_CURRENT);
> >
> >         // TODO code application logic here
> >         Directory index = new RAMDirectory();
> >         IndexWriter w = new IndexWriter(index, analyzer, true,
> > IndexWriter.MaxFieldLength.UNLIMITED);
> >
> >         addDoc(w, "Lucene");
> >         addDoc(w, "Lucéne");
> >         addDoc(w, "lucéne");
> >         addDoc(w, "Lucane");
> >         addDoc(w, "Lucen");
> >
> >         w.close();
> >
> >         FuzzyQuery q =  new FuzzyQuery( new Term("title", "lucéne") ,
> 0.5f
> > );
> >
> >         // 3. search
> >         IndexSearcher searcher = new IndexSearcher(index);
> >
> >         TopDocs collector = searcher.search(q, 10);
> >         ScoreDoc[] hits = collector.scoreDocs;
> >
> >         // 4. display results
> >         System.out.println("Found " + hits.length + " hits.");
> >         for(int i = 0 ; i < hits.length; i++)
> >         {
> >               Document d = searcher.doc(hits[i].doc);
> >               System.out.println((i + 1) + ". " + d.get("title") + "
> " +
> > hits[i].score );
> >         }
> >
> >         // searcher can only be closed when there
> >         // is no need to access the documents any more.
> >         searcher.close();
> >     }
> >
> >
> >     private static void addDoc(IndexWriter w, String value) throws
> > IOException
> >     {
> >         Document doc = new Document();
> >         doc.add(new Field("title", value, Field.Store.YES,
> > Field.Index.ANALYZED));
> >         w.addDocument(doc);
> >     }
> > --
> > View this message in context:
> > http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-
> a-low-minimal-distance-tp27594371p27594371.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
> >
> 
> --
> View this message in context: http://old.nabble.com/Strange-Fuzzyquery-
> results-scoring-when-using-a-low-minimal-distance-
> tp27594371p27605395.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to