[ https://issues.apache.org/jira/browse/LUCENE-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated LUCENE-2557:
-------------------------------

    Attachment: idf-scoring-test-case.patch

I've attached a test case that demonstrates some of the scoring issues (the 
patch applies to the existing TestFuzzyQuery class). With the default 
FuzzyQuery, the fuzzy terms "joness" and "smiith" are promoted to the top of 
the search results because, being rarer in the index, they have higher IDFs 
than the exact matches.
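
To make the effect concrete, here is a minimal sketch, not the code from the attached patch. It assumes the classic Lucene idf formula, 1 + ln(numDocs / (docFreq + 1)), and a simplified fuzzy boost that rescales similarity from [minSimilarity, 1] to [0, 1]. The class name, document counts, and the plain boost * idf product are hypothetical and chosen only for illustration; real Lucene scoring includes further factors (tf, norms, queryNorm) that are left out here.

```java
class IdfSketch {
    // Classic TF-IDF style idf: 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Simplified fuzzy boost (an assumption for this sketch): similarity is
    // 1 - editDistance / shorterLength, rescaled from [minSimilarity, 1] to [0, 1].
    public static double fuzzyBoost(int editDistance, int shorterLength, double minSimilarity) {
        double similarity = 1.0 - (double) editDistance / shorterLength;
        return (similarity - minSimilarity) / (1.0 - minSimilarity);
    }

    public static void main(String[] args) {
        int numDocs = 1000;      // hypothetical index size
        int smithDocFreq = 100;  // hypothetical: the exact term is common
        int smiithDocFreq = 1;   // hypothetical: the misspelling occurs once

        // Term weight modeled as boost * idf (a deliberate simplification).
        double exact = fuzzyBoost(0, 5, 0.5) * idf(smithDocFreq, numDocs);
        double fuzzy = fuzzyBoost(1, 5, 0.5) * idf(smiithDocFreq, numDocs);

        System.out.println("smith  weight: " + exact);
        System.out.println("smiith weight: " + fuzzy);
        System.out.println("misspelling outranks exact match: " + (fuzzy > exact));
    }
}
```

Under these assumed counts, the idf of the one-off misspelling is large enough to outweigh its reduced boost, which is the ordering the test case exposes.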

If you modify the test so that the FuzzyQuery instances use 
TopTermsBoostOnlyBooleanQueryRewrite, i.e. uncomment these lines in the test 
case:
{code}
smithQuery.setRewriteMethod(new MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite());
jonesQuery.setRewriteMethod(new MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite());
{code}

With that change, the fuzzy terms are correctly relegated to the bottom of the 
search results but, because IDF is ignored, "jones" scores higher than "smith" 
even though "smith" is the rarer term.
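
The loss of rarity information under boost-only scoring can be sketched as follows. This is an illustration under stated assumptions, not Lucene's implementation: the class name, the simplified boost formula, the fixed minimum similarity of 0.5, and the document frequencies are all hypothetical. It models only the per-term weight and does not try to reproduce the exact ordering seen in the test case.

```java
class BoostOnlySketch {
    // Simplified fuzzy boost (an assumption for this sketch): similarity is
    // 1 - editDistance / shorterLength, rescaled from [minSimilarity, 1] to [0, 1].
    public static double fuzzyBoost(int editDistance, int shorterLength, double minSimilarity) {
        double similarity = 1.0 - (double) editDistance / shorterLength;
        return (similarity - minSimilarity) / (1.0 - minSimilarity);
    }

    // Boost-only weighting: docFreq is accepted but deliberately ignored,
    // so a term's rarity has no influence on its weight.
    public static double boostOnlyWeight(int editDistance, int shorterLength, int docFreq) {
        return fuzzyBoost(editDistance, shorterLength, 0.5);
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies: "smith" is rarer than "jones".
        double smith = boostOnlyWeight(0, 5, 50);
        double jones = boostOnlyWeight(0, 5, 200);
        double smiith = boostOnlyWeight(1, 5, 1);

        System.out.println("smith:  " + smith);
        System.out.println("jones:  " + jones);
        System.out.println("smiith: " + smiith);
        System.out.println("exact terms tie despite different rarity: " + (smith == jones));
    }
}
```

In this model the misspelling drops below both exact matches, but "smith" gains nothing from being rarer than "jones", so any ordering between the two exact terms has to come from other scoring factors.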

Ideally, a fix should solve both of these issues.

> FuzzyQuery - fuzzy terms and misspellings are ranked higher than exact matches
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2557
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 3.0.2
>            Reporter: Jingkei Ly
>         Attachments: idf-scoring-test-case.patch
>
>
> The FuzzyQuery often causes misspellings to be ranked higher than the exact 
> match, which seems generally undesirable. 
> For example, in an index of surnames, if I search using a FuzzyQuery for 
> "smith", misspellings such as "smiith" or "smiht" appear near the 
> top of the search results, ahead of documents that match "smith".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

