[ 
https://issues.apache.org/jira/browse/LUCENE-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingkei Ly updated LUCENE-2557:
-------------------------------

    Attachment: LUCENE-2557.patch

I've had a go at implementing a fix, based on the suggestions in LUCENE-329. 
If the term used in the FuzzyQuery exists in the index, the patch takes that 
term's IDF and uses it as the IDF when scoring the matches. If the term is not 
in the index, it uses the average IDF of all the terms instead.
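
To illustrate the idea (this is not the code from the attached patch), here is 
a minimal standalone sketch. It assumes the classic Lucene-style formula 
idf = 1 + ln(numDocs / (docFreq + 1)), made-up document frequencies, and that 
"all the terms" means the terms the fuzzy query expands to. The class and 
method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FuzzyIdfSketch {

    // Classic Lucene-style IDF: rarer terms get a larger value.
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // One IDF shared by every expanded term: the query term's own IDF if the
    // term is indexed, otherwise the mean IDF of the expanded terms
    // (an assumed reading of "the average IDF of all the terms").
    public static double sharedIdf(String queryTerm,
                                   Map<String, Integer> expandedDocFreqs,
                                   int numDocs) {
        Integer df = expandedDocFreqs.get(queryTerm);
        if (df != null) {
            return idf(df, numDocs);
        }
        if (expandedDocFreqs.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int d : expandedDocFreqs.values()) {
            sum += idf(d, numDocs);
        }
        return sum / expandedDocFreqs.size();
    }

    public static void main(String[] args) {
        int numDocs = 10000; // hypothetical index size
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("smith", 500); // common exact match (made-up count)
        docFreqs.put("smiith", 2);  // rare misspelling (made-up count)
        docFreqs.put("smiht", 1);   // rare misspelling (made-up count)

        double shared = sharedIdf("smith", docFreqs, numDocs);
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            System.out.printf("%-7s per-term idf=%.3f shared idf=%.3f%n",
                    e.getKey(), idf(e.getValue(), numDocs), shared);
        }
    }
}
```

With per-term IDF the rare misspellings receive the largest weights, which is 
why they can outrank the exact match. With a shared IDF that advantage 
disappears and the remaining scoring factors decide the order.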

The fix is implemented as a rewrite method similar to 
TopTermsBoostOnlyBooleanQueryRewrite from LUCENE-124, although it required 
minor changes to TopTermsBooleanQueryRewrite.

> FuzzyQuery - fuzzy terms and misspellings are ranked higher than exact matches
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2557
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 3.0.2
>            Reporter: Jingkei Ly
>         Attachments: idf-scoring-test-case.patch, LUCENE-2557.patch
>
>
> FuzzyQuery often causes misspellings to be ranked higher than the exact 
> match, which is generally undesirable. 
> For example, in an index of surnames, if I search using a FuzzyQuery for 
> "smith", misspellings such as "smiith" or "smiht" appear near the top of 
> the search results, ahead of documents that match "smith".


