[
https://issues.apache.org/jira/browse/LUCENE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910767#comment-13910767
]
Tim Allison commented on LUCENE-5469:
-------------------------------------
Sorry about this. My fix was based on the idea that 80% of 5 should be the
equivalent of an edit distance of 1, but you are absolutely right. This
behavior is entirely consistent with 3.5.
I tested a 10-letter word (salmonella) on noisy OCR'd data:
4.x
  ~2      110 variants
  ~1       37 variants
  0.9      no variants
  0.89     34 variants
  0.88     37 variants
  0.80     37 variants
  0.79     94 variants
  0.78     94 variants
  0.77    108 variants
  0.74    110 variants
3.5
  0.9      no variants
  0.89     34 variants
  0.88     37 variants
  0.80     37 variants
  0.79     94 variants
  0.78     94 variants
  0.77    108 variants
  0.74    110 variants
> Add small rounding to FuzzyQuery.floatToEdits
> ---------------------------------------------
>
> Key: LUCENE-5469
> URL: https://issues.apache.org/jira/browse/LUCENE-5469
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 5.0
> Reporter: Tim Allison
> Priority: Trivial
> Labels: easyfix
> Attachments: LUCENE-5469.patch
>
>
> I realize that FuzzyQuery.floatToEdits is deprecated, but I'd like to make a
> small fix for posterity. Because of floating point issues, if a percentage
> leads to a number that is very close to a whole number of edits, our cast to
> int can improperly cause misses.
> ddddd~0.8 will not match "ddddX"
> eeeee~0.8 will not match "eeee" or "eeeeee"
> This is a trivial part of the plan to reduce code duplication with
> LUCENE-5205.
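
The truncation described in the issue can be reproduced with a few lines of
Java. This is only a sketch of the arithmetic, not the actual
FuzzyQuery.floatToEdits source or the attached patch: the class name, both
method names, the MAX_EDITS constant, and the epsilon value are assumptions
made for illustration, and the handling of similarities of 1 or more is left
out.

```java
class FloatToEditsSketch {
    // Assumed cap on the number of edits (hypothetical constant for this sketch).
    static final int MAX_EDITS = 2;

    // Truncating conversion: the (int) cast drops the fractional part,
    // so a product just below a whole number loses one edit.
    static int truncatingEdits(float minSimilarity, int termLen) {
        return Math.min((int) ((1D - minSimilarity) * termLen), MAX_EDITS);
    }

    // Hypothetical variant that adds a small epsilon before the cast.
    // The epsilon value is an assumption, not taken from the patch.
    static int roundedEdits(float minSimilarity, int termLen) {
        double epsilon = 1e-5;
        return Math.min((int) ((1D - minSimilarity) * termLen + epsilon), MAX_EDITS);
    }

    public static void main(String[] args) {
        // 0.8f is slightly above 0.8 as a float, so this product is just below 1.0.
        System.out.println((1D - 0.8f) * 5);
        System.out.println(truncatingEdits(0.8f, 5)); // 0 edits
        System.out.println(roundedEdits(0.8f, 5));    // 1 edit
    }
}
```

With a 5-letter term and 0.8, the truncating version yields 0 edits, which is
why ddddd~0.8 would not match "ddddX" under that conversion. For a 10-letter
term and 0.9, both versions yield 1 edit.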