[
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687902#comment-13687902
]
Robert Muir commented on LUCENE-5030:
-------------------------------------
I don't think changing SEP_LABEL from a single byte to 4 bytes is necessarily a
good idea.
I think benchmarks (size and speed) should be run on this change before we jump
into it. I'm also concerned about determinization and similar automaton
operations happening in the middle of an autosuggest request: this seems like
it would be far too slow.
> FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work
> correctly for 1-byte (like English) and multi-byte (non-Latin) letters
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-5030
> URL: https://issues.apache.org/jira/browse/LUCENE-5030
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 4.3
> Reporter: Artem Lukanin
> Attachments: nonlatin_fuzzySuggester1.patch,
> nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch,
> nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch,
> nonlatin_fuzzySuggester.patch
>
>
> There is a limitation in the current FuzzySuggester implementation: it
> computes edits in UTF-8 space instead of Unicode character (code point)
> space.
> This should be fixable: we'd need to fix TokenStreamToAutomaton to work in
> Unicode character space, then fix FuzzySuggester to do the same steps that
> FuzzyQuery does: do the LevN expansion in Unicode character space, then
> convert that automaton to UTF-8, then intersect with the suggest FST.
> See the discussion here:
> http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none
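
The limitation quoted above can be illustrated with a small sketch. It is
written in Python rather than Lucene's Java and uses no Lucene API; the helper
names `levenshtein` and `distances` are hypothetical and exist only for this
example. It compares the edit distance counted in Unicode code points with the
edit distance counted in UTF-8 bytes.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over two sequences.

    Works on any indexable sequence, so it can compare str objects
    (code points) as well as bytes objects (UTF-8 bytes).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distances(a, b):
    """Return (code point distance, UTF-8 byte distance) for two strings."""
    return (levenshtein(a, b),
            levenshtein(a.encode("utf-8"), b.encode("utf-8")))


if __name__ == "__main__":
    # ASCII letters are one byte each, so both counts agree.
    print(distances("cat", "cut"))    # (1, 1)
    # Cyrillic letters are two bytes each; 'о' and 'у' differ in both bytes.
    print(distances("кот", "кут"))    # (1, 2)
    # Inserting one Cyrillic letter inserts two bytes.
    print(distances("кот", "крот"))   # (1, 2)
```

In this sketch a one-letter change in the Russian examples costs 2 at the byte
level, so a byte-level match with an edit budget of 1 would miss it, while the
English example is unaffected. Not every Cyrillic substitution costs 2 (letters
that share a lead byte differ in one byte only), which makes byte-level
behaviour inconsistent as well as stricter.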