[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015634#comment-13015634 ]
Dawid Weiss commented on SOLR-2378:
-----------------------------------
Building also has to sort the input again (and create it in the first
place). Because the Lookup API assumes suggestion keywords can come from a
variety of sources, there is no guarantee they will be sorted, so we need to
sort them before we can build the automaton.
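A minimal Java sketch of that constraint follows. `Entry`, `sortForBuild` and `checkBuildOrder` are hypothetical names (not Solr or Lucene API), and `checkBuildOrder` merely stands in for an automaton builder that only accepts keys in order. The sort key is assumed here to be unsigned UTF-8 byte order: entries arriving in arbitrary order are copied and sorted before being handed to the builder.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: Entry, sortForBuild and checkBuildOrder are hypothetical stand-ins,
// not Solr/Lucene API.
final class SortedInputSketch {

    /** A suggestion keyword and its weight, as it might arrive from any source. */
    static final class Entry {
        final String key;
        final float weight;
        Entry(String key, float weight) { this.key = key; this.weight = weight; }
    }

    /** Compares the UTF-8 encodings of two strings byte by byte, bytes treated as unsigned. */
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.length, y.length); // a proper prefix sorts first
    }

    /** Returns a sorted copy of the input; the caller's list is left untouched. */
    static List<Entry> sortForBuild(List<Entry> input) {
        List<Entry> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Entry e) -> e.key, SortedInputSketch::compareUtf8));
        return sorted;
    }

    /** Stand-in for an automaton builder that accepts keys only in non-decreasing order. */
    static void checkBuildOrder(List<Entry> entries) {
        for (int i = 1; i < entries.size(); i++) {
            if (compareUtf8(entries.get(i - 1).key, entries.get(i).key) > 0) {
                throw new IllegalArgumentException("input not sorted at index " + i);
            }
        }
    }
}
```

The extra copy and sort is the cost described above: it is paid on every rebuild because the Lookup API cannot assume its sources deliver sorted keys.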
Still, I think the numbers are acceptable: if you need online construction
of these suggestions, you'd pick TST (it can add new keywords to the structure
dynamically); for a batch-load suggester, you'd pick the FST one.
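To illustrate the online property, here is a minimal ternary search tree in Java. It is an illustrative sketch, not Solr's TST class, and `TernaryTreeSketch`, `add` and `prefixMatches` are hypothetical names. Keys can be inserted one at a time, in any order, and are visible to prefix lookups immediately. A sorted-input automaton, by contrast, has to be rebuilt from the full, re-sorted key set.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal ternary search tree (hypothetical sketch): keys can be added at any time,
// in any order, and become visible to prefix lookups immediately (no rebuild step).
final class TernaryTreeSketch {
    private static final class Node {
        final char ch;
        Node lo, eq, hi;  // smaller char / next char of the key / larger char
        boolean terminal; // true if a key ends at this node
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    /** Inserts a key; empty keys are ignored. */
    void add(String key) {
        if (key.isEmpty()) return;
        root = add(root, key, 0);
    }

    private Node add(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = add(node.lo, key, i);
        else if (c > node.ch) node.hi = add(node.hi, key, i);
        else if (i + 1 < key.length()) node.eq = add(node.eq, key, i + 1);
        else node.terminal = true;
        return node;
    }

    /** Returns all stored keys starting with the given non-empty prefix, in sorted order. */
    List<String> prefixMatches(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) return out;
        Node node = root;
        int i = 0;
        // Walk down to the node holding the last character of the prefix.
        while (node != null) {
            char c = prefix.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i + 1 < prefix.length()) { node = node.eq; i++; }
            else break;
        }
        if (node == null) return out; // prefix not present
        if (node.terminal) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out);
        return out;
    }

    /** In-order walk (lo, self + eq, hi) emits keys in sorted char order. */
    private void collect(Node node, StringBuilder sb, List<String> out) {
        if (node == null) return;
        collect(node.lo, sb, out);
        sb.append(node.ch);
        if (node.terminal) out.add(sb.toString());
        collect(node.eq, sb, out);
        sb.setLength(sb.length() - 1);
        collect(node.hi, sb, out);
    }
}
```

For example, after adding "car", "cat", "cart" and "dog", a later `add("cab")` is reflected in the very next `prefixMatches("ca")` call without any rebuild.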
It is also very likely that I overlooked something that could bring those
numbers down. I'll create a clean patch tomorrow so everything will be out
there for improvement.
> FST-based Lookup (suggestions) for prefix matches.
> --------------------------------------------------
>
> Key: SOLR-2378
> URL: https://issues.apache.org/jira/browse/SOLR-2378
> Project: Solr
> Issue Type: New Feature
> Components: spellchecker
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Labels: lookup, prefix
> Fix For: 4.0
>
>
> Implement a subclass of Lookup based on finite state automata/transducers
> (Lucene FST package). This issue is for implementing a relatively basic
> prefix matcher; we will handle infixes and other types of input matches
> gradually. Implementation phases:
> - write a DFA-based suggester effectively identical to the current
> ternary-tree-based solution,
> - baseline benchmark against the ternary tree (memory consumption, rebuild
> speed, indexing speed; reuse Andrzej's benchmark code)
> - modify the DFA to encode term weights directly in the automaton (optimize
> for the onlyMostPopular case)
> - benchmark again
> - add infix suggestion support with prefix matches boosted higher (?)
> - benchmark again
> - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]
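The "encode term weights directly in the automaton" phase can be illustrated with a hedged sketch. It uses a plain trie (not a minimized FST, and not Lucene's implementation) in which every node caches the maximum weight found in its subtree. A best-first search over a priority queue can then return the top-k completions of a prefix and stop early, which is the onlyMostPopular case. `WeightedTrieSketch`, `add` and `topK` are hypothetical names, and each key is assumed to be added only once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Sketch: a plain trie (not a minimized FST) whose nodes cache the maximum weight
// found anywhere in their subtree. Names are hypothetical, not Lucene/Solr API.
// Assumes each key is added at most once (re-adding with a lower weight would
// leave stale maxima behind).
final class WeightedTrieSketch {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        long maxWeight = Long.MIN_VALUE; // best weight of any key in this subtree
        long weight;                     // weight of the key ending here, if terminal
        boolean terminal;
    }

    /** Queue item: an unexpanded subtree (node != null) or a finished key (node == null). */
    private static final class Candidate {
        final String text;
        final Node node;
        final long priority;
        Candidate(String text, Node node, long priority) {
            this.text = text; this.node = node; this.priority = priority;
        }
    }

    private final Node root = new Node();

    void add(String key, long weight) {
        Node node = root;
        node.maxWeight = Math.max(node.maxWeight, weight);
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, unused -> new Node());
            node.maxWeight = Math.max(node.maxWeight, weight);
        }
        node.terminal = true;
        node.weight = weight;
    }

    /** Returns up to k keys starting with prefix, highest weight first. */
    List<String> topK(String prefix, int k) {
        List<String> out = new ArrayList<>();
        Node start = root;
        for (char c : prefix.toCharArray()) {
            start = start.children.get(c);
            if (start == null) return out; // no key has this prefix
        }
        PriorityQueue<Candidate> queue =
                new PriorityQueue<>((a, b) -> Long.compare(b.priority, a.priority));
        queue.add(new Candidate(prefix, start, start.maxWeight));
        while (!queue.isEmpty() && out.size() < k) {
            Candidate cur = queue.poll();
            if (cur.node == null) {
                // Nothing left in the queue can beat this key: every subtree's
                // priority is an upper bound on the weights inside it.
                out.add(cur.text);
                continue;
            }
            if (cur.node.terminal) {
                queue.add(new Candidate(cur.text, null, cur.node.weight));
            }
            for (Map.Entry<Character, Node> e : cur.node.children.entrySet()) {
                Node child = e.getValue();
                queue.add(new Candidate(cur.text + e.getKey(), child, child.maxWeight));
            }
        }
        return out;
    }
}
```

With "solaris" (80), "solr" (50), "solar" (20) and "sole" (10) stored, `topK("sol", 2)` returns "solaris" and "solr" without ever expanding the low-weight "sole" branch. That early cut-off is what storing weights inside the structure buys over collecting every prefix match and sorting afterwards.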