I am currently working on a refactoring of FSTLookup so that either one or both of your objectives will be possible.
I would still argue that storing exact scores does not make much sense (think: if you collect query logs, you probably won't differentiate between two suggestions that differ by two or three hits when their counts are in the millions). The order of magnitude matters, not the exact numbers. Bucketing is not only a way to speed up collection (although it is a very good way to speed it up!), it is also a way to abstract "classes" of suggestions: think of buckets as classes corresponding to "frequent", "less frequent", "even less frequent", etc.

As for suggesting something other than the input string, this can be done even now: when you're building FSTLookup, pass a string that is a concatenation of what you expect as a prefix and the full completion, for example:

bush|george bush
flower|plant

If you ask for suggestions for "geor", then the results will contain the full string; you only need to post-process them. The mechanism of using the automaton is identical; only the details change.

Dawid

On Wed, Nov 16, 2011 at 7:00 PM, Sudarshan Gaikaiwari <sudars...@acm.org> wrote:

> Hi
>
> I am trying to implement an auto complete suggest system using FST.
> For my use case I cannot use FSTLookup for the following reasons.
>
> 1. I cannot construct the display string using the arc labels like
> FSTLookup as the display strings for autocompletion are different from the
> strings used as prefixes.
> 2. I am computing the scores for the suggestions by analyzing logs and do
> not want to put scores into a few buckets.
>
>
> Is there a way to get all the outputs from an FST for a particular prefix?
> I have been looking at the code for FST and FSTEnum but have not found a
> method that provides this functionality.
>
> Thanks
> Sudarshan
>
>
> --
> Sudarshan Gaikaiwari
> www.sudarshan.org
> sudars...@acm.org
>
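To make the order-of-magnitude argument in the reply concrete, here is a minimal sketch of bucketing raw counts by their number of decimal digits. This is an illustration of the idea only, not the scheme FSTLookup itself uses to compute its buckets; the class name `MagnitudeBuckets` and the `maxBucket` cap are hypothetical.

```java
// Hypothetical helper (not Lucene API): maps a raw hit count to a small bucket
// index based on its order of magnitude. Bucket k (k >= 1) holds counts with
// exactly k decimal digits, i.e. 10^(k-1) <= count < 10^k, capped at maxBucket.
final class MagnitudeBuckets {
    private MagnitudeBuckets() {}

    static int bucket(long count, int maxBucket) {
        if (count <= 0) {
            return 0; // non-positive counts all go to bucket 0
        }
        // Counting digits avoids floating-point rounding issues of Math.log10.
        int digits = Long.toString(count).length();
        return Math.min(digits, maxBucket);
    }

    public static void main(String[] args) {
        long[] counts = {7, 42, 1_000_001, 1_000_003, 25_000_000};
        for (long c : counts) {
            System.out.println(c + " -> bucket " + bucket(c, 10));
        }
    }
}
```

The two counts that differ by only two hits (1,000,001 and 1,000,003) land in the same bucket, while 25,000,000 moves up one class, which is the "frequent / less frequent" distinction the reply describes.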
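The prefix|completion trick from the reply can be sketched without any Lucene classes. In the sketch below a `java.util.TreeMap` stands in for the FST: both keep keys in sorted order, so all keys sharing a typed prefix are contiguous. This does not call FSTLookup; the class `PrefixCompletionSketch`, its methods, and the integer weights are hypothetical, and the `|` separator is assumed never to occur in the prefix part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Lucene API): stores "prefix|completion" keys in a
// sorted map, looks them up by typed prefix, then post-processes the matches
// by stripping everything up to and including the separator.
final class PrefixCompletionSketch {
    static final char SEP = '|';

    private final TreeMap<String, Integer> entries = new TreeMap<>();

    void add(String prefix, String completion, int weight) {
        if (prefix.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("prefix must not contain the separator");
        }
        entries.put(prefix + SEP + completion, weight);
    }

    List<String> lookup(String typed, int num) {
        // Keys are sorted, so every key starting with `typed` follows it contiguously.
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.tailMap(typed, true).entrySet()) {
            if (!e.getKey().startsWith(typed)) {
                break; // past the prefix range: no further matches
            }
            matches.add(e);
        }
        // Highest weight first; List.sort is stable, so ties keep key order.
        matches.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : matches) {
            if (out.size() == num) {
                break;
            }
            String key = e.getKey();
            out.add(key.substring(key.indexOf(SEP) + 1)); // keep only the display part
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixCompletionSketch s = new PrefixCompletionSketch();
        s.add("bush", "george bush", 10);
        s.add("flower", "plant", 5);
        System.out.println(s.lookup("bu", 5));
        System.out.println(s.lookup("flo", 5));
    }
}
```

Here typing "bu" yields "george bush" and typing "flo" yields "plant": the lookup walks the sorted keys exactly as it would for plain strings, and only the post-processing step differs.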