Wonderful, glad to hear you got something working, and thanks for bringing closure.
Mike McCandless http://blog.mikemccandless.com On Sat, Mar 4, 2017 at 1:32 AM, Oliver Mannion <o.mann...@gmail.com> wrote: > Hi Dawid, > > Yes, the performance differences between Lucene's FST, morfologik's fsa > and the PatriciaTrie are rather small. > > I've managed to get something working well for my use-case. Thanks Dawid > and Michael for all your insight! > > On 20 February 2017 at 19:21, Dawid Weiss <dawid.we...@gmail.com> wrote: > >> > PatriciaTrie. In particular building an FST with doShareSuffix = false >> is >> > the fastest of any option, >> >> If you don't share the suffix then you are building a kind of Patricia >> trie... But suffix sharing is cheap and can give you a memory saving >> (and resulting cache locality sometimes) that is non-trivial to >> neglect. >> >> > and morfologik's fsa has the lowest heap usage of >> > all (both in terms of garbage and the final size of the FSA). >> >> I think these would be minuscule differences, really. Something not >> worth the effort of maintaining a compatibility with two libraries, >> for example. Lucene's FST are transducers, so they do have an >> additional benefit in that you can store something extra with each >> entry (this may be handy sometimes -- for example frequency >> information attached to each term, etc.). >> >> > to work with BytesRef) and it supports both prefix >> > (MatchResult.SEQUENCE_IS_A_PREFIX) and exact matching >> > (MatchResult.EXACT_MATCH). >> >> So does Lucene's FST, although at a lower level (you'd need to descend >> manually). >> >> > Given the efficiencies and functionality similarity of the FST vs >> Automation >> > in Lucene, I'm kind of curious as to why you would ever use Automaton? >> >> The Automaton class was ported from Brics library and it supports a >> much more generic class of automata, including optimizations from >> NFA->DFA, etc. The FST class is built directly from one specific kind >> of data (unique sorted keys on input) and it's heavily optimized >> towards this case. >> >> Dawid >> > >