>> We're only allowing expansions within an edit distance of 1, which should
>> keep the number of terms down.
>
> Ahh, ok. So even if the term has two occurrences of cl, only one of
> them is allowed to substitute d?
Yes, exactly: "cloocl" will be expanded only to "doocl" and "clood". It's a
fairly inaccurate way of searching anyway, and allowing more expansions
quickly leads to too many false positives without improving recall.

>
>>> For steps 2 and 3 you shouldn't use FST at all. Instead, for 2) use
>>> BasicAutomata.makeString(String) on each of your expanded terms, then
>>> BasicOperations.union on all of those automata to make a single
>>> automaton accepting all your expanded terms, then likely call
>>> .determinize() on the resulting automaton (maybe also .minimize() but
>>> I think that may not help). Then pass that automaton to AQ.
>>
>> Excellent, thanks for your help. I'll give that a go.
>
> You might also try the DaciukMihovAutomatonBuilder class (it's in
> lucene/test-framework): it builds a minimal deterministic automaton
> from sorted terms, very quickly... you'd just have to pre-sort your
> terms.

Thanks, will have a look there too.

>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
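A minimal sketch of the expansion rule described in this thread: each variant replaces exactly one occurrence of "cl" with "d", so two occurrences yield two variants and never the combined "dood". The class and method names are hypothetical. Note that "edit distance of 1" here appears to count one whole "cl" -> "d" substitution as a single edit; under plain Levenshtein distance that substitution would cost 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: produces the single-substitution variants of a term.
class SingleSubstitutionExpander {

    /**
     * Returns each variant of term obtained by replacing exactly ONE occurrence
     * of from with to. Substitutions are never combined, so a term with two
     * occurrences of from yields two variants, not three.
     */
    static List<String> expand(String term, String from, String to) {
        if (from.isEmpty()) {
            throw new IllegalArgumentException("pattern must be non-empty");
        }
        List<String> variants = new ArrayList<>();
        int idx = term.indexOf(from);
        while (idx >= 0) {
            variants.add(term.substring(0, idx) + to + term.substring(idx + from.length()));
            // Advance one char so overlapping occurrences are also considered.
            idx = term.indexOf(from, idx + 1);
        }
        return variants;
    }

    public static void main(String[] args) {
        System.out.println(expand("cloocl", "cl", "d")); // [doocl, clood]
    }
}
```

Keeping the expansions to one substitution each grows the term list linearly with the number of matches rather than exponentially, which is what keeps the automaton built from them small.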
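To see what "a minimal deterministic automaton from sorted terms" means, here is a dependency-free sketch of the Daciuk-Mihov incremental algorithm for sorted input, over Java chars. It is NOT Lucene's DaciukMihovAutomatonBuilder and does not mirror its API; all names are hypothetical. The result accepts the same finite term set, with the minimum number of states, as a union of per-term string automata followed by determinize and minimize would. Wiring it into an AutomatonQuery (AQ) is not shown.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of Daciuk-Mihov construction for SORTED, unique input.
class MinimalAutomatonSketch {

    static final class State {
        final int id;
        boolean isFinal;
        final TreeMap<Character, State> edges = new TreeMap<>(); // ordered by label
        State(int id) { this.id = id; }

        // Equivalence key: finality plus (label -> target id). Comparing targets
        // by id is valid because children are registered before their parent.
        String signature() {
            StringBuilder sb = new StringBuilder(isFinal ? "F" : "N");
            for (Map.Entry<Character, State> e : edges.entrySet()) {
                sb.append('|').append((int) e.getKey().charValue()).append(':').append(e.getValue().id);
            }
            return sb.toString();
        }
    }

    private final Map<String, State> register = new HashMap<>();
    private int nextId = 0;
    private final State root = newState();
    private String previous = null;

    private State newState() { return new State(nextId++); }

    // Adds one word; words must arrive in strictly increasing String order.
    void add(String word) {
        if (previous != null && word.compareTo(previous) <= 0) {
            throw new IllegalArgumentException("input must be sorted and unique: " + word);
        }
        previous = word;
        // 1. Follow the longest prefix of word already in the automaton.
        State state = root;
        int i = 0;
        while (i < word.length() && state.edges.containsKey(word.charAt(i))) {
            state = state.edges.get(word.charAt(i));
            i++;
        }
        // 2. The previous word's branch below this point is now final: minimize it.
        if (!state.edges.isEmpty()) replaceOrRegister(state);
        // 3. Append the remaining suffix as a fresh chain of states.
        for (; i < word.length(); i++) {
            State next = newState();
            state.edges.put(word.charAt(i), next);
            state = next;
        }
        state.isFinal = true;
    }

    // Minimizes the last branch and returns the root of the finished automaton.
    State finish() {
        if (!root.edges.isEmpty()) replaceOrRegister(root);
        return root;
    }

    private void replaceOrRegister(State state) {
        Map.Entry<Character, State> last = state.edges.lastEntry(); // most recent branch
        State child = last.getValue();
        if (!child.edges.isEmpty()) replaceOrRegister(child); // bottom-up
        String sig = child.signature();
        State existing = register.get(sig);
        if (existing != null) {
            state.edges.put(last.getKey(), existing); // reuse the equivalent state
        } else {
            register.put(sig, child);
        }
    }

    static State build(Collection<String> words) {
        MinimalAutomatonSketch builder = new MinimalAutomatonSketch();
        for (String w : new TreeSet<>(words)) builder.add(w); // pre-sort and dedupe
        return builder.finish();
    }

    static boolean accepts(State root, String word) {
        State s = root;
        for (int i = 0; i < word.length() && s != null; i++) s = s.edges.get(word.charAt(i));
        return s != null && s.isFinal;
    }

    static int countStates(State root) {
        Set<Integer> seen = new HashSet<>();
        Deque<State> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            State s = todo.pop();
            if (seen.add(s.id)) todo.addAll(s.edges.values());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        State root = build(Arrays.asList("dogs", "cat", "dog", "cats"));
        System.out.println(countStates(root)); // 7 (a plain trie would need 9)
        System.out.println(accepts(root, "cats") + " " + accepts(root, "ca")); // true false
    }
}
```

The sorted-input requirement is what makes this fast: once a new term diverges from the previous one, the previous term's tail can never change again, so it is minimized immediately and no separate determinize/minimize pass over the whole union is needed.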