>> 
>> We're only allowing expansions within an edit distance of 1, which should 
>> keep the numbers of terms down.
> 
> Ahh, ok.  So even if the term has two occurrences of cl, only one of
> them is allowed to substitute d?

Yes, exactly - "cloocl" will be expanded to "doocl" and "clood" only.  It's a 
pretty inaccurate way of searching anyway, and more expansions start leading to 
too many false positives without improving recall pretty quickly.

> 
>>> For steps 2 and 3 you shouldn't use FST at all.  Instead, for 2) use
>>> BasicAutomata.makeString(String) on each of your expanded terms, then
>>> BasicOperations.union on all of those automata to make a single
>>> automaton accepting all your expanded terms, then likely call
>>> .determinize() on the resulting automaton (maybe also .minimize() but
>>> I think that may not help).  Then pass that automaton to AQ.
>> 
>> Excellent, thanks for your help.  I'll give that a go.
> 
> You might also try the DaciukMihovAutomatonBuilder class (it's in
> lucene/test-framework): it builds a minimal deterministic automaton
> from sorted terms, very quickly... you'd just have to pre-sort your
> terms.

Thanks, will have a look there too.

> 
> Mike
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to