On Tue, Feb 28, 2012 at 8:42 AM, Alan Woodward
<alan.woodw...@romseysoftware.co.uk> wrote:
>
> On 28 Feb 2012, at 13:31, Michael McCandless wrote:
>
>> Neat :)  It's like a FuzzyQuery w/ a custom (binary?) cost matrix for
>> the insert/delete/transposition changes...
>>
>> Is the number of edits smallish?  Ie you're not concerned about
>> combinatoric explosion of step 1?
>
> We're only allowing expansions within an edit distance of 1, which should 
> keep the numbers of terms down.

Ahh, ok.  So even if the term has two occurrences of cl, only one of
them is allowed to substitute d?

>> For steps 2 and 3 you shouldn't use FST at all.  Instead, for 2) use
>> BasicAutomata.makeString(String) on each of your expanded terms, then
>> BasicOperations.union on all of those automata to make a single
>> automaton accepting all your expanded terms, then likely call
>> .determinize() on the resulting automaton (maybe also .minimize() but
>> I think that may not help).  Then pass that automaton to AQ.
>
> Excellent, thanks for your help.  I'll give that a go.

You might also try the DaciukMihovAutomatonBuilder class (it's in
lucene/test-framework): it builds a minimal deterministic automaton
from sorted terms, very quickly... you'd just have to pre-sort your
terms.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to