Re: Levenshtein FST's?

2016-05-28 Thread Luke Nezda
s, > > Xmx3g OK) > > > > On Thu, May 26, 2016 at 12:13 PM, Michael McCandless < > > luc...@mikemccandless.com> wrote: > > > >> But how many states does the not-yet-determinized union of 5000+ > >> Levenshtein automata contain? > >>

Re: Levenshtein FST's?

2016-05-27 Thread Luke Nezda
any states does the not-yet-determinized union of 5000+ > Levenshtein automata contain? > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, May 26, 2016 at 12:08 PM, Luke Nezda wrote: > > > I should note, I know in I can > > call Oper

Re: Levenshtein FST's?

2016-05-26 Thread Luke Nezda
I should note, I know I can call Operations.determinize(union, 10_000_000), but the union of 5000+ Levenshtein automata seems to require too many states to be tractable, and that's on the low end of what I'd like to work with. On Thu, May 26, 2016 at 9:59 AM, Luke Nezda wrote: > I

Re: Levenshtein FST's?

2016-05-26 Thread Luke Nezda
yeah, converting THAT to an FST is tricky... > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, May 25, 2016 at 2:46 PM, Luke Nezda wrote: > > > Oof, sounds too tricky for me to justify pursuing right now. While > > union'ing 10k Levenshtein

Re: Levenshtein FST's?

2016-05-25 Thread Luke Nezda
shtein > automaton for that word, and recording the first arcs you hit that has one > unique original word as its output, and placing outputs on those arcs, and > then doing a "rote" conversion to the syn filter's FST format. This part > sounds tricky :) > > Mi

Levenshtein FST's?

2016-05-24 Thread Luke Nezda
the match character offsets of each match in each document. > > Mike McCandless > > http://blog.mikemccandless.com > > On Mon, May 23, 2016 at 8:59 PM, Luke Nezda wrote: > > > Hello, all - > > > > I'd like to use Lucene's automaton/FST code to

Levenshtein FST's?

2016-05-23 Thread Luke Nezda
Hello, all - I'd like to use Lucene's automaton/FST code to achieve fast fuzzy (OSA edit distance up to 2) search for many (10k+) strings (knowledge base: kb) in many large strings (docs). Approach I was thinking of: create Levenshtein FST with all paths associated with unedited form for each kb
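
For readers unfamiliar with the metric named above: OSA (optimal string alignment) distance counts insertions, deletions, substitutions, and transpositions of adjacent characters. The following is only a brute-force reference sketch in plain Java, not the automaton/FST approach the message asks about and not Lucene code. The class and method names (OsaDistance, distance, withinTwo) are hypothetical and chosen for illustration.

```java
// Reference sketch of OSA (optimal string alignment) edit distance.
// Hypothetical helper for illustration only; this is NOT part of Lucene.
final class OsaDistance {
    private OsaDistance() {}

    /** Returns the OSA distance between a and b using the standard O(n*m) table. */
    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        // Distance from a prefix to the empty string is the prefix length.
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // match or substitution
                // Adjacent transposition, e.g. "en" <-> "ne".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    /** True if candidate is within OSA distance 2 of term (the bound named in the message). */
    static boolean withinTwo(String term, String candidate) {
        return distance(term, candidate) <= 2;
    }
}
```

Comparing every kb string against every candidate substring this way would be far too slow at the scale described (10k+ strings, many large docs). The sketch is useful mainly as a correctness check for whatever automaton-based matcher is eventually built.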

Re: ApacheCon next week

2005-12-11 Thread Luke Nezda
ge - > From: "Luke Nezda" <[EMAIL PROTECTED]> > To: > Sent: Sunday, December 11, 2005 6:28 PM > Subject: Re: ApacheCon next week > > > Hello Grant- > Could you post the material you present (eg slides, handouts, etc) for > those > of us who cann

Re: ApacheCon next week

2005-12-11 Thread Luke Nezda
Hello Grant- Could you post the material you present (e.g. slides, handouts, etc.) for those of us who cannot attend? Thanks in advance, -Luke On 12/9/05, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > Anyone planning on going to ApacheCon next week? I will be giving a > talk on Lucene on Monday af