> I think I see this now, and how skipping determinization and matching with > the NFA could easily leave you with an intractable amount of backtracking > for even the simpler binary question of does my input match any of the > automatons I've unioned.
Note that with NFAs you may answer the question of whether the input matches without completing all the partial NFA matching states. So it may be a viable solution, actually. Much depends on your particular problem. > Ok, yea I could see these being the same ones whose union is not too large > to determinize. The difference is that you can create a minimal deterministic automaton for a set of (sorted) input strings in O(n) -- it is really, really fast and memory friendly. You can also fairly trivially enumerate (and store) the set of unique strings accepted by automata. So this is, again, something to simply try in real life -- could be a way to build a minimal automaton out of a large number of smaller automata. > re2 does look neat for matching regexes which can be expressed as DFAs. I actually wanted to point you at the papers published by Russ Cox, not the implementation itself. See: https://swtch.com/~rsc/regexp/ > I think I see now how determinizing the union of (Levenshtein) DFAs likely > has no "algorithmic shortcut" I'll be able to devise :) My reply #3 was only partially a joke. There are no naive questions and the problem may have a fast algorithmic solution (I don't know of one though). If this is of interest to you, and it's definitely a nice research topic, I'd contact somebody in the academia who is rooted deeper in automata theory. Dawid --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org