On Sat, Aug 28, 2010 at 8:34 AM, Alexander Korotkov <aekorot...@gmail.com> wrote: > Here is the patch which adds levenshtein_less_equal function. I'm going to > add it to current commitfest.
There are some minor stylistic issues with this patch - e.g. lines ending in whitespace, cuddled elses - but those don't look too terribly difficult to fix. I'm more concerned about the fact that I don't really understand the algorithm you're using. Actually, I didn't really understand the original algorithm either until I went and read up on it, and I just adjusted the comments to make it a bit more clear what it's doing. That caused some minor merge conflicts with your patch, so I'm attaching a rebased version that applies cleanly over my changes. Can you explain a bit more what algorithm this is using? It seems like in the max_d >= 0 case the cells of the notional array have a meaning which is completely different from what they mean in otherwise, and it's not clear to me from reading the comments what that meaning is. I took a look on that font of all human knowledge, Wikipedia: http://en.wikipedia.org/wiki/Levenshtein_distance Their suggestion for handling this case is: "If we are only interested in the distance if it is smaller than a threshold k, then it suffices to compute a diagonal stripe of width 2k+1 in the matrix. In this way, the algorithm can be run in O(kl) time, where l is the length of the shortest string." It seems like that may be similar to what you're doing here but I don't think that's exactly it. I don't think that exact thing would work in our case anyhow because we've got configurable costs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
levenshtein_less_equal-0.2.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers