BBands wrote:
> Diez B. Roggisch wrote:
> > I did a levenshtein-fuzzy-search myself, however I enhanced my version by
> > normalizing the distance the following way:
>
> Thanks for the snippet. I agree that normalizing is important. A
> distance of three is one thing when your strings are long, but quite
> another when they are short. I'd been thinking about something along
> these lines myself, but hadn't gotten there yet. It'll be interesting
> to have a look at the distribution of the normalized numbers, I'd guess
> that there may be a rough threshold that effectively separates the
> wheat from the chaff.
>
> jab
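Diez's actual snippet isn't reproduced in the quote above, so this is only a minimal sketch of the general idea, assuming the normalization divides the raw Levenshtein distance by the length of the longer string. That scheme, and the function names `levenshtein` and `normalized_distance`, are my own illustration, not necessarily what Diez used. It shows why a distance of three means something different for short and long strings:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row stays short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled into [0, 1] by the longer length (assumed scheme)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(a, b) / longest


if __name__ == "__main__":
    # Both pairs have a raw distance of 3, but very different normalized scores.
    print(normalized_distance("cat", "dog"))         # 1.0
    print(normalized_distance("kitten", "sitting"))  # 3/7, about 0.43
```

Dividing by the longer length keeps the score in [0, 1] for any pair, which makes it easier to look at the distribution of scores and pick a cutoff threshold, as jab suggests.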
I noticed that this guy, who's quite a good Ruby developer, spent some time on distances:

http://ruby.brian-schroeder.de/editierdistanz/

Also look at Soundex and other algorithms (Double Metaphone, NYSIIS, Phonex). I have notes to investigate them, but I haven't looked at them myself.

-- 
http://mail.python.org/mailman/listinfo/python-list