In article <mailman.1821.1383156703.18130.python-l...@python.org>, Michael Torrie <torr...@gmail.com> wrote:
> On 10/30/2013 10:08 AM, wxjmfa...@gmail.com wrote:
> > My comment had nothing to do with Python, it was a
> > general comment. A diacritical mark just makes a letter
> > a different letter; a "ï" and a "i" are "as
> > diferent" as a "a" from a "z". A diacritical mark
> > is more than a simple ornementation.
>
> That's nice, but you didn't actually read what Ned said (or the OP).
> The OP doesn't care that "ï" and a "i" are as different as "a" and "z".
> For the purposes of his search he wants them treated as the same
> letter. A fuzzy searching treats them all the same.

That's one definition of fuzzy. But, there's nothing that says you
can't build a fuzzy matching algorithm which considers some mismatches
to be worse than others.

For example, it's reasonable to consider any vowel (or string of
vowels, for that matter) to be closer to another vowel than to a
consonant. A great example is the word, "bureaucrat". As far as I'm
concerned, it's spelled {b, vowels, r, vowels, c, r, a, t}. It usually
takes me three or four tries to get auto-correct to even recognize what
I'm trying to type and fix it for me.

Likewise for pairs like {c, s}, {j, g}, {v, w}, and so on. In that
spirit, I would think that a, á, and â would all be considered more
conservative replacements for each other than they would be for k, x,
or z.
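To make that concrete, here is a rough sketch of an edit distance with
weighted substitutions. Everything in it is my own assumption for the
sake of illustration: the names (strip_marks, sub_cost,
weighted_distance), the vowel set, the list of similar consonant pairs,
and especially the cost values 0.1, 0.4 and 1.0, which are arbitrary.
It only uses unicodedata from the standard library, and it does not try
to collapse whole strings of vowels, just single letters.

```python
import unicodedata

# Illustrative choices only; tune these for a real application.
VOWELS = set("aeiouy")
SIMILAR_PAIRS = [{"c", "s"}, {"j", "g"}, {"v", "w"}]


def strip_marks(ch):
    """Return ch with any combining marks (accents etc.) removed."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def sub_cost(a, b):
    """Cost of substituting letter a with letter b (0.0 means identical)."""
    if a == b:
        return 0.0
    base_a = strip_marks(a).lower()
    base_b = strip_marks(b).lower()
    if base_a == base_b:
        return 0.1  # same letter, differing only in diacritics or case
    if base_a in VOWELS and base_b in VOWELS:
        return 0.4  # vowel for vowel
    if {base_a, base_b} in SIMILAR_PAIRS:
        return 0.4  # consonants that are easily confused
    return 1.0      # unrelated letters


def weighted_distance(s, t, indel=1.0):
    """Levenshtein-style distance using sub_cost for substitutions."""
    # Compose the input so an accented letter is a single code point.
    s = unicodedata.normalize("NFC", s)
    t = unicodedata.normalize("NFC", t)
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * indel]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + indel,             # delete a
                           cur[j - 1] + indel,          # insert b
                           prev[j - 1] + sub_cost(a, b)))  # substitute
        prev = cur
    return prev[-1]
```

With those weights, weighted_distance("na\u00efve", "naive") comes out
much smaller than weighted_distance("cat", "kat"), so a search can rank
the accent-only mismatch as the more conservative one while still
treating both as fuzzy matches.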