or Washington / District of Columbia. My point is I wouldn't do anything complicated and slow if you can get away with analyzers/phonetics and maybe some synonyms, or maybe even just the spellchecker. There are always crazy cases that none of these algorithms will work for.

On Mon, Jul 26, 2010 at 2:09 PM, Walter Underwood <[email protected]> wrote:

> Try mixing colonial influences with Native American names.
>
> When my parents moved to Baton Rouge, LA years ago, they got a
> recommendation for a "Dr. Kyto". They couldn't find him. Years later, they
> met Dr. Cailleteaux. Nice man.
>
> And the French don't pronounce Baton Rouge as "batten wrooj".
>
> Also: Natchitoches is "NACK-a-tish".
>
> wunder
>
> On Jul 26, 2010, at 10:44 AM, <[email protected]> wrote:
>
> Clearly you haven’t been in the Northeast much. Try “Worcester” vs.
> “wuster”, or “Leominster” vs. “leminster”. It’s also likely to be a
> challenge to come up with the right phonetics for any given proper location
> name. It’s even worse in Britain, or countries where the phonetic rules
> may be a hodgepodge of different colonial influences.
>
> That having been said, if there exists a “PhoneticQuery” object that does
> all this using the automaton logic under the covers, I think it would be
> worth a serious look.
>
> Karl
>
>
> *From:* ext Robert Muir [mailto:[email protected]]
> *Sent:* Monday, July 26, 2010 1:24 PM
> *To:* [email protected]
> *Subject:* Re: LevenshteinFilter proposal
>
>
> On Mon, Jul 26, 2010 at 1:13 PM, <[email protected]> wrote:
> What I want to capture is situations where people misspell things in
> roughly a phonetic way. For example, “Tchaikovsky Avenue” might be
> misspelled as “Chicovsky Avenue”. Modules that do phonetic mapping are
> possible but you’d have to somehow generate a phonetic database of (say)
> streetnames, worldwide. Good luck on getting hold of that kind of data
> anywhere. ;-) In the absence of such data, an LD distance will have to do –
> but it will almost certainly need to be greater than 2.
>
> I added this to 'TestPhoneticFilter' and it passes:
>
> assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
>     new String[] { "XKFS", "XKFS" });
>
> So if you want to give me all your street names, i can sell you a phonetic
> database, or you can use the filters in modules/analyzers/phonetic, which
> have a bunch of different configurable algorithms :)
>
> --
> Robert Muir
> [email protected]

--
Robert Muir
[email protected]
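
For anyone following along: the claim quoted above, that an edit distance greater than 2 is needed for the Tchaikovsky/Chicovsky case, can be checked with a plain Levenshtein implementation. This is only a minimal sketch of the textbook dynamic-programming algorithm, not the automaton-based code discussed in this thread. The class and method names are made up for illustration, and the inputs are lowercased by hand so that letter case does not count as an edit.

```java
public class EditDistanceDemo {

    // Textbook Levenshtein distance (insert, delete, substitute each cost 1),
    // computed with two rolling rows instead of the full matrix.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev holds the row just computed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Hypothetical check of the street-name example from the thread.
        System.out.println(levenshtein("tchaikovsky", "chicovsky")); // prints 3
    }
}
```

The distance of 3 comes from dropping the "t", dropping the "a", and substituting "c" for the first "k", so a maximum distance of 2 would miss this pair, while DoubleMetaphone maps both spellings to the same code in the test quoted above.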
