Try mixing colonial influences with Native American names. When my parents moved to Baton Rouge, LA, years ago, they got a recommendation for a "Dr. Kyto". They couldn't find him. Years later, they met Dr. Cailleteaux. Nice man.
And the French don't pronounce Baton Rouge as "batten wrooj". Also: Natchitoches is "NACK-a-tish".

wunder

On Jul 26, 2010, at 10:44 AM, <[email protected]> wrote:

> Clearly you haven’t been in the Northeast much. Try “Worcester” vs.
> “wuster”, or “Leominster” vs. “leminster”. It’s also likely to be a
> challenge to come up with the right phonetics for any given proper location
> name. It’s even worse in Britain, or countries where the phonetic rules may
> be a hodgepodge of different colonial influences.
>
> That having been said, if there exists a “PhoneticQuery” object that does all
> this using the automaton logic under the covers, I think it would be worth a
> serious look.
>
> Karl
>
> From: ext Robert Muir [mailto:[email protected]]
> Sent: Monday, July 26, 2010 1:24 PM
> To: [email protected]
> Subject: Re: LevenshteinFilter proposal
>
> On Mon, Jul 26, 2010 at 1:13 PM, <[email protected]> wrote:
> What I want to capture is situations where people misspell things in roughly
> a phonetic way. For example, “Tchaikovsky Avenue” might be misspelled as
> “Chicovsky Avenue”. Modules that do phonetic mapping are possible but you’d
> have to somehow generate a phonetic database of (say) streetnames, worldwide.
> Good luck on getting hold of that kind of data anywhere. ;-) In the absence
> of such data, an LD distance will have to do – but it will almost certainly
> need to be greater than 2.
>
> I added this to 'TestPhoneticFilter' and it passes:
>
> assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
>     new String[] { "XKFS", "XKFS" });
>
> So if you want to give me all your street names, i can sell you a phonetic
> database, or you can use the filters in modules/analyzers/phonetic, which
> have a bunch of different configurable algorithms :)
>
> --
> Robert Muir
> [email protected]
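Karl's estimate that a plain edit distance would have to exceed 2 for "Tchaikovsky" vs. "Chicovsky" can be checked with a textbook dynamic-programming Levenshtein distance. The sketch below is illustrative only: the class name `Levenshtein` is hypothetical and is not Lucene's API. It also assumes the inputs are lowercased first, so that case differences don't add edits.

```java
// Hypothetical helper for illustration; not part of Lucene or the thread.
public final class Levenshtein {
    private Levenshtein() {}

    /** Classic DP edit distance: insert, delete and substitute each cost 1. */
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b[0..j) from an empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"tchaikovsky", "chicovsky"},  // prints 3
            {"worcester", "wuster"},       // prints 4
            {"leominster", "leminster"},   // prints 1
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " -> " + p[1] + ": " + distance(p[0], p[1]));
        }
    }
}
```

Lowercased, "tchaikovsky" and "chicovsky" are 3 edits apart (drop "t", drop "a", change "k" to "c"), so a maximum distance of 2 would miss this pair. Robert's test shows that Double Metaphone assigns both names the same code, "XKFS".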
