Clearly you haven’t been in the Northeast much. Try “Worcester” vs. “wuster”, or “Leominster” vs. “leminster”. It’s also likely to be a challenge to come up with the right phonetics for any given proper location name. It’s even worse in Britain, or countries where the phonetic rules may be a hodgepodge of different colonial influences.
That having been said, if there exists a “PhoneticQuery” object that does all this using the automaton logic under the covers, I think it would be worth a serious look.

Karl

From: ext Robert Muir [mailto:[email protected]]
Sent: Monday, July 26, 2010 1:24 PM
To: [email protected]
Subject: Re: LevenshteinFilter proposal

On Mon, Jul 26, 2010 at 1:13 PM, <[email protected]> wrote:

> What I want to capture is situations where people misspell things in roughly a phonetic way. For example, “Tchaikovsky Avenue” might be misspelled as “Chicovsky Avenue”. Modules that do phonetic mapping are possible but you’d have to somehow generate a phonetic database of (say) streetnames, worldwide. Good luck on getting hold of that kind of data anywhere. ;-) In the absence of such data, an LD distance will have to do, but it will almost certainly need to be greater than 2.

I added this to 'TestPhoneticFilter' and it passes:

    assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky", new String[] { "XKFS", "XKFS" });

So if you want to give me all your street names, I can sell you a phonetic database, or you can use the filters in modules/analyzers/phonetic, which have a bunch of different configurable algorithms :)

--
Robert Muir
[email protected]
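For reference, the "greater than 2" point above can be checked with a few lines of plain Java. This is only a minimal sketch: `EditDistanceSketch` and its `levenshtein` method are hypothetical names made up for illustration, not anything in Lucene, and the comparison assumes the terms have already been lowercased.

```java
// Hypothetical helper for illustration only; not part of Lucene or any library.
final class EditDistanceSketch {

    /**
     * Classic dynamic-programming Levenshtein distance: the minimum number of
     * single-character insertions, deletions, and substitutions turning a into b.
     */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Assumption: inputs are lowercased first, as an analyzer chain typically would.
        System.out.println(levenshtein("tchaikovsky", "chicovsky")); // 3
        System.out.println(levenshtein("worcester", "wuster"));      // 4
        System.out.println(levenshtein("leominster", "leminster"));  // 1
    }
}
```

So the Tchaikovsky example needs a distance of 3 and the Worcester one a distance of 4, while DoubleMetaphone maps both Tchaikovsky spellings to the same code, as the test above shows.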
