Clearly you haven’t been in the Northeast much. Try “Worcester” vs. “wuster”,
or “Leominster” vs. “leminster”. Coming up with the right phonetics for an
arbitrary place name is also likely to be a challenge. It’s even worse in
Britain, or in countries where the phonetic rules may be a hodgepodge of
different colonial influences.

That said, if there is a “PhoneticQuery” object that does all this using the
automaton logic under the covers, I think it would be worth a serious look.

Karl


From: ext Robert Muir [mailto:[email protected]]
Sent: Monday, July 26, 2010 1:24 PM
To: [email protected]
Subject: Re: LevenshteinFilter proposal


On Mon, Jul 26, 2010 at 1:13 PM, [email protected] wrote:
What I want to capture is situations where people misspell things in a roughly
phonetic way. For example, “Tchaikovsky Avenue” might be misspelled as
“Chicovsky Avenue”. Modules that do phonetic mapping are possible, but you’d
have to somehow generate a phonetic database of (say) street names, worldwide.
Good luck on getting hold of that kind of data anywhere. ;-)  In the absence of
such data, a Levenshtein distance will have to do – but it will almost
certainly need to be greater than 2.

I added this to 'TestPhoneticFilter' and it passes:

    assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
        new String[] { "XKFS", "XKFS" });

So if you want to give me all your street names, I can sell you a phonetic
database, or you can use the filters in modules/analyzers/phonetic, which have
a bunch of different configurable algorithms :)
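
For comparison with the quoted estimate that an edit distance greater than 2
would be needed, here is a minimal sketch that computes the plain Levenshtein
distance between the two spellings from the example. The class name
EditDistance and the choice to lowercase both inputs are assumptions made for
illustration. It uses the textbook dynamic-programming algorithm, not the
automaton logic discussed in this thread.

```java
class EditDistance {
    // Textbook Levenshtein distance using two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Lowercased so that letter case does not count as an edit (assumption).
        System.out.println(levenshtein("tchaikovsky", "chicovsky")); // prints 3
    }
}
```

The result is 3 (drop the "t", drop the "a", replace "k" with "c"), so this
particular pair would indeed fall outside a maximum distance of 2, while the
DoubleMetaphone test above maps both spellings to the same code.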

--
Robert Muir
[email protected]
