or Washington / District of Columbia.

My point is, I wouldn't do anything complicated and slow if you can get away
with analyzers/phonetics and maybe some synonyms, or maybe even just the
spellchecker. There are always crazy cases that none of these algorithms will handle.
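
A minimal sketch of the "maybe some synonyms" route, assuming a small hand-maintained alias table. The class name `AliasExpander` and its entries are hypothetical illustrations, not Lucene's synonym filter. The idea is to map irregular names like the ones in this thread (Worcester/"wuster", Washington/District of Columbia) to one canonical form that no phonetic rule would produce:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical alias table: maps spellings people actually type (often
 * "as it sounds") to one canonical place name. Entries are illustrative only.
 */
final class AliasExpander {
    private final Map<String, String> aliases = new HashMap<>();

    /** Registers one or more variant spellings for a canonical name. */
    void add(String canonical, String... variants) {
        String key = normalize(canonical);
        aliases.put(key, key); // the canonical form maps to itself
        for (String v : variants) {
            aliases.put(normalize(v), key);
        }
    }

    /** Returns the canonical name for a query, or the normalized query if unknown. */
    String canonical(String query) {
        String key = normalize(query);
        return aliases.getOrDefault(key, key);
    }

    // Case-fold and collapse runs of whitespace so "Washington  DC" matches "washington dc".
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```

For example, after `add("Worcester", "wuster")`, a query for "Wuster" resolves to "worcester". A table like this handles names whose pronunciation ignores the spelling entirely, at the cost of maintaining every entry by hand.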

On Mon, Jul 26, 2010 at 2:09 PM, Walter Underwood <[email protected]> wrote:

> Try mixing colonial influences with Native American names.
>
> When my parents moved to Baton Rouge, LA, years ago, they got a
> recommendation for a "Dr. Kyto". They couldn't find him. Years later, they
> met Dr. Cailleteaux. Nice man.
>
> And the French don't pronounce Baton Rouge as "batten wrooj".
>
> Also: Natchitoches is "NACK-a-tish".
>
> wunder
>
> On Jul 26, 2010, at 10:44 AM, <[email protected]> wrote:
>
> Clearly you haven’t been in the Northeast much. Try “Worcester” vs.
> “wuster”, or “Leominster” vs. “leminster”. It’s also likely to be a
> challenge to come up with the right phonetics for any given proper location
> name. It’s even worse in Britain, or in countries where the phonetic rules
> may be a hodgepodge of different colonial influences.
>
> That having been said, if there exists a “PhoneticQuery” object that does
> all this using the automaton logic under the covers, I think it would be
> worth a serious look.
>
> Karl
>
>
>  *From:* ext Robert Muir [mailto:[email protected]]
> *Sent:* Monday, July 26, 2010 1:24 PM
> *To:* [email protected]
> *Subject:* Re: LevenshteinFilter proposal
>
>
>
> On Mon, Jul 26, 2010 at 1:13 PM, <[email protected]> wrote:
> What I want to capture is situations where people misspell things in
> roughly a phonetic way. For example, “Tchaikovsky Avenue” might be
> misspelled as “Chicovsky Avenue”. Modules that do phonetic mapping are
> possible, but you’d have to somehow generate a phonetic database of (say)
> street names, worldwide. Good luck getting hold of that kind of data
> anywhere. ;-) In the absence of such data, a Levenshtein distance (LD)
> threshold will have to do – but it will almost certainly need to be
> greater than 2.
>
> I added this to 'TestPhoneticFilter' and it passes:
>
>     assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
>         new String[] { "XKFS", "XKFS" });
>
> So if you want to give me all your street names, I can sell you a phonetic
> database, or you can use the filters in modules/analyzers/phonetic, which
> have a bunch of different configurable algorithms :)
>
> --
> Robert Muir
> [email protected]
>
>
>
>
>
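
Karl's estimate that the edit distance "will almost certainly need to be greater than 2" is easy to check. A minimal sketch, assuming plain Java and a case-insensitive comparison (the class `EditDistance` and method `levenshtein` are hypothetical names): the textbook dynamic-programming Levenshtein distance, kept to two rows of the table.

```java
import java.util.Locale;

/** Classic dynamic-programming Levenshtein (edit) distance, two-row variant. */
final class EditDistance {
    static int levenshtein(String a, String b) {
        String s = a.toLowerCase(Locale.ROOT); // compare case-insensitively
        String t = b.toLowerCase(Locale.ROOT);
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning an empty prefix of s into t[0..j) takes j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning s[0..i) into an empty string takes i deletions
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // the finished row becomes "previous" for the next i
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

With this sketch, "Tchaikovsky" vs. "Chicovsky" comes out at 3 (delete "t", delete "a", change "k" to "c"), even though DoubleMetaphone gives both the key XKFS in the test quoted in the thread. "Leominster" vs. "leminster" is only 1, but "Worcester" vs. "wuster" is 4. Raising the threshold enough to catch those also lets a query match many more unrelated terms.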


-- 
Robert Muir
[email protected]
