Re: phonetic search and accents

Mikhail Khludnev Thu, 16 Mar 2023 12:41:44 -0700

Dima, I did a simple exercise with BMPM (Beider-Morse Phonetic Matching).
It seems to handle these cases well.
BMPM Rocks!!! – Telegraph <https://telegra.ph/BMPM-Rocks-03-16>


On Thu, Mar 16, 2023 at 8:02 PM dmitri maziuk <dmitri.maz...@gmail.com>
wrote:

> On 2023-03-16 10:33 AM, Andy C wrote:
> > A perhaps simplistic option would be to map accented letters to their
> > unaccented versions using either the ASCII Folding Filter or the ICU
> > Folding Filter.
>
> Or the equivalent of
> '''
> unicodedata.normalize( "NFKD", v ).encode('ascii','ignore').decode()
> '''
> (v.2 python) when importing into the index.
>
> Unfortunately that's not going to help: we had people complain that
> "Muller" does not find "Mueller" -- which is/was a common English way to
> transcribe "Müller".
>
> It gets worse, e.g. with the Slavic "short i" and "open i" that
> "Zelenskyy" spells as "-yy". They are two different sounds, and neither
> has a Latin letter. Russians would usually transcribe the ending as "-iy"
> because the "i" isn't that "open", but Poles would more likely use a
> single "-i" because the "y" is "almost silent".
>
> If anyone knows of a usable implementation of name search in Solr, I
> would very much like to hear about it too because we do have lots of
> name records in our index and genealogy researchers are complaining.
>
> Dima
>
>
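
For reference, a minimal Python 3 sketch of the folding one-liner quoted
above, showing why it does not cover the "Muller" / "Mueller" complaint. The
helper name fold_to_ascii is made up for illustration; only the standard
unicodedata module is used.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Hypothetical helper: NFKD-decompose, then drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# "ü" decomposes into "u" plus a combining diaeresis, and the diaeresis is dropped.
print(fold_to_ascii("Müller"))  # Muller

# The folded form is still a different string from the "ue" transcription,
# so an exact match on folded tokens cannot connect the two spellings.
print(fold_to_ascii("Müller") == fold_to_ascii("Mueller"))  # False
```

Folding only strips the diacritic, so "Müller" and "Muller" meet, but
"Mueller" stays a separate token. Matching the two needs something on top of
folding, such as a phonetic encoder.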

-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
