Dima,

I did a simple exercise with BMPM (Beider-Morse Phonetic Matching). It seems to handle these cases well.

BMPM Rocks!!! – Telegraph <https://telegra.ph/BMPM-Rocks-03-16>

On Thu, Mar 16, 2023 at 8:02 PM dmitri maziuk <dmitri.maz...@gmail.com> wrote:

> On 2023-03-16 10:33 AM, Andy C wrote:
> > A perhaps simplistic option would be to map accented letters to their
> > unaccented versions using either the ASCII Folding Filter or the ICU
> > Folding Filter.
>
> Or the equivalent of
> '''
> unicodedata.normalize( "NFKD", v ).encode('ascii','ignore').decode()
> '''
> (v.2 python) when importing into the index.
>
> Unfortunately that's not going to help: we had people complain that
> "Muller" does not find "Mueller" -- which is/was a common English way to
> transcribe "Müller".
>
> It gets worse: e.g. with Slav "short i" and "open i" that "Zelenskyy"
> spells as "-yy". They are two different sounds, neither has a Latin
> letter for it, Russians would usually transcribe it as "-iy" because "i"
> isn't that "open" but Poles would more likely use a single "-i" because
> the "y" is "almost silent".
>
> If anyone knows of a usable implementation of name search in Solr, I
> would very much like to hear about it too because we do have lots of
> name records in our index and genealogy researchers are complaining.
>
> Dima
>

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
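
For reference, here is a minimal runnable sketch of the folding one-liner quoted in the thread, written for Python 3. The function name fold_to_ascii and the sample names are illustrative assumptions, not something from the thread or from Solr. It only demonstrates the limitation Dima describes; it is not a name-matching solution.

```python
import unicodedata


def fold_to_ascii(value: str) -> str:
    """Decompose characters (NFKD), then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode()


# "ü" decomposes into "u" plus a combining diaeresis; the diaeresis is dropped.
print(fold_to_ascii("Müller"))

# The folded form is still not the common transcription "Mueller".
print(fold_to_ascii("Müller") == "Mueller")

# Cyrillic letters have no ASCII decomposition, so nothing survives the encode step.
print(repr(fold_to_ascii("Зеленський")))
```

Folding turns "Müller" into "Muller", which still does not match "Mueller", and a Cyrillic name folds to an empty string, so transcription variants such as "-yy", "-iy" and "-i" are out of reach for this approach.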