On 2023-03-16 10:33 AM, Andy C wrote:
A perhaps simplistic option would be to map accented letters to their
unaccented versions using either the ASCII Folding Filter or the ICU
Folding Filter.

Or the equivalent of
'''
import unicodedata
unicodedata.normalize("NFKD", v).encode('ascii', 'ignore').decode()
'''
(v.2 python) when importing into the index.
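A self-contained version of that one-liner, wrapped in a function (the name `ascii_fold` is just for illustration), shows what it actually does. NFKD splits "ü" into "u" plus a combining diaeresis, and the ASCII encode with 'ignore' then drops the mark. Characters with no decomposition at all, such as "ß", are dropped outright rather than transliterated:

```python
import unicodedata

def ascii_fold(value: str) -> str:
    """Decompose accented letters (NFKD), then drop every non-ASCII code point."""
    return unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()

print(ascii_fold("Müller"))  # Muller
print(ascii_fold("Straße"))  # Strae ("ß" has no decomposition, so it is lost)
```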

Unfortunately that's not going to help: we had people complain that "Muller" does not find "Mueller" -- which is/was a common English way to transcribe "Müller".
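To make the Muller/Mueller gap concrete, here is a rough sketch, assuming a hypothetical index-time step that stores two spellings per name: the accent-stripped one and a German-style transliteration. The table `GERMAN_TRANSLIT` and the function `name_variants` are illustrative only, not part of Solr or any library:

```python
import unicodedata

# Hypothetical table: umlauts and "ß" as commonly transcribed in English
# ("Müller" -> "Mueller"). Not exhaustive.
GERMAN_TRANSLIT = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def ascii_fold(value: str) -> str:
    """NFKD-decompose, then drop every non-ASCII code point."""
    return unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()

def name_variants(value: str) -> set:
    """Spellings to index for one name: plain accent-stripped and transliterated."""
    nfc = unicodedata.normalize("NFC", value)  # so "ü" is one code point for translate()
    return {ascii_fold(nfc), ascii_fold(nfc.translate(GERMAN_TRANSLIT))}

print(sorted(name_variants("Müller")))  # ['Mueller', 'Muller']
```

This only helps when the stored record still carries the umlaut; a record already transcribed as "Mueller" would still not match a query for "Muller".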

It gets worse, e.g. with the Slavic "short i" and "open i", which "Zelenskyy" spells as "-yy". They are two different sounds and neither has a Latin letter for it; Russians would usually transcribe the ending as "-iy" because the "i" isn't that "open", but Poles would more likely use a single "-i" because the "y" is "almost silent".
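One crude way to make those transcriptions meet is to collapse the competing word-final spellings into a single match key. The sketch below is hypothetical: `ending_key` is an illustrative name, the bare "-y" ending (as in the English "Zelensky") is an added assumption, and the rule over-matches by rewriting any name that ends in "y" or "i", e.g. "Kennedy":

```python
import re

# Hypothetical rule: treat word-final "-yy", "-iy", "-y" and "-i" as one ending.
_ENDING = re.compile(r"(?:yy|iy|y|i)$")

def ending_key(name: str) -> str:
    """Lower-case the name and replace the final y/i cluster with a single 'i'."""
    return _ENDING.sub("i", name.lower())

for spelling in ("Zelenskyy", "Zelenskiy", "Zelenski", "Zelensky"):
    print(spelling, "->", ending_key(spelling))  # all map to "zelenski"
```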

If anyone knows of a usable implementation of name search in Solr, I would very much like to hear about it too, because we do have lots of name records in our index and genealogy researchers are complaining.

Dima
