On 2023-03-16 10:33 AM, Andy C wrote:
A perhaps simplistic option would be to map accented letters to their
unaccented versions using either the ASCII Folding Filter or the ICU
Folding Filter.
Or the equivalent of
'''
import unicodedata
unicodedata.normalize("NFKD", v).encode("ascii", "ignore").decode()
'''
(v.2 python) when importing into the index.
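For reference, a minimal self-contained version of that folding step might look like this. `fold` is a hypothetical helper name, and this only approximates what Solr's ASCII or ICU folding filters do at index time. One caveat: letters that have no Unicode decomposition, such as Polish "ł", are silently dropped by this trick rather than mapped to "l".

```python
import unicodedata

def fold(v: str) -> str:
    """Strip diacritics: decompose (NFKD), then drop every non-ASCII code point."""
    return unicodedata.normalize("NFKD", v).encode("ascii", "ignore").decode()

print(fold("Müller"))  # Muller
print(fold("Łódź"))    # odz  ("Ł" has no decomposition, so it is lost)
```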
Unfortunately that's not going to help: folding turns "Müller" into
"Muller", but we had people complain that "Muller" does not find
"Mueller", which is (or was) a common English way to transcribe "Müller".
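The Muller/Mueller mismatch could in principle be patched with a language-specific matching key computed the same way at index and query time. The sketch below is purely hypothetical (the `german_key` name and its rewrite rules are assumptions, not anything Solr ships): it expands German umlauts to their two-letter transcriptions, folds the rest, then collapses "ae"/"oe"/"ue" back to the bare vowel so all three spellings agree.

```python
import unicodedata

# Hypothetical rules: common English transcriptions of German umlauts and sharp s.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def german_key(name: str) -> str:
    s = name.lower()
    for ch, repl in UMLAUTS.items():
        s = s.replace(ch, repl)
    # Fold any remaining diacritics to plain ASCII.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
    # Collapse the two-letter transcriptions to the bare vowel, so that
    # "mueller" and "muller" produce the same key.
    for pair in ("ae", "oe", "ue"):
        s = s.replace(pair, pair[0])
    return s

for n in ("Müller", "Mueller", "Muller"):
    print(n, "->", german_key(n))  # each prints "-> muller"
```

The obvious cost is over-matching: the same rule turns "Samuel" into "samul", so every such rule has to be weighed against the false hits it creates.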
It gets worse with the Slavic "short i" and "open i", which "Zelenskyy"
spells as "-yy". They are two different sounds and neither has a Latin
letter of its own: Russians would usually transcribe the ending as "-iy"
because the "i" isn't that "open", but Poles would more likely use a
single "-i" because the "y" is "almost silent".
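Folding does nothing here because all three spellings are already plain ASCII. A hand-written rule can merge them, as in this hypothetical sketch (the `ending_key` name and the rule that treats word-final "-yy", "-iy" and "-i" as one ending are assumptions for illustration only):

```python
import re

# Hypothetical rule: treat the word-final endings -yy, -iy and -i as equivalent.
_ENDING = re.compile(r"(?:yy|iy|i)$")

def ending_key(name: str) -> str:
    """Lowercase the name and rewrite any of the listed endings to a single 'i'."""
    return _ENDING.sub("i", name.lower())

for n in ("Zelenskyy", "Zelenskiy", "Zelenski"):
    print(n, "->", ending_key(n))  # each prints "-> zelenski"
```

Every transliteration convention needs its own rule like this, which is why such rules multiply quickly.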
If anyone knows of a usable implementation of name search in Solr, I
would very much like to hear about it too, because we do have lots of
name records in our index and genealogy researchers are complaining.
Dima