Re: phonetic search and accents

dmitri maziuk Thu, 16 Mar 2023 10:02:08 -0700

On 2023-03-16 10:33 AM, Andy C wrote:

A perhaps simplistic option would be to map accented letters to their
unaccented versions using either the ASCII Folding Filter or the ICU
Folding Filter.


Or the equivalent of
'''
unicodedata.normalize( "NFKD", v ).encode('ascii','ignore').decode()
'''
(v.2 python) when importing into the index.

Unfortunately that's not going to help: we had people complain that "Muller" does not find "Mueller" -- which is/was a common English way to transcribe "Müller".
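
To make the mismatch concrete, here is a minimal Python sketch of the folding step and of one possible workaround: indexing a second, digraph-style transcription alongside the folded form. The GERMAN_DIGRAPHS table and the name_variants helper are hypothetical names made up for illustration, not part of Solr or any library, and the table covers only the German case.

```python
import unicodedata


def ascii_fold(value):
    """Decompose accented characters (NFKD) and drop everything non-ASCII."""
    return unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()


# Hypothetical transcription table: an assumption for illustration only.
GERMAN_DIGRAPHS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def name_variants(value):
    """Return the plain folded form plus the digraph transcription."""
    # Compose first so that "u" + combining diaeresis becomes a single "ü".
    composed = unicodedata.normalize("NFC", value)
    digraph_form = "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in composed)
    return {ascii_fold(composed), ascii_fold(digraph_form)}


print(ascii_fold("Müller"))             # Muller
print(sorted(name_variants("Müller")))  # ['Mueller', 'Muller']
```

Folding alone yields only "Muller", so a query for "Mueller" has nothing to match. Indexing both variants would cover this one case, but it does nothing for transcriptions that are not a fixed letter-for-letter mapping.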

It gets worse: e.g. with Slav "short i" and "open i" that "Zelenskyy" spells as "-yy". They are two different sounds, neither has a Latin letter for it, Russians would usually transcribe it as "-iy" because "i" isn't that "open" but Poles would more likely use a single "-i" because the "y" is "almost silent".

If anyone knows of a usable implementation of name search in Solr, I would very much like to hear about it too because we do have lots of name records in our index and genealogy researchers are complaining.


Dima
