Re: phonetic search and accents

Alexandre Rafalovitch Thu, 16 Mar 2023 10:09:34 -0700

I think the common approach was multi-indexing with increasingly less
precise mappings and searching those alternative fields with different
weights (e.g. using field name aliases that expand to those fields to
manage the weights). It is similar to the problem of searching some Asian
names, where the first and second name may be entered in an unexpected order.


But that will not combine well with any simple sort then.
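Here is a minimal Python sketch of the multi-field idea, under stated assumptions. The three tiers are all hypothetical choices for illustration: an exact case-folded key, an accent-folded key, and a classic American Soundex code as the least precise tier. So are their names (`name_exact`, `name_folded`, `name_soundex`) and the boosts 10/5/1. In Solr each tier would roughly be its own field (e.g. filled via copyField with a different analyzer) boosted in the eDisMax `qf` parameter. Here each field is simulated by a key function, and a document scores the best boost among the tiers that match, loosely like dismax with `tie=0`.

```python
import unicodedata

def exact_key(name: str) -> str:
    """Most precise tier: case-folded, accents and spelling preserved."""
    return name.casefold()

def folded_key(name: str) -> str:
    """Middle tier: strip diacritics via NFKD, similar in spirit to ASCII folding."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return decomposed.encode("ascii", "ignore").decode()

# American Soundex digit for each consonant; vowels, h, w and y get no digit.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}

def soundex_key(name: str) -> str:
    """Least precise tier: a 4-character American Soundex code."""
    letters = [ch for ch in folded_key(name) if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":  # h/w do not separate letters with equal codes
            previous = digit
    return (code + "000")[:4]

# Hypothetical field names and boosts, from most to least precise.
TIERS = {
    "name_exact": (exact_key, 10.0),
    "name_folded": (folded_key, 5.0),
    "name_soundex": (soundex_key, 1.0),
}

def index_document(name: str) -> dict[str, str]:
    """Store one key per tier, like copying the name into several fields."""
    return {field: key_fn(name) for field, (key_fn, _) in TIERS.items()}

def score(query: str, doc: dict[str, str]) -> float:
    """Best boost among matching tiers (loosely dismax with tie=0)."""
    return max(
        (boost for field, (key_fn, boost) in TIERS.items()
         if key_fn(query) == doc[field]),
        default=0.0,
    )

def search(query: str, names: list[str]) -> list[tuple[str, float]]:
    """Return matching names, highest-scoring tier first."""
    hits = [(name, score(query, index_document(name))) for name in names]
    return sorted((h for h in hits if h[1] > 0), key=lambda h: h[1], reverse=True)
```

Take `search("Müller", ["Müller", "Muller", "Mueller", "Miller", "Schmidt"])`. It ranks "Müller" (exact, 10.0) above "Muller" (folded, 5.0). "Mueller" and "Miller" both share the Soundex code M460 at 1.0, which shows why the phonetic tier helps and why it deserves the lowest weight. Re-sorting these hits alphabetically would discard the tier ordering, which is the sort problem mentioned above.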

Regards,
   Alex

On Thu., Mar. 16, 2023, 1:02 p.m. dmitri maziuk, <dmitri.maz...@gmail.com>
wrote:

> On 2023-03-16 10:33 AM, Andy C wrote:
> > A perhaps simplistic option would be to map accented letters to their
> > unaccented versions using either the ASCII Folding Filter or the ICU
> > Folding Filter.
>
> Or the equivalent of
> '''
> unicodedata.normalize( "NFKD", v ).encode('ascii','ignore').decode()
> '''
> (v.2 python) when importing into the index.
>
> Unfortunately that's not going to help: we had people complain that
> "Muller" does not find "Mueller" -- which is/was a common English way to
> transcribe "Müller".
>
> It gets worse: e.g. with the Slavic "short i" and "open i" that "Zelenskyy"
> spells as "-yy". They are two different sounds and neither has a Latin
> letter. Russians would usually transcribe it as "-iy" because the "i"
> isn't that "open", but Poles would more likely use a single "-i" because
> the "y" is "almost silent".
>
> If anyone knows of a usable implementation of name search in Solr, I
> would very much like to hear about it too because we do have lots of
> name records in our index and genealogy researchers are complaining.
>
> Dima
>
>
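To make the limitation concrete, here is the quoted NFKD one-liner wrapped as a runnable Python 3 function, next to a hypothetical rule-based name key. The rules are illustrative assumptions that cover only the examples in this thread, not a tested transliteration scheme. They collapse "ae"/"oe"/"ue" to a single vowel and rewrite the endings "-yy", "-iy", "-i" and "-y" to "-y".

```python
import unicodedata

def ascii_fold(name: str) -> str:
    """The quoted approach: NFKD-decompose, then drop non-ASCII code points."""
    return unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()

# Hypothetical rules covering only the spellings discussed in this thread.
GERMAN_DIGRAPHS = (("ae", "a"), ("oe", "o"), ("ue", "u"))
SLAVIC_ENDINGS = ("yy", "iy", "i", "y")  # checked in this order

def name_key(name: str) -> str:
    """Collapse some transliteration variants into one lower-case key."""
    key = ascii_fold(name.casefold())
    for digraph, vowel in GERMAN_DIGRAPHS:
        key = key.replace(digraph, vowel)  # "mueller" -> "muller"
    for ending in SLAVIC_ENDINGS:
        if key.endswith(ending):
            key = key[: -len(ending)] + "y"  # "zelenskiy" -> "zelensky"
            break
    return key

print(ascii_fold("Müller"), ascii_fold("Mueller"))  # Muller Mueller
print({name_key(n) for n in ("Muller", "Mueller", "Müller")})  # {'muller'}
print({name_key(n) for n in ("Zelenskyy", "Zelenskiy", "Zelenski")})  # {'zelensky'}
```

Folding alone leaves "Muller" and "Mueller" distinct, which is exactly the complaint quoted above. The hand-written rules also fire on names they were not meant for: "Samuel" becomes "samul", and any name ending in "-i" or "-y" gets its ending rewritten. A key like this is therefore probably best kept in a low-weight field rather than used as the only match.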
