Not with ICUFoldingFilter, but with the MappingCharFilter.

There you can supply a mapping file and skip baseletter mappings for the users' 
native language, because in their own language, they know the correct spelling 
... most of the time ... sometimes.

This does not really help with multiple languages, though, and you lose the 
convenience of ICUFoldingFilter.
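
As a rough sketch of how such a mapping file could be generated (not something 
from an actual setup): the script below emits "source" => "target" lines in the 
format MappingCharFilterFactory reads, for accented Latin letters, and skips the 
letters listed in KEEP. The KEEP set (Norwegian æ, ø, å), the code point range 
and the function names are assumptions made for illustration. It only strips 
diacritics and does not reproduce everything ICUFoldingFilter does.

```python
import unicodedata

# Assumption: letters the users treat as distinct in their native language
# (Norwegian here). They are left out of the mapping, so they are not folded.
KEEP = set("æøåÆØÅ")


def fold(ch):
    """Return ch with combining marks stripped, using NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def mapping_lines(start=0x00C0, end=0x017F, keep=KEEP):
    """Yield '"source" => "target"' lines for letters whose base form differs."""
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Skip protected native letters and non-letters such as the × sign.
        if ch in keep or not ch.isalpha():
            continue
        base = fold(ch)
        # Letters without a canonical decomposition (e.g. ø, ß) are unchanged
        # by fold() and therefore produce no mapping line.
        if base != ch:
            yield f'"{ch}" => "{base}"'


if __name__ == "__main__":
    # Redirect the output to a UTF-8 file to use it as a mapping file.
    print("\n".join(mapping_lines()))
```

The resulting file would then be referenced from a charFilter element using 
solr.MappingCharFilterFactory with its mapping attribute in the field type's 
analyzer, in place of ICUFoldingFilter.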

André
________________________________
From: Jan Høydahl <jan....@cominvent.com>
Sent: Wednesday, 25 August 2021 15:43
To: users@solr.apache.org <users@solr.apache.org>
Subject: ICUFoldingFilter with preserveOriginal option?

Hi,

I'm looking at using ICUFoldingFilter for a customer, to fold e.g. Genéve to 
Geneve and thus get better recall.
However, for some common Norwegian words, the folding makes them clash with 
super-common words so it becomes impossible to find exactly what you want.
I imagined that if ICUFoldingFilter had a preserveOriginal=true option, then it 
could leave the original word in the index on the same position, and an exact 
match for "Genéve" would score better than the normalized one. But this filter 
does not support this.

Has anyone found a workaround for this, apart from duplicating all content in 
different fields with different analysis and search across them with different 
weights?

Jan
