Re: ICUFoldingFilter with preserveOriginal option?

Jan Høydahl Thu, 26 Aug 2021 01:40:00 -0700

Hi,

Thanks for the input. We already use the filter parameter to guard æøåäö. We
could of course guard é or ô against normalization too, but this becomes quite
broad, and much of the benefit disappears.
If the filter supported some kind of protwords list for exceptions, we could
start assembling words that we know for sure clash and should be exempted;
however, an exact-match rank boost approach would seem more flexible.
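
For reference, guarding characters with the filter parameter looks roughly
like the sketch below. The field type name and the tokenizer are assumptions
for illustration, not our actual schema. The filter attribute takes an ICU
UnicodeSet, negated here so that the listed characters are left unfolded.
Guarded characters also skip ICU's case folding, hence the lowercase filter
placed after it.

```xml
<!-- Sketch only: "text_folded" and the tokenizer choice are hypothetical -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Fold everything except the characters in the negated UnicodeSet -->
    <filter class="solr.ICUFoldingFilterFactory" filter="[^æøåäöÆØÅÄÖ]"/>
    <!-- Guarded characters bypass ICU folding entirely, so lowercase them here -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Adding é or ô to the set works the same way, which is exactly why the set
grows broad so quickly.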


Jan

> On 26 Aug 2021, at 10:08, Ere Maijala <ere.maij...@helsinki.fi> wrote:
> 
> Hi,
> 
> For our Finnish audience we avoid folding some characters to alleviate the 
> problem. Along with MappingCharFilter this works pretty well. See 
> https://github.com/NatLibFi/finna-solr/blob/dev/vufind/biblio/conf/schema.xml#L7
> for examples. Depending on your use case this could be a solution as well.
> Note that the filter parameter hasn't always been there, so a recent-enough 
> Solr version is needed (I fail to recall the exact version).
> 
> --Ere
> 
> Jan Høydahl wrote on 25.8.2021 at 16:43:
>> Hi,
>> I'm looking at using ICUFoldingFilter for a customer, to fold e.g. Genéve to 
>> Geneve and thus get better recall.
>> However, for some common Norwegian words, the folding makes them clash with 
>> super-common words so it becomes impossible to find exactly what you want.
>> I imagined that if ICUFoldingFilter had a preserveOriginal=true option, it
>> could leave the original word in the index on the same position, and an 
>> exact match for "Genéve" would score better than the normalized one. But 
>> this filter does not support this.
>> Has anyone found a workaround for this, apart from duplicating all content
>> in different fields with different analysis and searching across them with
>> different weights?
>> Jan
> 
> -- 
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
