[Koha-devel] What's your opinion on sorting facets with diacritics? (Bug 36947)

David Cook via Koha-devel Mon, 17 Jun 2024 16:32:26 -0700

Hi all,


Lari Strand pointed out on Bug 36947 that Koha doesn’t take diacritics into 
account when it sorts facet names. He talks about Elasticsearch there, 
although it appears Zebra has the same issue.

 

The first proposed solution was to strip out the diacritics using 
Unicode::Normalize. This worked pretty well, but on Mattermost paxed pointed 
out that this doesn’t work well for Finnish where “⟨å⟩, ⟨ä⟩ and ⟨ö⟩ are 
regarded as distinct letters and collated after ⟨z⟩” (as per Wikipedia’s 
“Finnish orthography” entry).
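
To make the Finnish problem concrete, here is a minimal sketch of the 
strip-the-diacritics approach. It is not the patch from the bug: the helper 
name strip_diacritics and the sample facet names are made up for illustration.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from the Koha patch): decompose each character
# into base letter + combining marks, then drop the combining marks.
sub strip_diacritics {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

# Made-up facet names for illustration.
my @facets = ('Öljy', 'Zebra', 'Äiti', 'Omena');

# Sort on the stripped, lower-cased key.
my @sorted = sort {
    lc(strip_diacritics($a)) cmp lc(strip_diacritics($b))
} @facets;

binmode(STDOUT, ':encoding(UTF-8)');
print join(', ', @sorted), "\n";
# Prints: Äiti, Öljy, Omena, Zebra
```

With stripping, ⟨ä⟩ sorts as ⟨a⟩ and ⟨ö⟩ as ⟨o⟩, so “Äiti” and “Öljy” land 
before “Omena” and “Zebra”, whereas a Finnish reader would expect them after 
“Zebra”.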

 

So I looked at some locale-based Perl core options like “use locale” and 
“Unicode::Collate::Locale”, and I really like “Unicode::Collate::Locale”. 
It applies locale-specific collation rules to sort the text correctly. (The 
only gotcha is that it’s currently based on the system locale. There are ways 
we could use the UI-chosen language instead, but I figure that’s a future 
development.)
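
For anyone who hasn’t used the module, here is a rough sketch of 
locale-aware sorting with Unicode::Collate::Locale. It is not the patch 
itself: the facet names are made up, and the locale is hard-coded through the 
constructor’s “locale” parameter rather than taken from the system locale or 
the UI language.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate::Locale;

# Made-up facet names for illustration.
my @facets = ('Öljy', 'Zebra', 'Äiti', 'Omena');

# Assumption for this sketch: the locale is hard-coded. Real code would
# have to derive it from somewhere (system locale, UI language, ...).
my $finnish   = Unicode::Collate::Locale->new(locale => 'fi');
my @fi_sorted = $finnish->sort(@facets);

# For comparison: 'en' has no special tailoring for these letters, so
# accented letters sort next to their base letters.
my $english   = Unicode::Collate::Locale->new(locale => 'en');
my @en_sorted = $english->sort(@facets);

binmode(STDOUT, ':encoding(UTF-8)');
print 'fi: ', join(', ', @fi_sorted), "\n";
print 'en: ', join(', ', @en_sorted), "\n";
```

With the Finnish tailoring, ⟨ä⟩ and ⟨ö⟩ are collated after ⟨z⟩, so “Äiti” and 
“Öljy” move to the end of the list, while the “en” collator keeps them next 
to ⟨a⟩ and ⟨o⟩.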

 

Anyway, I’d like to get more eyes on this code, because it’s super 
interesting. The patch is very small and easy to understand, and I’d like 
more opinions about what we should be doing with it.

 

Cheers!

 

David Cook

Senior Software Engineer

Prosentient Systems

Suite 7.03

6a Glen St

Milsons Point NSW 2061

Australia

 

Office: 02 9212 0899

Online: 02 8005 0595

_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/
