Just a heads up that simply using Unicode or locale-based sorting for Hebrew with vowels and accents does not produce the correct order! Pointed Hebrew is supposed to be sorted first as if the various diacritics weren't there (except for the dots that distinguish sin and shin), and then the vowels are used as a secondary criterion (their relative order varies from source to source). I've been in correspondence with the Academy of the Hebrew Language in Israel about this very topic.
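To make the two-level idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a definitive implementation: the function name `pointed_hebrew_key` is made up for this example, the shin-before-sin ranking and the vowel order (plain code point order) are placeholders since sources differ, and I assume that only U+05B0 through U+05BB plus U+05C7 count as vowels, so accents, dagesh, and meteg are ignored entirely.

```python
import unicodedata

SHIN_DOT = "\u05C1"
SIN_DOT = "\u05C2"
# Assumption: shin sorts before sin. Swap the ranks if your source disagrees.
DOT_RANK = {SHIN_DOT: 1, SIN_DOT: 2}
# Assumption: these code points are the vowel points; their order here is
# plain code point order, a placeholder for whichever vowel order you follow.
VOWELS = {chr(cp) for cp in range(0x05B0, 0x05BC)} | {"\u05C7"}


def pointed_hebrew_key(word):
    """Return a (primary, secondary) sort key for a pointed Hebrew word.

    primary:   base letters, each paired with a sin/shin dot rank
    secondary: the vowel points attached to each base letter
    """
    primary = []
    secondary = []
    # NFD splits precomposed presentation forms into letter + combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.category(ch) != "Mn":
            primary.append([ch, 0])      # a base character
            secondary.append("")
        elif not primary:
            continue                     # stray mark with no base letter
        elif ch in DOT_RANK:
            primary[-1][1] = DOT_RANK[ch]
        elif ch in VOWELS:
            secondary[-1] += ch
        # every other mark (accents, dagesh, meteg, ...) is ignored
    return ([tuple(p) for p in primary], secondary)


# Example words, written as escapes to avoid bidi display problems.
DOV = "\u05D3\u05B9\u05BC\u05D1"                # dalet-holam-dagesh, bet
DEVER = "\u05D3\u05B6\u05BC\u05D1\u05B6\u05E8"  # segol vowels
DAVAR = "\u05D3\u05B8\u05BC\u05D1\u05B8\u05E8"  # qamats vowels
SHAR = "\u05E9\u05C1\u05B8\u05E8"               # shin, qamats, resh
SAM = "\u05E9\u05C2\u05B8\u05DD"                # sin, qamats, final mem

by_code_point = sorted([DAVAR, DEVER, DOV])     # DEVER, DAVAR, DOV
by_key = sorted([DAVAR, DEVER, DOV], key=pointed_hebrew_key)  # DOV, DEVER, DAVAR
```

With plain code point sorting, the vowel under the first letter decides the order, so the two-letter word ends up last. With the key above, the consonants decide first (the two-letter word leads), and the vowels only break the tie between the two words that share the same consonants.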
The problem with the Hebrew vowels is that almost all of them are represented as Unicode combining characters (which have their own code points) rather than every possible pointed letter having its own unique code point (there would be too many anyway), which would be more helpful for locale-based collation strings. I've written a script that properly sorts pointed Hebrew for the glossary of the Hebrew grammar I'm working on, and I'd be happy to share it, but I'm not sure how practical it is to have a unique sort method for one problem language. (On the other hand, perhaps it is worth it, since it fixes a problem for two of the three languages the Bible was actually written in.)

On Wed, Jan 13, 2016 at 2:37 PM, Karl Kleinpaste <k...@kleinpaste.org> wrote:

> On 01/12/2016 11:32 AM, DM Smith wrote:
> > Is ICU4C out of the question?
>
> Thanx for the pointer. It took a bit more contemplation than it probably
> should have, but I used ucol_strcollUTF8() (in icu-i18n) and it seems fine.
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page