Re: [sword-devel] better UTF-sensitive sort

2016-01-14 Thread David Haslam
Thanks Aaron, I think it would be nice to share the script. I have friends who are working on Biblical Hebrew projects. Please start a new thread if you do. Best regards, David

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread Aaron Christianson
Just a heads up that simply using Unicode or locale-based sorting for Hebrew with vowels and accents does not provide the correct order! Pointed Hebrew is supposed to be sorted as if the various diacritics aren't there (except for sin and shin) and then vowels are used as a secondary criterion (the
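To make the primary criterion concrete, here is a rough sketch in C of a sort key that drops the Hebrew accents and points from a UTF-8 string but keeps the shin and sin dots. The function names are made up for illustration, and the exact set of code points treated as diacritics is my assumption from the Unicode Hebrew block, not something taken from Aaron's script. The secondary (vowel) criterion is not attempted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for Hebrew accents and points that the
 * primary key drops. U+05C1 (shin dot) and U+05C2 (sin dot) are kept.
 * The ranges below are an assumption based on the Unicode Hebrew block. */
static int is_dropped_hebrew_mark(unsigned cp)
{
    if (cp == 0x05C1 || cp == 0x05C2)
        return 0;
    return (cp >= 0x0591 && cp <= 0x05BD) || cp == 0x05BF ||
           cp == 0x05C4 || cp == 0x05C5 || cp == 0x05C7;
}

/* Copy UTF-8 `src` into `dst` without the dropped marks.
 * `dst` must have room for strlen(src) + 1 bytes.
 * Returns the length of the key in bytes. */
size_t hebrew_primary_key(const char *src, char *dst)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s) {
        /* The Hebrew block U+0580..U+05FF is always two bytes in UTF-8,
         * with lead byte 0xD6 or 0xD7. */
        if ((s[0] == 0xD6 || s[0] == 0xD7) && (s[1] & 0xC0) == 0x80) {
            unsigned cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
            if (!is_dropped_hebrew_mark(cp)) {
                dst[n++] = (char)s[0];
                dst[n++] = (char)s[1];
            }
            s += 2;
        } else {
            dst[n++] = (char)*s++;   /* everything else passes through */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Two words can then be compared on their keys first, falling back to some vowel-based comparison only when the keys are equal. Note that comparing keys byte by byte gives code point order, which is only an approximation of alphabetical order (final letter forms have their own code points).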

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread Karl Kleinpaste
On 01/12/2016 11:32 AM, DM Smith wrote:
> Is ICU4C out of the question?

Thanks for the pointer. It took a bit more contemplation than it probably should have, but I used ucol_strcollUTF8() (in icu-i18n) and it seems fine.

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread Matěj Cepl
On 2016-01-13, 09:46 GMT, Matěj Cepl wrote:
> My colleague working on LibreOffice claims that he doesn’t know
> about anything better than ICU. Yes, it is a monster. Perhaps
> UTF-8->UTF-16LE->UTF-8 round-trip is not that expensive after
> all?

Besides, don't we have ICU already as dependency f

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread David Haslam
Aside: FIO. I have since found out that Notepad++ doesn't support UTF-16 but rather UCS-2. Hence no support beyond the Unicode BMP. David
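For anyone wondering what the UCS-2 limitation means in practice: UTF-16 encodes code points above U+FFFF as a surrogate pair of two 16-bit units, while UCS-2 only has single units. The following is a small sketch of the UTF-16 encoding rule in C; the function name is hypothetical and this is not code from any of the editors mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if `cp` is not
 * a valid scalar value (out of range or a lone surrogate). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        /* BMP: one unit, identical in UCS-2 and UTF-16 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    /* Beyond the BMP: split the 20-bit offset into a surrogate pair,
     * which UCS-2 cannot represent. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

A Hebrew letter such as U+05D0 fits in one unit, so the UCS-2 restriction only bites for scripts and symbols outside the BMP.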

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread Matěj Cepl
On 2016-01-12, 16:52 GMT, DM Smith wrote:
> You can take the second column and sort it by each of the
> locales mentioned.

https://mcepl.fedorapeople.org/tmp/sort-complicated.txt is the second column as simple plain text in UTF-8. My colleague working on LibreOffice claims that he doesn’t know

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread David Haslam
Hi Karl, Windows is largely based on UTF-16, even though many applications can handle UTF-8 internally. Have you timed a round trip conversion of the language names from UTF-8 to UTF-16 and back again? Is it really so slow that it would be noticeable to users? Are you looking for a locale neutra

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread David Haslam
FIO. Screenshot of the BabelPad sort dialog: https://www.dropbox.com/s/hedexkg6wc3fnhi/Screenshot%202016-01-12%2019.38.23.png?dl=0 David

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread David Haslam
Thanks DM. Before tackling sorts of any kind, I first used three different programs to convert the UTF-8 to UTF-16LE. Although this is away from where Karl wishes to go, I still thought it would be interesting. BabelPad and TextPipe gave identical results which is a positive. Notepad++ didn't co

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread DM Smith
For a localized list of language names see our wiki: http://www.crosswire.org/wiki/Localized_Language_Names You can take the second column and sort it by each of the locales mentioned. Example of the complexity: It used to be the standard

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread David Haslam
Karl, could you please provide a copy of a typical unsorted language list as an example to play with? I'd like to see what happens with one of my favourite Windows programs. Best regards, David

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread DM Smith
Is ICU4C out of the question? It has support for collation. See: http://site.icu-project.org/design/collation/v2

> On Jan 12, 2016, at 11:12 AM, Karl Kleinpaste wrote:
>
> To produce Xiphos' module trees (sidebar, mod.mgr, adv.search), I sort b

[sword-devel] better UTF-sensitive sort

2016-01-12 Thread Karl Kleinpaste
To produce Xiphos' module trees (sidebar, mod.mgr, adv.search), I sort by language using qsort+strcmp. This was recently pointed out as being poor for UTF-8 strings, and I replaced strcmp with strcoll. This works fine in Linux. Unfortunately, the Win32 version of strcoll believes in UTF-16, even wh
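For readers who have not seen the pattern, the change from strcmp to strcoll inside a qsort comparator looks roughly like the sketch below. This is not the actual Xiphos code; the function names are invented for illustration, and it assumes the caller has already selected a collation locale with setlocale(LC_COLLATE, "") at startup.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of `const char *`: each argument
 * points at an array element, i.e. at a pointer to the string. */
static int compare_collated(const void *a, const void *b)
{
    const char *lhs = *(const char *const *)a;
    const char *rhs = *(const char *const *)b;
    /* strcoll honours LC_COLLATE; strcmp would compare raw bytes. */
    return strcoll(lhs, rhs);
}

/* Hypothetical helper: sort language names in place using the
 * current collation locale. */
void sort_language_names(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 1)
        qsort(names, count, sizeof names[0], compare_collated);
}
```

In the "C" locale strcoll orders strings the same way strcmp does, so the benefit only shows up once a real locale is active. How the strings are interpreted is up to the platform's C library, which is where the Win32 trouble described above comes in.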