https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494

--- Comment #4 from Conrad Meyer <c...@freebsd.org> ---
Ok, so radix sort only goes a byte at a time; we can't allocate memory for the
whole wchar_t space (4 GB).  Here are the wchar_t representations of the two
characters:

echo 耳脳 | iconv -f utf-8 -t ucs-4 | hd
00000000  00 00 80 33 00 00 81 33                           |...3...3....|
          ^ first     ^ second
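
The same bytes can be reproduced without iconv.  A minimal Python sketch,
assuming big-endian UCS-4 as in the dump above (the variable names are
illustrative only, not from sort(1)):

```python
# Encode each character as big-endian UCS-4 (UTF-32BE), matching the hd dump.
first = "耳".encode("utf-32-be")
second = "脳".encode("utf-32-be")

print(first.hex(" "))   # 00 00 80 33
print(second.hex(" "))  # 00 00 81 33

# Byte positions (0 = most significant) where the two characters differ.
differing = [i for i in range(4) if first[i] != second[i]]

# The least significant byte (index 3) is identical in both characters,
# so looking at that byte alone cannot tell them apart.
same_low_byte = first[3] == second[3]
```

Only byte index 2 differs (0x80 vs. 0x81); the low byte is 0x33 in both.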

Radixsort incorrectly looks only at the least significant byte of each wchar_t,
observes that 0x33 == 0x33, and invokes collate to attempt to differentiate the
two strings.  But passing radixsort's level to bwscoll is wrong: bwscoll expects
an offset in units of wchar_t.  Since radixsort has only processed 1/4 of a
wchar_t, this is a bogus offset.
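
To illustrate the unit mismatch, here is a rough Python sketch.  It assumes a
4-byte wchar_t and that the radix level counts bytes consumed; `wchar_offset`
is a hypothetical helper, not a function in sort(1), and string slicing merely
stands in for the offset handed to the collation routine:

```python
WCHAR_SIZE = 4  # assumption: 4-byte wchar_t, as in the UCS-4 dump


def wchar_offset(byte_level: int) -> int:
    """Convert a count of bytes consumed into whole wchar_t units consumed."""
    return byte_level // WCHAR_SIZE


key = "耳脳"
byte_level = 1  # radix sort has examined one byte, i.e. 1/4 of a wchar_t

# Treating the byte level as a character offset skips the entire first
# character, even though only a quarter of it has been examined.
wrong_tail = key[byte_level:]

# Converting to wchar_t units first keeps the partially examined character.
right_tail = key[wchar_offset(byte_level):]
```

With `byte_level = 1`, the unconverted offset drops 耳 from the comparison
entirely, while the converted offset is still 0.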

I'm not sure how our radixsort is supposed to work, honestly.  It seems pretty
broken, even for ASCII.  It should be able to bucket multiple keys that share a
character per level, but it doesn't; it falls back on comparison in that case.
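
For comparison, a textbook MSD radix sort keeps bucketing keys that share a
byte at the current level instead of falling back on comparison.  The sketch
below is a generic illustration under that assumption, not FreeBSD's
implementation; `msd_radix_sort` is a made-up name, and it orders by raw byte
value only, not by locale collation:

```python
def msd_radix_sort(keys: list[bytes], level: int = 0) -> list[bytes]:
    """Textbook MSD radix sort on byte strings; recurses instead of comparing."""
    if len(keys) <= 1:
        return keys
    # Keys that end exactly at this level sort before any longer key.
    finished = [k for k in keys if len(k) == level]
    # Group the remaining keys by the byte found at this level.
    buckets: dict[int, list[bytes]] = {}
    for k in keys:
        if len(k) > level:
            buckets.setdefault(k[level], []).append(k)
    result = finished
    for byte in sorted(buckets):
        # Keys sharing this byte are split further at the next level.
        result += msd_radix_sort(buckets[byte], level + 1)
    return result


words = [b"abd", b"abc", b"ab", b"b"]
ascii_sorted = msd_radix_sort(words)

# The two characters from this bug, as big-endian UCS-4 byte strings.
wide = [c.encode("utf-32-be") for c in ("脳", "耳")]
wide_sorted = msd_radix_sort(wide)
```

Here `b"abc"` and `b"abd"` share two levels and are separated at the third,
and the two UCS-4 keys are separated at byte index 2 without any comparison
call.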

