https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494
--- Comment #2 from Conrad Meyer <c...@freebsd.org> ---
I think the lengths printed in the bad example are correct; they are counted in wchar_t units, whereas in LC_CTYPE=C the length is in bytes. So it seems to be a comparison problem.

I think we invoke wstrcoll() -> bwscoll() in the latter case. bwscoll() seems to be broken for short strings:

    if (len1 <= offset)
            return ((len2 <= offset) ? 0 : -1);

E.g.,

    $ (echo a耳 ; echo a脳 ; echo a耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --debug
    ...
    ; offset=1 ; k1=<a耳>(2), k2=<a脳>(2); offset=1; s1=<a耳>, s2=<a脳>; cmp1=-256
    ; offset=1 ; k1=<a脳>(2), k2=<a耳>(2); offset=1; s1=<a脳>, s2=<a耳>; cmp1=256
    ; offset=1 ; k1=<a耳>(2), k2=<a耳>(2); offset=1; s1=<a耳>, s2=<a耳>; cmp1=0
    a耳
    a耳
    a脳

The result is correct here, because length (2) > offset (1), so the early return above is not taken. I don't know if 'offset' here is wrong, or if bwscoll() is wrong. It seems like maybe it only invokes bwscoll() on strings it thinks are identical from a radix perspective. So perhaps the problem is some combination of wcstr and byte_sort in radixsort.

In --mergesort mode, the result and comparisons are correct:

    $ (echo 耳 ; echo 脳 ; echo 耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --mergesort --debug
    Memory to be used for sorting: 17100230656
    Using collate rules of C locale
    Byte sort is used
    sort_method=mergesort
    ; k1=<耳>(1), k2=<脳>(1); s1=<耳>, s2=<脳>; cmp1=-256
    ; k1=<脳>(1), k2=<耳>(1); s1=<脳>, s2=<耳>; cmp1=256
    ; k1=<耳>(1), k2=<耳>(1); s1=<耳>, s2=<耳>; cmp1=0
    耳
    耳
    脳

Something is broken in radixsort.
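
To make the short-string concern concrete, here is a minimal C sketch of the early-return check quoted from bwscoll(). It is not the FreeBSD source: the function name model_bwscoll is hypothetical, and the code-point comparison after the early return is an assumption chosen only so the sketch reproduces the cmp1 values seen in the debug output. It also assumes that a one-character key can reach the comparison with offset=1, which the comment does not confirm.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical model of the check quoted from bwscoll().
 * Only the first "if" comes from the bug comment; everything
 * after it is an assumed stand-in for the real comparison.
 */
int
model_bwscoll(const wchar_t *s1, const wchar_t *s2, size_t offset)
{
	size_t len1 = wcslen(s1);
	size_t len2 = wcslen(s2);

	/* Early return as quoted: keys no longer than offset compare equal. */
	if (len1 <= offset)
		return ((len2 <= offset) ? 0 : -1);
	if (len2 <= offset)
		return (1);

	/* Assumption: compare wchar_t values starting at offset. */
	for (size_t i = offset; i < len1 && i < len2; i++) {
		if (s1[i] != s2[i])
			return ((int)s1[i] - (int)s2[i]);
	}

	/* Common part is equal; the shorter string sorts first. */
	return ((len1 > len2) - (len1 < len2));
}
```

Under this model, comparing a耳 (L"a\u8033") with a脳 (L"a\u8133") at offset 1 yields -256, the same value as cmp1 in the debug output. Comparing the one-character strings 耳 and 脳 at offset 1 yields 0 even though they differ, which is the kind of wrong ordering a radix pass could produce if it passed an offset equal to the key length.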