https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494
--- Comment #1 from Conrad Meyer <c...@freebsd.org> ---
On CURRENT:

$ LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C locale
LANG=C
LC_CTYPE=ja_JP.UTF-8
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

sort(1) attempts to identify situations where it can run in fast, byte-compare
only mode by looking only at LC_COLLATE.  The --debug option shows more
information:

$ (echo 耳 ; echo 脳 ; echo 耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --debug
Memory to be used for sorting: 17100230656
Using collate rules of C locale
Byte sort is used
sort_method=radixsort
; offset=1
; k1=<耳>(1), k2=<脳>(1); offset=1; s1=<耳>, s2=<脳>; cmp1=0
; offset=1
; k1=<脳>(1), k2=<耳>(1); offset=1; s1=<脳>, s2=<耳>; cmp1=0
耳
脳
耳

Both compares seem wrong.  The UTF-8 sequences share only the first byte, 0xe8.

In LC_CTYPE=C mode:

; offset=1
; k1=<耳>(3), k2=<脳>(3); offset=1; s1=<耳>, s2=<脳>; cmp1=-4
; offset=1
; k1=<脳>(3), k2=<耳>(3); offset=1; s1=<脳>, s2=<耳>; cmp1=4
; offset=1
; k1=<耳>(3), k2=<耳>(3); offset=1; s1=<耳>, s2=<耳>; cmp1=0
耳
耳
脳

The comparisons look correct.

I will look a little more.  I think this is a bug, not design, but I am not
sure yet.
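
For reference, the cmp1 values reported in LC_CTYPE=C mode can be reproduced outside of sort(1) with a plain byte-wise comparison of the UTF-8 encodings. The following is only a minimal sketch in Python, not the sort(1) code: the helper name byte_compare is hypothetical, and it assumes the comparison returns the difference of the first pair of differing bytes, which is what the -4 and 4 above suggest.

```python
def byte_compare(a: str, b: str) -> int:
    """Hypothetical helper: compare two strings by their UTF-8 bytes.

    Returns the difference of the first differing byte pair, or the
    length difference if one encoding is a prefix of the other.
    """
    ba, bb = a.encode("utf-8"), b.encode("utf-8")
    for x, y in zip(ba, bb):
        if x != y:
            return x - y
    return len(ba) - len(bb)


# Show the encodings: they match in the first byte and differ in the second.
print("耳".encode("utf-8").hex(" "))  # e8 80 b3
print("脳".encode("utf-8").hex(" "))  # e8 84 b3

print(byte_compare("耳", "脳"))  # -4
print(byte_compare("脳", "耳"))  # 4
print(byte_compare("耳", "耳"))  # 0
```

The encodings first differ at the second byte (0x80 versus 0x84), which gives the difference of 4 seen in the C-locale run. A correct byte sort should therefore never report cmp1=0 for this pair.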