On 12/18/19 2:29 AM, Bruno Haible wrote:
> Hi Paul,
>
>> I do have a qualm in that coreutils (and I assume others) interpret
>> !hard_locale (LC_COLLATE) as meaning that the locale is unibyte and uses
>> native byte comparison.
> Isn't this warranted by section "LC_COLLATE Category in the POSIX Locale" in
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html> ?

I don't see where that section requires unibyte.

>> As I recall on some platforms (macOS maybe?), the C locale uses
>> UTF-8 so this interpretation isn't correct.
> UTF-8 has the nice property that byte-per-byte comparison and
> codepoint-per-codepoint comparison are equivalent.

True, so the code that assumes strcmp == strcoll should work. But I think
some code specifically assumes unibyte. Presumably that code should also
check MB_CUR_MAX, which should be enough in practice (even though it
doesn't suffice in theory).
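
To make the distinction concrete, here is a rough sketch of how a caller might separate the two assumptions. It is only an illustration, not what coreutils or Gnulib actually do: simple_locale is a hypothetical stand-in for Gnulib's hard_locale that merely looks at the locale name, and collation_is_bytewise and collation_is_unibyte are names invented for this example. Note that MB_CUR_MAX reflects the LC_CTYPE category, not LC_COLLATE.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Gnulib's hard_locale (assumption: a locale
   is "simple" only if it is named "C" or "POSIX").  */
static bool
simple_locale (int category)
{
  const char *name = setlocale (category, NULL);
  return name && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* True if strcmp can stand in for strcoll.  This makes no claim about
   how many bytes a character occupies.  */
static bool
collation_is_bytewise (void)
{
  return simple_locale (LC_COLLATE);
}

/* True if, in addition, every character is a single byte.  MB_CUR_MAX
   depends on LC_CTYPE, so this is a practical check rather than a
   guarantee.  */
static bool
collation_is_unibyte (void)
{
  return collation_is_bytewise () && MB_CUR_MAX == 1;
}
```

Code that only needs strcmp == strcoll would test collation_is_bytewise (), while code that indexes characters by byte would test collation_is_unibyte ().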