Re: LC_COLLATE in the C locale

Paul Eggert Wed, 18 Dec 2019 08:27:56 -0800

On 12/18/19 2:29 AM, Bruno Haible wrote:
> Hi Paul,
> 
>> I do have a qualm in that coreutils (and I assume others) interpret
>> !hard_locale (LC_COLLATE) as meaning that the locale is unibyte and
>> uses native byte comparison.
> Isn't this warranted by section "LC_COLLATE Category in the POSIX Locale" in
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html> ?


I don't see where that section requires unibyte.

>> As I recall, on some platforms (macOS maybe?) the C locale uses
>> UTF-8, so this interpretation isn't correct.
> UTF-8 has the nice property that byte-per-byte comparison and
> codepoint-per-codepoint comparison are equivalent.

True, so code that assumes strcmp and strcoll agree should work. But I think
some code specifically assumes unibyte. Presumably that code should also check
that MB_CUR_MAX == 1, which should be enough in practice (even though it
doesn't suffice in theory).
