Re: GNU sed version 4.2.1: on OS X, C locale gets aliased to UTF-8

Eric Blake Thu, 07 Jun 2012 07:51:35 -0700

On 06/07/2012 08:13 AM, Paolo Bonzini wrote:
> On 07/06/2012 14:50, Eric Blake wrote:
>>>> The fix could be to have two different locale_charset() functions,
>>>> one that returns "US-ASCII" and another one that returns "UTF-8".
>>>> The first one to be used when MB_CUR_MAX and mbrtowc() are used as
>>>> well, the second one to be used by gettext(). But the separation
>>>> line between the two cases is not yet clear to me. Any insights?
> 
> The separation line is what you wrote: whether you'll use the text
> simply for presentation, or whether you'll process it first.  But
> alternatively, we might try a variant of what Eric has suggested...
> 
>> On OS X, can we wrap MB_CUR_MAX to pretend to be 1 when in the "C"
>> locale, to match what cygwin did in distinguishing between 'C' and
>> 'C.UTF-8'?
> 
> ... which is to wrap MB_CUR_MAX and pretend that it is 3.


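Actually, MB_CUR_MAX for UTF-8 is larger than 3. glibc reports 6,
because the original UTF-8 design allowed sequences of up to 6 bytes,
and even under RFC 3629 a character outside the BMP takes 4 bytes.
(Surrogate pairs belong to UTF-16; UTF-8 does not use them.)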
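
For illustration only, here is a minimal sketch of the first wrapping idea quoted above (report 1 in the "C" locale). The helper names locale_is_c_or_posix() and wrapped_mb_cur_max() are hypothetical; they are not part of sed, gnulib, or any libc. The sketch assumes that comparing the LC_CTYPE locale name against "C" and "POSIX" is enough to detect the case in question.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true when LC_CTYPE is the "C" (or "POSIX") locale.
   setlocale() with a NULL argument only queries the current setting. */
static int locale_is_c_or_posix(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL
        && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical wrapper: report 1 in the "C" locale regardless of what the
   platform's MB_CUR_MAX says; otherwise pass the platform value through. */
static size_t wrapped_mb_cur_max(void)
{
    size_t n = locale_is_c_or_posix() ? 1 : (size_t) MB_CUR_MAX;
    assert(n >= 1);   /* MB_CUR_MAX is never zero */
    return n;
}
```

A caller would use wrapped_mb_cur_max() wherever it currently tests MB_CUR_MAX > 1 to choose the multibyte code path. This only covers MB_CUR_MAX: if the platform's mbrtowc() still decodes UTF-8 in the "C" locale, as the subject of this thread describes for OS X, it would need a matching wrapper for the two to stay consistent.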

-- 
Eric Blake   [email protected]    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
