On 06/07/2012 03:13 PM, Paolo Bonzini wrote:
> On 07/06/2012 14:50, Eric Blake wrote:
>>>> The fix could be to have two different locale_charset() functions,
>>>> one that returns "US-ASCII" and another one that returns "UTF-8".
>>>> The first one to be used when MB_CUR_MAX and mbrtowc() are used as
>>>> well, the second one to be used by gettext(). But the separation
>>>> line between the two cases is not yet clear to me. Any insights?
>
> The separation line is what you wrote: whether you'll use the text
> simply for presentation, or whether you'll process it before. But
> alternatively, we might try a variant of what Eric has suggested...
>
>> On OS X, can we wrap MB_CUR_MAX to pretend to be 1 when in the "C"
>> locale, to match what cygwin did in distinguishing between 'C' and
>> 'C.UTF-8'?
>
> ... which is to wrap MB_CUR_MAX and pretend that it is 3.

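
For anyone who wants to check what a given platform reports for the two
values being discussed, here is a minimal sketch. The helper names
mb_cur_max_in() and current_codeset() are made up for illustration and
are not part of gnulib; only setlocale(), MB_CUR_MAX and
nl_langinfo(CODESET) are standard. It is a diagnostic aid under those
assumptions, not a proposed fix.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper (not a gnulib function): switch LC_CTYPE to the
   named locale and return MB_CUR_MAX as seen in that locale.
   Returns 0 if the locale cannot be selected. */
size_t mb_cur_max_in(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;
    return MB_CUR_MAX;
}

/* Hypothetical helper: the codeset name the C library reports for the
   LC_CTYPE locale that is currently in effect. */
const char *current_codeset(void)
{
    return nl_langinfo(CODESET);
}
```

Calling mb_cur_max_in("C") followed by current_codeset(), and then the
same pair with "" (the locale taken from the environment), shows whether
the reported charset and MB_CUR_MAX agree with each other on that
system.
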
How do Mac OS X native command line tools select between UTF-8 and
byte processing ("C" in the traditional sense)? I'd find it surprising
if they didn't support a "C" mode, for performance or functional
reasons.

Max Horn states that Terminal.app on Mac OS X has an option "Set LANG
environment variable" which is _enabled by default_. So it seems to me
that would handle the gettext issue, allowing US-ASCII to be handled
as "normal" rather than mapped to UTF-8.

cheers,
Pádraig.