Bug#603914: Please drop non-UTF8 locales

Thorsten Glaser Sun, 09 Jan 2011 17:51:18 -0800

Roger Leigh dixit:

>I think the "all byte sequences valid" applies mainly to narrow
>character I/O.  i.e. printf/puts etc. won't alter, drop or otherwise
>mangle any non 7-bit-ASCII codes.  i.e. I think the intent was to
>ensure 8-bit cleanliness in a 7-bit locale.  This naturally extends
>to UTF-8.  I'm not sure that wide character support is implied here,
>given that it implicitly requires correct byte sequences to function
>where the narrow character I/O does not (all 8-bit codes are correct).


I was thinking in terms of programmes that operate on wide characters
internally (for example, tr was the first one I switched to wide
characters, since in MirBSD they are 16 bits wide, and the table-driven
design continued to work; this is also where I noticed the problem).
Those are the programmes you want to be aware of: they _are_
internationalised, thus use wchar_t with multibyte strings and narrow
I/O, or wchar_t with wide I/O, and these will benefit from the C.UTF-8
locale; others (that just run on byte strings as if they were
characters) don’t see a difference between it and the classical C
locale anyway.
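
To illustrate the kind of programme meant here, a minimal sketch in C
follows. It reads a narrow (multibyte) string and decodes it into
wchar_t with mbrtowc(), so the result depends on the active LC_CTYPE
locale. The function name count_wide_chars is made up for this
example, and whether a locale called "C.UTF-8" exists on a given
system is an assumption.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters in s as decoded by the
 * current LC_CTYPE locale. Returns -1 if s contains a byte sequence
 * that is invalid or incomplete in that locale.
 */
long count_wide_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    long count = 0;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                 /* invalid or truncated sequence */
        if (n == 0)
            break;                     /* decoded a NUL; stop counting */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

A caller would first select a locale, for instance with
setlocale(LC_CTYPE, "C.UTF-8"), and check the return value for NULL.
A programme that only calls strlen() on the same input never goes
through the decoder, which is why it sees no difference between the
two locales.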

What I mean is, we try to use C.UTF-8 in places where we want to
operate on text in UTF-8 but otherwise keep the standardised,
predictable, uniform behaviour of C; in places where we operate on
binary data, C is probably more useful.
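
That choice can be sketched in C roughly as below. The enum and the
function name select_ctype_locale are invented for illustration, and
the locale name "C.UTF-8" is assumed to be installed; where it is not,
the sketch falls back to plain C.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical classification of what the programme is about to process. */
enum input_kind { INPUT_TEXT, INPUT_BINARY };

/*
 * Select the LC_CTYPE locale for the given kind of input:
 * "C.UTF-8" for UTF-8 text (if the system provides it), "C" otherwise.
 * Returns the string reported by setlocale(), or NULL if even "C" fails.
 */
const char *select_ctype_locale(enum input_kind kind)
{
    const char *chosen = NULL;

    if (kind == INPUT_TEXT)
        chosen = setlocale(LC_CTYPE, "C.UTF-8");  /* may not exist everywhere */
    if (chosen == NULL)
        chosen = setlocale(LC_CTYPE, "C");        /* always-available fallback */
    return chosen;
}
```

Only LC_CTYPE is touched here, so collation, number formatting and
messages keep the behaviour of the C locale in both cases.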

Hum. Do I make any sense?

Goodnight,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
        -- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2


