Roger Leigh dixit:

>I think the "all byte sequences valid" applies mainly to narrow
>character I/O. i.e. printf/puts etc. won't alter, drop or otherwise
>mangle any non 7-bit-ASCII codes. i.e. I think the intent was to
>ensure 8-bit cleanliness in a 7-bit locale. This naturally extends
>to UTF-8. I'm not sure that wide character support is implied here,
>given that it implicitly requires correct byte sequences to function
>where the narrow character I/O does not (all 8-bit codes are correct).

I was thinking in terms of programmes that operate on wide characters
internally (for example, tr was the first one I switched to wide
characters, since in MirBSD they are 16 bit, and the table-driven
design continued to work; this is also where I noticed the problem).

Those are the programmes you want to be aware of: they _are_
internationalised, so they use either wchar_t with multibyte strings
and narrow I/O, or wchar_t with wide I/O, and these will benefit from
the C.UTF-8 locale. Others (that just run on byte strings as if they
were characters) don’t see a difference between it and the classical
C locale anyway.

What I mean is, we try to use C.UTF-8 in places where we want to
operate on text in UTF-8 but otherwise keep the standardised,
predictable, uniform behaviour of C; in places where we operate on
binary data, C is probably more useful.

Hum. Do I make any sense?

Goodnight,
//mirabilos
--
“It is inappropriate to require that a time represented as
seconds since the Epoch precisely represent the number of
seconds between the referenced time and the Epoch.”
	-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
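
To make the distinction discussed above concrete, here is a rough sketch of a tr-like tool in two flavours. It is only an illustration: Python stands in for a C programme using wchar_t, and the function names tr_bytes and tr_chars are made up for this example, not taken from any real tr implementation. The byte-oriented variant is 8-bit clean and accepts any input; the character-oriented variant has to decode its input first, so it depends on valid UTF-8 byte sequences.

```python
# Illustrative sketch only; tr_bytes and tr_chars are hypothetical names.

def tr_bytes(data: bytes, src: bytes, dst: bytes) -> bytes:
    """Byte-oriented tr: every byte is treated as one 'character'.

    Any byte sequence is acceptable; bytes not listed in src pass
    through unchanged (8-bit clean).
    """
    return data.translate(bytes.maketrans(src, dst))


def tr_chars(data: bytes, src: str, dst: str) -> bytes:
    """Character-oriented tr: decode UTF-8, map code points, re-encode.

    Decoding raises UnicodeDecodeError if data is not valid UTF-8,
    which mirrors the point that wide-character processing needs
    correct byte sequences to function.
    """
    text = data.decode("utf-8")
    return text.translate(str.maketrans(src, dst)).encode("utf-8")
```

With these definitions, tr_bytes(b"\xff\xfeabc", b"abc", b"ABC") leaves the two non-UTF-8 bytes alone, whereas tr_chars can map multibyte characters such as "ä" to "a" but rejects the same invalid input. That is roughly the trade-off between the classical C locale for binary data and C.UTF-8 for text.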