Silvan Jegen dixit:

>Wouldn't a 16-bit wchar_t be non-standard-conform when using a UTF-8
>locale?

Nope. UTF-8 is just an encoding for Unicode, and as long as I take care
to #define __STDC_ISO_10646__ 200009L (and no later date) this is
perfectly permissible. (And please do not language-lawyer me, I’ve had
enough of those, and since I can prove that 100% POSIX compliance is
probably illegal in my country, I don’t care, even.)

>So the problem seems to be that binary files contain bytes that are not
>valid UTF-8 and that using tools on them that expect UTF-8 will mangle
>these files.

No. The problem is that “using tools that use the wchar_t API” will
mangle them _iff_ the locale is UTF-8.

So if your C locale is UTF-8, you *will* break all kinds of things,
since “env LC_ALL=C tr x x <binfile” is supposed to retain the binary
input unchanged. This just means that your C locale cannot be strictly
UTF-8. All others can, but the C locale is precisely for this.

This is because the C locale is special like that.

bye,
//mirabilos
-- 
13:37⎜«Natureshadow» Deep inside, I hate mirabilos. I mean, he's a good
guy. But he's always right! In every fsckin' situation, he's right. Even
with his deeply perverted taste in software and borked ambition towards
broken OSes - in the end, he's damn right about it :(! […] works in mksh
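
To illustrate the mangling described above, here is a minimal sketch in Python. It is only a model: Python's codecs stand in for a tool that converts its input to wide characters and back under the current locale, the helper name roundtrip is hypothetical, and Latin-1 is assumed as a stand-in for a byte-transparent C locale. It is not how tr or any wchar_t-based tool is actually implemented.

```python
# Illustrative model only: Python's codecs stand in for a C tool that
# converts input bytes to wide characters and back under the current locale.

def roundtrip(data: bytes, charset: str) -> bytes:
    """Decode bytes to characters and encode them again, as a
    wide-character tool would do when copying its input (hypothetical helper)."""
    # errors="replace" substitutes U+FFFD for undecodable bytes; a real tool
    # might instead abort or drop them. Either way the original bytes are lost.
    text = data.decode(charset, errors="replace")
    return text.encode(charset, errors="replace")

# Some arbitrary binary data containing bytes that are not valid UTF-8.
binary = bytes([0x7F, 0x45, 0x4C, 0x46, 0xFF, 0xFE, 0x80, 0x00])

# Model of a UTF-8 locale: 0xFF, 0xFE and a lone 0x80 cannot be decoded.
utf8_out = roundtrip(binary, "utf-8")

# Model of a single-byte, 8-bit-transparent charset. Latin-1 is an
# assumption chosen for the demonstration, not a claim about any real libc.
c_out = roundtrip(binary, "latin-1")

print(utf8_out == binary)   # False: the invalid bytes were replaced
print(c_out == binary)      # True: every byte survives the round trip
```

In this model the UTF-8 round trip turns each invalid byte into the three-byte encoding of U+FFFD, while the single-byte charset passes all 256 byte values through unchanged, which is the property “env LC_ALL=C tr x x <binfile” relies on.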