Thomas already gave an answer with relevant points.

On nonidi 9 Frimaire, year CCXXIV, Morten W. Petersen wrote:
> I'm writing an XML parser/writer/simple DOM, which will input and output
> primarily in UTF-32.

Is there a good or unavoidable reason to use UTF-32? It is really a bad
choice of format for external representation. Nowadays, I would say that
UTF-8 should always be the preferred choice (for external representation;
for internal representation, using integers for code points may be better
depending on the use case).

> What I'm looking for is a cross-platform way to output some data, to aid
> in the testing process. Reading and writing from files will probably be
> binary and handled internally in the program.

If you want cross-platform, stay away from wchar_t. It allows you to do
SIMPLE things in a cross-platform way, such as printing an error message,
but no more. If you need control over the encoding, then you cannot do it
portably with wchar_t.

For starters, the i4s at Microsoft decided that 64k characters should be
enough for everyone, so if your cross-platform includes microsoftisms, you
cannot use wchar_t to represent a Unicode code point. The i4s at Sun had
other interesting ideas on how to make the encoding of wchar_t itself
depend on the locale.

If you really need to write UTF-32, writing the corresponding function
takes about half a minute:

    #include <stdio.h>

    /* Write code point c as four bytes, most significant byte first. */
    void put_utf32be(FILE *f, unsigned c)
    {
        putc((c >> 24) & 0xFF, f);
        putc((c >> 16) & 0xFF, f);
        putc((c >>  8) & 0xFF, f);
        putc((c >>  0) & 0xFF, f);
    }

Note: I hereby place this code under the terms of the GNU GPL. And
correctly handling errors is left as an exercise to the reader.

Regards,

-- 
  Nicolas George
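
PS: for what it is worth, here is a minimal sketch of what the error
handling left as an exercise might look like. The name put_utf32be_checked
and the 0 / -1 return convention are assumptions made for illustration,
not anything taken from an existing library. It rejects values that are
not Unicode scalar values (surrogates and anything above U+10FFFF) and
reports short writes.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper: write one code point as big-endian UTF-32.
 * Returns 0 on success, -1 if c is not a valid Unicode scalar value
 * or if the write fails. Nothing is written for an invalid value.
 */
int put_utf32be_checked(FILE *f, uint32_t c)
{
    unsigned char buf[4];

    /* Reject values above U+10FFFF and the surrogate range U+D800..U+DFFF. */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return -1;

    /* Most significant byte first. */
    buf[0] = (unsigned char)((c >> 24) & 0xFF);
    buf[1] = (unsigned char)((c >> 16) & 0xFF);
    buf[2] = (unsigned char)((c >>  8) & 0xFF);
    buf[3] = (unsigned char)((c >>  0) & 0xFF);

    /* fwrite returns the number of items written; anything short is an error. */
    if (fwrite(buf, 1, sizeof buf, f) != sizeof buf)
        return -1;

    return 0;
}
```

Using uint32_t rather than unsigned avoids relying on unsigned being at
least 32 bits wide, which the C standard does not guarantee. The stream
should be opened in binary mode ("wb") so that no newline translation
happens on platforms that do it.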