Abdelrazak Younes wrote:
> But IMHO, there is really not need to use iconv for these simple
> conversions. I even think that we should do the ucs4 to/from utf8
> ourselves... it looks pretty simple from a first glance.
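
To give an idea of how simple the UCS-4 to UTF-8 direction is, here is a minimal sketch. The name ucs4_to_utf8 and its return convention are made up for this mail; they are not taken from LyX or from ConvertUTF.c. It assumes the caller provides a 4-byte buffer and treats surrogates and values above 0x10FFFF as errors.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from LyX or ConvertUTF.c).
 * Encodes one UCS-4 code point as UTF-8 into out[0..3].
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate (0xD800..0xDFFF) or lies above 0x10FFFF. */
size_t ucs4_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 7 bits: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 16 bits: 1110xxxx + 2 trail bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid UCS-4 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 21 bits: 11110xxx + 3 trail bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

The reverse direction needs more care (overlong forms, truncated input), so this only covers the encoding half.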

Here is a reference showing why this is often correct, at least for
UCS-4 values up to 0xFFFF (65535). See

http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
and
http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.h

    /* Some fundamental constants */
    #define UNI_MAX_BMP (UTF32)0x0000FFFF
    #define UNI_SUR_HIGH_START (UTF32)0xD800
    #define UNI_SUR_LOW_END (UTF32)0xDFFF

    ConversionResult ConvertUTF32toUTF16 (
        const UTF32** sourceStart, const UTF32* sourceEnd,
        UTF16** targetStart, UTF16* targetEnd, ConversionFlags flags)
    {
        ConversionResult result = conversionOK;
        const UTF32* source = *sourceStart;
        UTF16* target = *targetStart;
        while (source < sourceEnd) {
            UTF32 ch;
            if (target >= targetEnd) {
                result = targetExhausted; break;
            }
            ch = *source++;
            if (ch <= UNI_MAX_BMP) /* Target is a character <= 0xFFFF */
            {
                /* UTF-16 surrogate values are illegal in UTF-32;
                   0xffff or 0xfffe are both reserved values */
                if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END) {
                    if (flags == strictConversion) {
                        --source; /* return to the illegal value itself */
                        result = sourceIllegal;
                        break;
                    } else {
                        *target++ = UNI_REPLACEMENT_CHAR;
                    }
                } else {
                    *target++ = (UTF16)ch; /* normal case */
                }
    [...]

For values that are not surrogates, i.e. those that fail the test
"if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)" (the surrogate
range covers 2048 values), the UTF-16 value is only a cast of the
UTF-32/UCS-4 value, and UTF-16 == UCS-2 (see Unicode 4.0 docs,
Appendix C).

Peter
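
P.S. Reduced to the BMP case discussed above, the conversion could be sketched like this. The helper name ucs4_to_utf16_bmp is hypothetical and the code is my own condensation, not part of ConvertUTF.c; it simply refuses anything that would need a surrogate pair instead of handling it.

```c
#include <stdint.h>

/* Hypothetical helper (not part of ConvertUTF.c).
 * If cp is in the BMP (<= 0xFFFF) and is not a surrogate, stores the
 * single UTF-16 code unit in *out and returns 1. Otherwise returns 0
 * and leaves *out untouched. */
int ucs4_to_utf16_bmp(uint32_t cp, uint16_t *out)
{
    if (cp > 0xFFFF)
        return 0;                 /* would need a surrogate pair */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                 /* surrogate values are illegal in UCS-4 */
    *out = (uint16_t)cp;          /* the "normal case": just a cast */
    return 1;
}
```

A caller could use the return value to decide when to fall back to a full converter for the values outside the BMP.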