Abdelrazak Younes wrote:
> But IMHO, there is really not need to use iconv for these simple
> conversions. I even think that we should do the ucs4 to/from utf8
> ourselves... it looks pretty simple from a first glance.

Here is a reference showing why this is often correct, at least for UCS-4
values up to 0xFFFF (65535):


see
http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
and
http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.h


/* Some fundamental constants */
#define UNI_MAX_BMP (UTF32)0x0000FFFF


#define UNI_SUR_HIGH_START  (UTF32)0xD800
#define UNI_SUR_LOW_END     (UTF32)0xDFFF

ConversionResult ConvertUTF32toUTF16 (const UTF32** sourceStart, const UTF32* sourceEnd,
        UTF16** targetStart, UTF16* targetEnd, ConversionFlags flags)
{
    ConversionResult result = conversionOK;
    const UTF32* source = *sourceStart;
    UTF16* target = *targetStart;
    while (source < sourceEnd)
    {
        UTF32 ch;
        if (target >= targetEnd) {
            result = targetExhausted; break;
        }
        ch = *source++;
        if (ch <= UNI_MAX_BMP)  /* Target is a character <= 0xFFFF */
        {
            /* UTF-16 surrogate values are illegal in UTF-32; 0xffff or 0xfffe
               are both reserved values */
            if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)
            {
                if (flags == strictConversion)
                {
                    --source; /* return to the illegal value itself */
                    result = sourceIllegal;
                    break;
                } else {
                    *target++ = UNI_REPLACEMENT_CHAR;
                }
            }
            else
            {
                *target++ = (UTF16)ch; /* normal case */
            }

For values up to 0xFFFF that are not surrogates, i.e. that fail the test
"if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)" (which matches
2048 values), the UTF-16 value is just a cast of the UTF-32/UCS-4 value,
and UTF-16 == UCS-2 (see the Unicode 4.0 documentation, Appendix C).
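
To make the point concrete, here is a minimal sketch of what a hand-written
conversion could look like if we only care about that simple case. The
names (is_bmp_scalar, ucs4_to_utf16_bmp, the MAX_BMP and SUR_* constants)
are made up for illustration and are not part of ConvertUTF.c; the function
simply refuses anything that would need more than a cast, so it is not a
full replacement for the reference code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the ones quoted from ConvertUTF.c. */
#define MAX_BMP        0xFFFFu
#define SUR_HIGH_START 0xD800u
#define SUR_LOW_END    0xDFFFu

/* Returns 1 if ch fits in one UTF-16 unit by a plain cast:
   it is at most 0xFFFF and is not a surrogate value. */
static int is_bmp_scalar(uint32_t ch)
{
    return ch <= MAX_BMP && !(ch >= SUR_HIGH_START && ch <= SUR_LOW_END);
}

/* Converts n UCS-4 values to UTF-16, handling only the cast-only case.
   Returns the number of units written, or (size_t)-1 if the target is
   too small or a value is a surrogate or lies above 0xFFFF. */
static size_t ucs4_to_utf16_bmp(const uint32_t *src, size_t n,
                                uint16_t *dst, size_t dst_len)
{
    size_t i;

    if (n > dst_len)
        return (size_t)-1;
    for (i = 0; i < n; ++i) {
        if (!is_bmp_scalar(src[i]))
            return (size_t)-1;
        dst[i] = (uint16_t)src[i]; /* normal case: just a cast */
    }
    return n;
}
```

A caller would fall back to iconv or to the full ConvertUTF32toUTF16 when
this returns (size_t)-1, since values above 0xFFFF need a surrogate pair.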

Peter
