Peter Kümmel wrote:
Abdelrazak Younes wrote:
Peter Kümmel wrote:
Peter Kümmel wrote:

for values which are not surrogates; the surrogate test is
"if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END)" (2048 values).
Read: only 2048 of the 65536 values are not allowed, and for the rest
a cast transforms from UTF-32 to UTF-16.
I think QChar will automatically replace those with question marks
anyway.

But I could also check for these values explicitly in my conversion
routine and return a '?' character for those unknown characters:

char_type const UNI_SUR_HIGH_START = 0xD800;
char_type const UNI_SUR_LOW_END = 0xDFFF;

QChar const UnknownChar(...);

QChar const ucs4_to_qchar(char_type const & ucs4)
{
    // Surrogates and anything from 0xFFFE upwards cannot become a single QChar.
    if (ucs4 >= 0xFFFE
        || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
        return UnknownChar;

    return QChar(static_cast<unsigned short>(ucs4));
}

Abdel.




Could we not replace the current implementation of

unsigned short ucs4_to_ucs2(boost::uint32_t c)

with an inline implementation like the following, since iconv must
in principle do the same?

char_type const UNI_REPLACEMENT_CHAR = 0x0000FFFD;
char_type const UNI_SUR_HIGH_START = 0xD800;
char_type const UNI_SUR_LOW_END = 0xDFFF;

unsigned short ucs4_to_ucs2(boost::uint32_t ucs4)
{
     // Surrogates and anything from 0xFFFE upwards have no single UCS-2 unit.
     if (ucs4 >= 0xFFFE
         || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
         return static_cast<unsigned short>(UNI_REPLACEMENT_CHAR);

     return static_cast<unsigned short>(ucs4);
}

compare with
http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
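
For anyone who wants to try the check outside the LyX tree, here is a minimal standalone sketch of the same idea. It is only an illustration under a few assumptions: std::uint32_t and std::uint16_t stand in for boost::uint32_t and unsigned short, the function name ucs4_to_ucs2_sketch is hypothetical, and returning U+FFFD for rejected input is one possible choice rather than what iconv or the current LyX code necessarily does.

```cpp
#include <cstdint>

// Boundaries of the surrogate block, named as in ConvertUTF.c.
constexpr std::uint32_t UNI_SUR_HIGH_START = 0xD800;
constexpr std::uint32_t UNI_SUR_LOW_END = 0xDFFF;
// U+FFFD REPLACEMENT CHARACTER, returned for input that cannot be converted.
constexpr std::uint16_t UNI_REPLACEMENT_CHAR = 0xFFFD;

// Hypothetical helper: map one UCS-4 value to one UCS-2 code unit.
// Surrogates and every value from 0xFFFE upwards are rejected;
// everything else fits into 16 bits, so a plain cast is enough.
constexpr std::uint16_t ucs4_to_ucs2_sketch(std::uint32_t ucs4)
{
    return (ucs4 >= 0xFFFE
            || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
        ? UNI_REPLACEMENT_CHAR
        : static_cast<std::uint16_t>(ucs4);
}
```

Because the function is a single comparison and a cast, it should inline well, which is the point of the proposal above. Whether it is actually faster than going through iconv would still need to be measured.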

I think iconv does it already. See the "Unicode on Mac" thread and links to ucs4internal.h. If I know Lars, he's beavering away benchmarking all these ideas. Give him a little time to do his day job too ;-)

Angus
