Abdelrazak Younes wrote:
> Peter Kümmel wrote:
>> Peter Kümmel wrote:
>>
>>> for values which are not surrogates "if (ch >= UNI_SUR_HIGH_START &&
>>> ch <= UNI_SUR_LOW_END)" (2048 values)
>>
>> read: only 2048 of the 65536 values are not allowed; for the rest,
>> a cast transforms from UTF-32 to UTF-16.
>
> I think QChar will automatically replace those with question marks
> anyway.
>
> But I could also check for these values explicitly in my conversion
> routine and return a '?' character for those unknown characters:
>
> char_type const UNI_SUR_HIGH_START = 0xD800;
> char_type const UNI_SUR_LOW_END = 0xDFFF;
>
> QChar const UnknownChar(...);
>
> QChar const ucs4_to_qchar(char_type const & ucs4)
> {
>     if (ucs4 >= 0xFFFE
>         || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
>         return UnknownChar;
>
>     return QChar(static_cast<unsigned short>(ucs4));
> }
>
> Abdel.
Could we not replace the current implementation of

  unsigned short ucs4_to_ucs2(boost::uint32_t c)

with such an inline implementation? After all, iconv must in principle do the same thing:

  char_type const UNI_REPLACEMENT_CHAR = 0x0000FFFD;
  char_type const UNI_SUR_HIGH_START = 0xD800;
  char_type const UNI_SUR_LOW_END = 0xDFFF;

  inline unsigned short ucs4_to_ucs2(boost::uint32_t ucs4)
  {
      // Values above the BMP, U+FFFE/U+FFFF and surrogates have no
      // single 16-bit representation: return U+FFFD instead.
      if (ucs4 >= 0xFFFE
          || (ucs4 >= UNI_SUR_HIGH_START && ucs4 <= UNI_SUR_LOW_END))
          return static_cast<unsigned short>(UNI_REPLACEMENT_CHAR);
      return static_cast<unsigned short>(ucs4);
  }

Compare with http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c

Peter