On 14/03/17 18:46 +0000, Jonathan Wakely wrote:
On 13/03/17 19:35 +0000, Jonathan Wakely wrote:
This is a series of patches to fix various bugs in the Unicode
character conversion facets.

The first patch fixes a silly < versus <= bug that meant that 0xffff
got written as a surrogate pair instead of simply as 0xffff, and an
endianness bug in the internal representation of UTF-16 code units
stored in char32_t or wchar_t values. That's PR 79511.
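
For illustration, here is a minimal standalone sketch of the boundary that
< versus <= bug got wrong. It is not the libstdc++ code, and append_utf16
is a hypothetical name. Every code point up to and including 0xFFFF is a
single UTF-16 code unit. Only code points above it become a surrogate pair.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch, not the libstdc++ implementation: append the
// UTF-16 encoding of code point cp to out.
void append_utf16(char32_t cp, std::vector<char16_t>& out)
{
    // Precondition: cp is a Unicode scalar value (no surrogates).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp <= 0xFFFF) {
        // Every BMP code point, including 0xFFFF itself, is one code unit.
        // Testing cp < 0xFFFF instead would wrongly send 0xFFFF down the
        // surrogate-pair branch below.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code point: split cp - 0x10000 (20 bits) into a
        // high surrogate (top 10 bits) and a low surrogate (low 10 bits).
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

For example, U+FFFF yields the single unit 0xFFFF, while U+1F600 yields
the pair 0xD83D 0xDE00.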

The second patch fixes some incorrect bitwise operations (because I
confused & and |) and some incorrect limits (because I confused max
and min). That fixes detecting the endianness of the external byte
sequence when it starts with a Byte Order Mark (BOM), and makes
invalid UCS2 input correctly report an error. It also fixes
wstring_convert so that it reports the number of characters that were
converted prior to an error. That's PR 79980.
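
As a rough standalone sketch of the checks involved (hypothetical helper
names, not the libstdc++ implementation): a BOM is U+FEFF, so the bytes
FE FF mean big-endian and FF FE mean little-endian. Two bytes must be
merged into one 16-bit unit with |, not &. UCS2 cannot represent
surrogates, so any unit in D800..DFFF is an error, as is any unit above
the facet's Maxcode template argument.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch, not the libstdc++ implementation.
enum class byte_order { big, little, unknown };

// U+FEFF serialized big-endian is FE FF; little-endian it is FF FE.
byte_order detect_bom(const unsigned char* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    if (n >= 2) {
        if (p[0] == 0xFE && p[1] == 0xFF) return byte_order::big;
        if (p[0] == 0xFF && p[1] == 0xFE) return byte_order::little;
    }
    return byte_order::unknown;  // no BOM: caller uses its default order
}

// Combine two bytes into one 16-bit unit. The halves must be merged
// with |; using & here would always yield zero. "unknown" is treated
// as big-endian, the default when codecvt_mode's little_endian flag
// is not set.
char16_t read_unit(const unsigned char* p, byte_order order)
{
    return order == byte_order::little
        ? static_cast<char16_t>((p[1] << 8) | p[0])
        : static_cast<char16_t>((p[0] << 8) | p[1]);
}

// UCS2 has no surrogate pairs, so a unit in D800..DFFF is an error,
// as is any unit greater than maxcode (cf. codecvt_utf16's Maxcode).
bool valid_ucs2_unit(char16_t u, char32_t maxcode = 0xFFFF)
{
    bool surrogate = u >= 0xD800 && u <= 0xDFFF;
    return !surrogate && u <= maxcode;
}
```

Usage: detect_bom on the first two input bytes picks the byte order, and
the rest of the input is then read with read_unit in that order.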

The third patch fixes the values returned by the encoding() and
max_length() member functions of the codecvt facets, because I wasn't
correctly accounting for a BOM or for the differences between UTF-16
and UCS2.
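
The standard defines max_length() as the largest value that
do_length(state, from, end, 1) can return, i.e. the most external
characters that may be consumed to produce one internal character. A
back-of-the-envelope sketch of why the BOM and the UTF-16/UCS2 distinction
both matter for that count follows. It is written as a hypothetical
constexpr helper (ext_form and max_ext_bytes are made-up names). It is not
the libstdc++ code and does not model encoding().

```cpp
// Hypothetical helper, not libstdc++'s code: the worst-case number of
// external bytes consumed to produce one internal character when the
// external form is UTF-16 or UCS2, optionally preceded by a 2-byte BOM.
enum class ext_form { utf16, ucs2 };

constexpr int max_ext_bytes(ext_form f, bool consume_header)
{
    // UTF-16 may need a surrogate pair (two units, 4 bytes), while UCS2
    // is always exactly one 16-bit unit (2 bytes). With consume_header,
    // a leading BOM (FE FF or FF FE) is consumed as well: 2 more bytes.
    return (f == ext_form::utf16 ? 4 : 2) + (consume_header ? 2 : 0);
}

static_assert(max_ext_bytes(ext_form::utf16, true) == 6,
              "BOM + surrogate pair");
```

Ignoring either the BOM or the surrogate pair undercounts by two bytes,
which is the kind of mistake the patch describes.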

I plan to commit these to all branches, but I'll wait until after GCC
7.1 is released and fix them for 7.2 instead. These bugs aren't
important enough to rush into trunk now.

One more patch for a problem found by the libc++ testsuite. Now we
pass all the libc++ tests, and we even pass a test that libc++ fails.
With this, I hope our <codecvt> is 100% conforming. Just in time to be
deprecated for C++17 :-)

I've committed these to trunk, on the basis that they're intended to
be backported to all branches anyway (fixing features that are
currently broken in all branches). There's no point waiting if we plan
to commit them anyway; it would just mean doing an extra backport (5,
6, 7 *and* 8).

Backports will be done soon.

