Paolo Bonzini <pbonz...@redhat.com> writes:

> On 04/02/2013 18:19, Markus Armbruster wrote:
>> +    /* 2 Boundary condition test cases */
>> +    /* 2.1 First possible sequence of a certain length */
>> +    /* 2.1.5 5 bytes U+200000 */
>> +    {
>> +        "\"\xF8\x88\x80\x80\x80\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\u8200\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        "\xF8\x88\x80\x80\x80",
>> +    },
>> +    /* 2.1.6 6 bytes U+4000000 */
>> +    {
>> +        "\"\xFC\x84\x80\x80\x80\x80\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\uC100\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        "\xFC\x84\x80\x80\x80\x80",
>> +    },
>> +    },
>> +    /* 2.2.4 4 bytes U+1FFFFF */
>> +    {
>> +        "\"\xF7\xBF\xBF\xBF\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\u7FFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        "\xF7\xBF\xBF\xBF",
>> +    },
>> +    /* 2.2.5 5 bytes U+3FFFFFF */
>> +    {
>> +        "\"\xFB\xBF\xBF\xBF\xBF\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\uBFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        "\xFB\xBF\xBF\xBF\xBF",
>> +    },
>> +    /* 2.2.6 6 bytes U+7FFFFFFF */
>> +    {
>> +        "\"\xFD\xBF\xBF\xBF\xBF\xBF\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\uDFFF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        "\xFD\xBF\xBF\xBF\xBF\xBF",
>> +    },
>> +    {
>> +        /* \U+1FFFFF */
>> +        "\"\xF8\x87\xBF\xBF\xBF\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\u81FF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        "\xF8\x87\xBF\xBF\xBF",
>> +    },
>> +    {
>> +        /* \U+3FFFFFF */
>> +        "\"\xFC\x83\xBF\xBF\xBF\xBF\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\uC0FF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +        "\xFC\x83\xBF\xBF\xBF\xBF",
>> +    },
>> +    {
>> +        /* \U+0000 */
>> +        "\"\xF8\x80\x80\x80\x80\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\u8000\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
>> +        "\xF8\x80\x80\x80\x80",
>> +    },
>> +    {
>> +        /* \U+0000 */
>> +        "\"\xFC\x80\x80\x80\x80\x80\"",
>> +        NULL, /* bug: rejected */
>> +        "\"\\uC000\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
>> +        "\xFC\x80\x80\x80\x80\x80",
>> +    },
>
> Rejecting these is not a bug IMO. Unicode is only defined up to
> U+10FFFF. Codepoints above are not valid UTF-8 at all, and in
> particular 5/6-byte sequences are never valid UTF-8 (they used to be).
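The strict rule described above (nothing beyond U+10FFFF, no 5/6-byte forms) can be illustrated with a small decoder. This is a minimal sketch that follows RFC 3629. It is not code from QEMU's JSON lexer, and the name utf8_decode_strict() is hypothetical. Lead bytes 0xF5..0xFF, which cover every 5- and 6-byte form, are rejected outright. Overlong encodings, UTF-16 surrogates, and anything that decodes above U+10FFFF are rejected as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU code): decode one UTF-8 sequence from
 * s[0..len) following RFC 3629.  On success, store the code point in *cp
 * and return the number of bytes consumed (1..4).  Return -1 if the
 * sequence is invalid.
 */
static int utf8_decode_strict(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c;
    int n, i;

    assert(s != NULL && cp != NULL);
    if (len == 0) {
        return -1;
    }
    if (s[0] < 0x80) {                          /* U+0000..U+007F */
        *cp = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {  /* 2 bytes, never overlong */
        c = s[0] & 0x1F;
        n = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {  /* 3 bytes */
        c = s[0] & 0x0F;
        n = 3;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {  /* 4 bytes */
        c = s[0] & 0x07;
        n = 4;
    } else {
        /* 0x80..0xC1: stray continuation byte or overlong 2-byte lead;
         * 0xF5..0xFF: would encode values above U+10FFFF, which includes
         * the obsolete 5- and 6-byte forms */
        return -1;
    }
    if (len < (size_t)n) {
        return -1;                              /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;                          /* not a continuation byte */
        }
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* reject overlong 3/4-byte forms, surrogates, and values > U+10FFFF */
    if ((n == 3 && c < 0x800) || (n == 4 && c < 0x10000)
        || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) {
        return -1;
    }
    *cp = c;
    return n;
}
```

With this sketch, every 5- and 6-byte input quoted above returns -1, as does "\xF7\xBF\xBF\xBF" (U+1FFFFF). Under the patch's bug markers, that is the "rejected" behaviour, which could be defined as the intended feature.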

See explanation of bug markers above:

+ * - bug: rejected
+ *   JSON parser rejects invalid sequence(s)
+ *   We may choose to define this as feature

> But there are indeed other bugs...
>
>> +    /* 2.1.4 4 bytes U+10000 */
>> +    {
>> +        "\"\xF0\x90\x80\x80\"",
>> +        "\xF0\x90\x80\x80",
>> +        "\"\\u0400\\uFFFF\"", /* bug: want "\"\\uD800\\uDC00\"" */
>> +    },
>> +        /* U+10FFFF */
>> +        "\"\xF4\x8F\xBF\xBF\"",
>> +        "\xF4\x8F\xBF\xBF",
>> +        "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
>> +    },
>> +    {
>> +        /* U+110000 */
>> +        "\"\xF4\x90\x80\x80\"",
>> +        "\xF4\x90\x80\x80",
>> +        "\"\\u4400\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>> +    },
>
> ...and also some good catches here! In particular U+110000 should be
> rejected.

Thanks!
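The "want" comments for U+10000 and U+10FFFF expect supplementary-plane characters to be emitted as UTF-16 surrogate pairs. A JSON \u escape holds only four hex digits (RFC 8259, section 7), so such characters need two escapes. Here is a minimal sketch of that encoding step. It assumes a hypothetical helper, json_append_codepoint(), that is not part of QEMU's QJSON code and appends to a NUL-terminated buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): append code point cp to the
 * NUL-terminated string in buf (capacity size) as a JSON \uXXXX escape.
 * Code points above U+FFFF become a UTF-16 surrogate pair.  Return the
 * number of characters appended, or -1 if cp is not a Unicode scalar
 * value or the buffer is too small (buf is then left unchanged).
 */
static int json_append_codepoint(char *buf, size_t size, uint32_t cp)
{
    size_t used = strlen(buf);
    char *end = buf + used;
    size_t room;
    int n;

    assert(used < size);
    room = size - used;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;                  /* e.g. U+110000, lone surrogates */
    }
    if (cp < 0x10000) {
        n = snprintf(end, room, "\\u%04X", (unsigned)cp);
    } else {
        cp -= 0x10000;              /* 20 bits remain: 10 per surrogate */
        n = snprintf(end, room, "\\u%04X\\u%04X",
                     (unsigned)(0xD800 + (cp >> 10)),
                     (unsigned)(0xDC00 + (cp & 0x3FF)));
    }
    if (n < 0 || (size_t)n >= room) {
        *end = '\0';                /* undo a truncated write */
        return -1;
    }
    return n;
}
```

With this sketch, U+10000 yields \uD800\uDC00 and U+10FFFF yields \uDBFF\uDFFF, matching the expected strings in the test cases. U+110000 is refused, in line with the remark that it should be rejected.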