Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs

Markus Armbruster Mon, 13 Aug 2018 00:08:54 -0700

Eric Blake <ebl...@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as unpaired
>> surrogate.  Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>> ---
>>   qobject/json-parser.c | 16 +++++++++++++++-
>>   tests/check-qjson.c   |  3 +--
>>   2 files changed, 16 insertions(+), 3 deletions(-)
>>
>
>> @@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>>                      cp |= hex2decimal(*ptr);
>>                  }
>> +                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
>> +                    && ptr[1] == '\\' && ptr[2] == 'u') {
>> +                    ptr += 2;
>> +                    leading_surrogate = cp;
>> +                    goto hex;
>> +                }
>> +                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
>> +                    cp &= 0x3FF;
>> +                    cp |= (leading_surrogate & 0x3FF) << 10;
>> +                    cp += 0x010000;
>> +                }
>> +
>>                   if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>>                       parse_error(ctxt, token,
>>                                   "\\u%.4s is not a valid Unicode character",
>
> Consider "\\udbff\\udfff" - a valid surrogate pair (in terms of being
> in range), but which decodes to u+10ffff.  Since is_valid_codepoint()
> (part of mod_utf8_encode()) rejects it due to (codepoint & 0xfffe) ==
> 0xfffe, it means we end up printing this error message, but only using
> the second half of the surrogate pair.  Is that okay?


It's not horrible, but I wouldn't call it okay.  I'll try to improve it.
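
For reference, the arithmetic behind the "\\udbff\\udfff" example can be checked with a minimal standalone sketch. This is not the QEMU code: combine_surrogates() and is_noncharacter() are hypothetical helpers written for illustration, and the second one mirrors only the (codepoint & 0xfffe) == 0xfffe test mentioned above, not the whole of is_valid_codepoint().

```c
/*
 * Hypothetical helper (not part of QEMU): combine a leading surrogate
 * (0xD800..0xDBFF) and a trailing surrogate (0xDC00..0xDFFF) into one
 * code point, using the same arithmetic as the patch hunk above.
 */
static unsigned combine_surrogates(unsigned leading, unsigned trailing)
{
    unsigned cp = trailing & 0x3FF;     /* low 10 bits */

    cp |= (leading & 0x3FF) << 10;      /* high 10 bits */
    cp += 0x010000;                     /* offset past the BMP */
    return cp;
}

/*
 * Hypothetical helper: only the noncharacter test quoted in the review,
 * i.e. code points whose low 16 bits are 0xFFFE or 0xFFFF.
 */
static int is_noncharacter(unsigned cp)
{
    return (cp & 0xFFFE) == 0xFFFE;
}
```

With these, combine_surrogates(0xDBFF, 0xDFFF) yields 0x10FFFF, and is_noncharacter() reports it as a noncharacter, so the pair is in range as surrogates yet still ends up in the error path.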

> Otherwise,
> Reviewed-by: Eric Blake <ebl...@redhat.com>

Thanks!
