On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The JSON parser treats each half of a surrogate pair as unpaired
> surrogate. Fix it to recognize surrogate pairs.
>
> Signed-off-by: Markus Armbruster <arm...@redhat.com>
> ---
>  qobject/json-parser.c | 16 +++++++++++++++-
>  tests/check-qjson.c   |  3 +--
>  2 files changed, 16 insertions(+), 3 deletions(-)
>
> @@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>                  cp |= hex2decimal(*ptr);
>              }
> +            if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
> +                && ptr[1] == '\\' && ptr[2] == 'u') {
> +                ptr += 2;
> +                leading_surrogate = cp;
> +                goto hex;
> +            }
> +            if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
> +                cp &= 0x3FF;
> +                cp |= (leading_surrogate & 0x3FF) << 10;
> +                cp += 0x010000;
> +            }
> +
>              if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>                  parse_error(ctxt, token,
>                              "\\u%.4s is not a valid Unicode character",

Consider "\\udbff\\udfff": a valid surrogate pair (both halves are in
range), but one that decodes to U+10FFFF. is_valid_codepoint() (part of
mod_utf8_encode()) rejects that value because
(codepoint & 0xfffe) == 0xfffe, so we end up printing this error
message, and the message shows only the second half of the surrogate
pair. Is that okay?
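
For reference, here is a minimal standalone sketch of the arithmetic behind that example. It is not the QEMU code: combine_surrogates(), is_noncharacter() and check_dbff_dfff() are hypothetical names used only for illustration, and is_noncharacter() mirrors just the one test quoted above, not everything is_valid_codepoint() may check.

```c
#include <assert.h>

/*
 * Hypothetical helper: combine a leading surrogate (0xD800..0xDBFF) and a
 * trailing surrogate (0xDC00..0xDFFF) into one code point, using the same
 * steps as the patch: low 10 bits of each half, then add 0x10000.
 */
unsigned combine_surrogates(unsigned lead, unsigned trail)
{
    unsigned cp = trail & 0x3FF;

    cp |= (lead & 0x3FF) << 10;
    cp += 0x010000;
    return cp;
}

/*
 * Hypothetical helper: the single check quoted above, which matches code
 * points whose low 16 bits are 0xFFFE or 0xFFFF.
 */
int is_noncharacter(unsigned cp)
{
    return (cp & 0xfffe) == 0xfffe;
}

/* Worked example: "\udbff\udfff" pairs up fine, yet is still rejected. */
void check_dbff_dfff(void)
{
    unsigned cp = combine_surrogates(0xDBFF, 0xDFFF);

    assert(cp == 0x10FFFF);       /* 0xFFC00 | 0x3FF, plus 0x10000 */
    assert(is_noncharacter(cp));  /* low 16 bits are 0xFFFF */
}
```

So the pair is combined correctly, and the rejection comes from the code point itself rather than from the pairing. The open question is only which escape the error message reports.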
Otherwise,
Reviewed-by: Eric Blake <ebl...@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org