Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser

Markus Armbruster Mon, 13 Aug 2018 00:06:32 -0700

Eric Blake <ebl...@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Both lexer and parser reject invalid escape sequences in strings.  The
>> parser's check is useless.
>>
>
>>
>> Drop the lexer's escape sequence checking, and make it accept the same
>> characters after '\' it accepts elsewhere in strings.  It now produces
>>
>>      JSON_LCURLY   {
>>      JSON_STRING   "abc\@ijk"
>>      JSON_COLON    :
>>      JSON_INTEGER  1
>>      JSON_RCURLY   }
>>
>> and the parser reports just
>>
>>      JSON parse error, invalid escape sequence in string
>>
>> While there, fix parse_string()'s inaccurate function comment.
>
> Worthwhile improvement.
>
>>
>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>> ---
>>   qobject/json-lexer.c  | 72 +++----------------------------------------
>>   qobject/json-parser.c | 56 +++++++++++++++++++--------------
>>   2 files changed, 37 insertions(+), 91 deletions(-)
>
> and shorter!
>
>>       [IN_DQ_STRING_ESCAPE] = {
>> -        ['b'] = IN_DQ_STRING,
>> -        ['f'] =  IN_DQ_STRING,
>> -        ['n'] =  IN_DQ_STRING,
>> -        ['r'] =  IN_DQ_STRING,
>> -        ['t'] =  IN_DQ_STRING,
>> -        ['/'] = IN_DQ_STRING,
>> -        ['\\'] = IN_DQ_STRING,
>> -        ['\''] = IN_DQ_STRING,
>> -        ['\"'] = IN_DQ_STRING,
>> -        ['u'] = IN_DQ_UCODE0,
>> +        [0x20 ... 0xFD] = IN_DQ_STRING,
>
> Among other things, this means the parser now has to flag "\u" as an
> incomplete escape - but your added testsuite coverage earlier in the
> series ensures that we do.


Yes.
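
For illustration only, here is a minimal sketch of the kind of check
the parser has to do once the lexer accepts any character after '\'.
It is not the code from this patch: escape_length() and hex_value()
are hypothetical names, and the accepted set is simply the RFC 7159
escapes plus the apostrophe extension described in the function
comment below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not QEMU's hex2decimal()): hex digit value, or -1. */
static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') {
        return ch - '0';
    }
    if (ch >= 'a' && ch <= 'f') {
        return 10 + ch - 'a';
    }
    if (ch >= 'A' && ch <= 'F') {
        return 10 + ch - 'A';
    }
    return -1;
}

/*
 * Hypothetical sketch: @p points at the character right after '\'
 * in a NUL-terminated string.  Returns the number of characters the
 * escape occupies after the backslash, or -1 if it is invalid.
 */
int escape_length(const char *p)
{
    assert(p != NULL);

    switch (*p) {
    case '"': case '\\': case '/':
    case 'b': case 'f': case 'n': case 'r': case 't':
    case '\'':                  /* extension over RFC 7159 */
        return 1;
    case 'u':
        /* Needs exactly four hex digits; stops at the first bad one,
         * so it never reads past the terminating NUL. */
        for (int i = 1; i <= 4; i++) {
            if (hex_value(p[i]) < 0) {
                return -1;
            }
        }
        return 5;
    default:
        return -1;
    }
}
```

With this sketch, escape_length("@ijk") and escape_length("u") both
return -1, which is the point where a parser would report "invalid
escape sequence in string".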

>> +++ b/qobject/json-parser.c
>> @@ -106,30 +106,40 @@ static int hex2decimal(char ch)
>>   }
>>   /**
>> - * parse_string(): Parse a json string and return a QObject
>> + * parse_string(): Parse a JSON string
>>    *
>> - *  string
>
>> + * From RFC 7159 "The JavaScript Object Notation (JSON) Data
>> + * Interchange Format":
>> + *
>> + *    char = unescaped /
>> + *        escape (
>> + *            %x22 /          ; "    quotation mark  U+0022
>> + *            %x5C /          ; \    reverse solidus U+005C
>> + *            %x2F /          ; /    solidus         U+002F
>> + *            %x62 /          ; b    backspace       U+0008
>> + *            %x66 /          ; f    form feed       U+000C
>> + *            %x6E /          ; n    line feed       U+000A
>> + *            %x72 /          ; r    carriage return U+000D
>> + *            %x74 /          ; t    tab             U+0009
>> + *            %x75 4HEXDIG )  ; uXXXX                U+XXXX
>> + *    escape = %x5C              ; \
>> + *    quotation-mark = %x22      ; "
>> + *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
>> + *
>> + * Extensions over RFC 7159:
>> + * - Extra escape sequence in strings:
>> + *   0x27 (apostrophe) is recognized after escape, too
>> + * - Single-quoted strings:
>> + *   Like double-quoted strings, except they're delimited by %x27
>> + *   (apostrophe) instead of %x22 (quotation mark), and can't contain
>> + *   unescaped apostrophe, but can contain unescaped quotation mark.
>> + *
>> + * Note:
>> + * - Encoding is modified UTF-8.
>
> That is an extension over RFC 7159. But I'm okay with leaving it in
> the Notes section.
>
>> + * - Invalid Unicode characters are rejected.
>> + * - Control characters are rejected by the lexer.
>
> Worth being explicit that this is 00-1f, fe, and ff?

\xFE and \xFF are invalid, not control.

What about:

 * - Invalid Unicode characters are rejected.
 * - Control characters \x00..\x1F are rejected by the lexer.
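
To make the distinction concrete, a small sketch of how the three byte
ranges differ.  It mirrors only the [0x20 ... 0xFD] range from the
lexer hunk quoted above; classify_string_byte() and the enum are
hypothetical names for illustration, not anything in the tree.

```c
/* Hypothetical classification of a byte seen inside a string. */
enum byte_class {
    BYTE_CONTROL,   /* \x00..\x1F: control characters */
    BYTE_INVALID,   /* \xFE, \xFF: never part of UTF-8 */
    BYTE_ACCEPTED,  /* \x20..\xFD: accepted by the lexer inside strings */
};

enum byte_class classify_string_byte(unsigned char ch)
{
    if (ch <= 0x1F) {
        return BYTE_CONTROL;
    }
    if (ch >= 0xFE) {
        return BYTE_INVALID;
    }
    /* Anything else is left for the parser to validate further. */
    return BYTE_ACCEPTED;
}
```

The sketch says nothing about what the parser does with the accepted
bytes afterwards; it only shows why \xFE and \xFF belong in a different
bucket than \x00..\x1F.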
