Re: [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser

Eric Blake Mon, 13 Aug 2018 09:12:44 -0700

On 08/08/2018 07:03 AM, Markus Armbruster wrote:

Both lexer and parser reject invalid interpolation specifications.
The parser's check is useless.


The lexer ends the token right after the first bad character.  This
tends to lead to suboptimal error reporting.  For instance, input

     [ %11d ]

With no context of other characters on this line, it took me a while tonotice that this was '1' (one) not 'l' (ell), even though my currentfont renders the two quite distinctly. (And now you know why base32avoids 0/1 in its alphabet, to minimize confusion with O/l - see RFC4648). A better example might be %22d.


produces the tokens

     JSON_LSQUARE  [
     JSON_ERROR    %1
     JSON_INTEGER  1

And that's in spite of the context you have here, which makes it obviousthat the parser saw an integer.

     JSON_KEYWORD  d
     JSON_RSQUARE  ]

The parser then yields an error, an object and two more errors:

     error: Invalid JSON syntax
     object: 1
     error: JSON parse error, invalid keyword
     error: JSON parse error, expecting value

Change the lexer to accept [A-Za-z0-9]*[duipsf].  It now produces


That regex doesn't match the code.


     JSON_LSQUARE  [
     JSON_INTERPOLATION %11d
     JSON_RSQUARE  ]

and the parser reports just

     JSON parse error, invalid interpolation '%11d'

Signed-off-by: Markus Armbruster <arm...@redhat.com>
---
  qobject/json-lexer.c  | 52 +++++++++----------------------------------
  qobject/json-parser.c |  1 +
  2 files changed, 11 insertions(+), 42 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 0ea1eae4aa..7a82aab88b 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -93,7 +93,8 @@
   *   (apostrophe) instead of %x22 (quotation mark), and can't contain
   *   unescaped apostrophe, but can contain unescaped quotation mark.
   * - Interpolation:
- *   interpolation = %((l|ll|I64)[du]|[ipsf])
+ *   The lexer accepts [A-Za-z0-9]*, and leaves rejecting invalid ones
+ *   to the parser.

This comment is more apropos. But is it worth spelling "The lexeraccepts %[A-Za-z0-9]*", and/or documenting that recognizinginterpolation during lexing is now optional (thanks to the previous patch)?

@@ -278,6 +238,14 @@ static const uint8_t json_lexer[][256] =  {
          ['\n'] = IN_WHITESPACE,
      },

+ /* interpolation */

+    [IN_INTERPOL] = {
+        TERMINAL(JSON_INTERPOL),
+        ['A' ... 'Z'] = IN_INTERPOL,
+        ['a' ... 'z'] = IN_INTERPOL,
+        ['0' ... '9'] = IN_INTERPOL,
+    },
+

Note that we still treat code like "'foo': %#x" as an invalidinterpolation request (it would be a valid printf format), while at thesame time passing "%1" on to the parser (which is not a valid printfformat, so -Wformat would have flagged it). In fact, "%d1" which isvalid for printf as "%d" followed by a literal "1" gets thrown to theparser as one chunk - but it is not valid in JSON to have %d adjacent toanother integer any more than it is to reject "%d1" as an unknownsequence. I don't think that catering to the remaining printfmetacharacters (' ', '#', '\'', '-', '*', '+') nor demanding that thingsend on a letter is worth the effort, since it just makes the lexer morecomplicated when your goal was to do as little as possible to get thingsthrown over to the parser.

So even though your lexing is now somewhat different from printf, Idon't see that as a serious drawback (the uses we care about are still-Wformat clean, and the uses we don't like are properly handled in theparser).

Perhaps you want to enhance the testsuite (earlier in the series) tocover "%22d", "%d1", "%1" as various unaccepted interpolation requests(if you didn't already).


Reviewed-by: Eric Blake <ebl...@redhat.com>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser

Reply via email to