On 08/08/2018 07:03 AM, Markus Armbruster wrote:
For input 0123, the lexer produces the tokens
JSON_ERROR 01
JSON_INTEGER 23
Reporting an error is correct; 0123 is invalid according to RFC 7159.
But the error recovery isn't nice.
Make the finite state machine eat digits before going into the error
state. The lexer now produces
JSON_ERROR 0123
Signed-off-by: Markus Armbruster <arm...@redhat.com>
---
qobject/json-lexer.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
@@ -158,10 +159,14 @@ static const uint8_t json_lexer[][256] = {
/* Zero */
[IN_ZERO] = {
TERMINAL(JSON_INTEGER),
- ['0' ... '9'] = IN_ERROR,
+ ['0' ... '9'] = IN_BAD_ZERO,
['.'] = IN_MANTISSA,
},
+ [IN_BAD_ZERO] = {
+ ['0' ... '9'] = IN_BAD_ZERO,
+ },
+
Should IN_BAD_ZERO also consume '.' and/or 'e' (after all, '01e2' is a
valid C constant, but not a valid JSON literal)? But I think your
choice here is fine (again, add too much, and then the lexer has to
track a lot of state; whereas this minimal addition catches the most
obvious things with little effort).
Reviewed-by: Eric Blake <ebl...@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org