Eric Blake <ebl...@redhat.com> writes: > On 08/08/2018 07:03 AM, Markus Armbruster wrote: >> The JSON parser optionally supports interpolation. The lexer >> recognizes interpolation tokens unconditionally. The parser rejects >> them when interpolation is disabled, in parse_interpolation(). >> However, it neglects to set an error then, which can make >> json_parser_parse() fail without setting an error. >> >> Move the check for unwanted interpolation from the parser's >> parse_interpolation() into the lexer's finite state machine. When >> interpolation is disabled, '%' is now handled like any other >> unexpected character. >> >> The next commit will improve how such lexical errors are handled. >> >> Signed-off-by: Markus Armbruster <arm...@redhat.com> >> --- >> include/qapi/qmp/json-lexer.h | 4 ++-- >> qobject/json-lexer.c | 42 ++++++++++++++++++++++++++--------- >> qobject/json-parser.c | 4 ---- >> qobject/json-streamer.c | 2 +- >> tests/qmp-test.c | 4 ++++ >> 5 files changed, 39 insertions(+), 17 deletions(-) >> > > >> @@ -271,17 +272,38 @@ static const uint8_t json_lexer[][256] = { >> [','] = JSON_COMMA, >> [':'] = JSON_COLON, >> ['a' ... 'z'] = IN_KEYWORD, >> + [' '] = IN_WHITESPACE, >> + ['\t'] = IN_WHITESPACE, >> + ['\r'] = IN_WHITESPACE, >> + ['\n'] = IN_WHITESPACE, >> + }, >> + >> + [IN_START_INTERPOL] = { >> + ['"'] = IN_DQ_STRING, > ... > >> + ['\n'] = IN_WHITESPACE, >> + /* matches IN_START up to here */ >> ['%'] = IN_INTERPOL, > > You could compress this as: > > [IN_START_INTERPOL ... IN_START] = { > ['"'] = ... > ['\n'] = ... > }, > [IN_START_INTERPOL]['%'] = IN_INTERPOL, > > rather than duplicating the common list twice. (We already exploit > gcc's range initialization, and the fact that you can initialize a > broader range and then re-initialize a more specific subset later)
Neat! It'll lose some of its charm in PATCH 52, but enough remains for me to like it. >> +++ b/tests/qmp-test.c >> @@ -94,6 +94,10 @@ static void test_malformed(QTestState *qts) >> /* lexical error: interpolation */ >> qtest_qmp_send_raw(qts, "%%p\n"); >> + /* two errors, one for "%", one for "p" */ >> + resp = qtest_qmp_receive(qts); >> + g_assert_cmpstr(get_error_class(resp), ==, "GenericError"); >> + qobject_unref(resp); >> resp = qtest_qmp_receive(qts); >> g_assert_cmpstr(get_error_class(resp), ==, "GenericError"); >> qobject_unref(resp); > > I'm impressed at how easily you got the lexer to parse two different > token grammars, and agree that doing it in the lexer when we don't > want interpolation is a nicer place. > > Reviewed-by: Eric Blake <ebl...@redhat.com> Well, what you see here isn't my first attempt to make the lexer do it, it's the one I finally liked :) Thanks!