Eric Blake <ebl...@redhat.com> writes:

> On 08/27/2018 02:00 AM, Markus Armbruster wrote:
>> The lexer fails to end a valid token when the lookahead character is
>> beyond '\x7F'.  For instance, input
>>
>>     true\xC2\xA2
>>
>> produces the tokens
>>
>>     JSON_ERROR    true\xC2
>>     JSON_ERROR    \xA2
>>
>> The first token should be
>>
>>     JSON_KEYWORD  true
>>
>> instead.
>
> As long as we still get a JSON_ERROR in the end.
We do: one for \xC2, and one for \xA2.  PATCH 4 will lose the second
one.

>> The culprit is
>>
>>     #define TERMINAL(state) [0 ... 0x7F] = (state)
>>
>> It leaves [0x80..0xFF] zero, i.e. IN_ERROR.  Has always been broken.
>
> I wonder if that was done because it was assuming that valid input is
> only ASCII, and that any byte larger than 0x7f is invalid except
> within the context of a string.

Plausible thinko.

> But whatever the reason for the original bug, your fix makes sense.
>
>> Fix it to initialize the complete array.
>
> Worth testsuite coverage?

Since lookahead bytes > 0x7F are always a parse error, all the bug can
do is swallow a TERMINAL() token right before a parse error.  The
TERMINAL() tokens are JSON_INTEGER, JSON_FLOAT, JSON_KEYWORD,
JSON_SKIP, JSON_INTERP.  Fairly harmless.  In particular, JSON objects
get through even when followed by a byte > 0x7F.

Of course, test coverage wouldn't hurt regardless.

>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>> ---
>>  qobject/json-lexer.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>
> Reviewed-by: Eric Blake <ebl...@redhat.com>

Thanks!