Eric Blake <ebl...@redhat.com> writes:

> On 08/20/2018 06:39 AM, Markus Armbruster wrote:
>
>> In review of v1, we discussed whether to try matching non-integer
>> numbers with redundant leading zero.  Doing that tightly in the lexer
>> requires duplicating six states.  A simpler alternative is to have the
>> lexer eat "digit salad" after redundant leading zero: 0[0-9.eE+-]+.
>> Your suggestion for hexadecimal numbers is digit salad with different
>> digits: [0-9a-fA-FxX].  Another option is their union: [0-9a-fA-FxX+-].
>> Even more radical would be eating anything but whitespace and structural
>> characters: [^][}{:, \t\n\r].  That idea pushed to the limit results in
>> a two-stage lexer: first stage finds token strings, where a token string
>> is a structural character or a sequence of non-structural,
>> non-whitespace characters, second stage rejects invalid token strings.
>>
>> Hmm, we could try to recover from lexical errors more smartly in
>> general: instead of ending the JSON error token after the first
>> offending character, end it before the first whitespace or structural
>> character following the offending character.
>>
>> I can try that, but I'd prefer to try it in a follow-up patch.
>
> Indeed, that sounds like a valid approach.  So, for this patch, I'm
> fine with just accepting ['0' ... '9'], then seeing if the later
> smarter-lexing change makes back-to-back non-structural tokens give
> saner error messages in general.
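For illustration, the two-stage idea quoted above can be sketched like this (a toy in Python, not the actual C lexer; token patterns are simplified and strings are omitted, so treat every name here as hypothetical):

```python
import re

# Stage 1 splits input into "token strings": a structural character, or a
# maximal run of non-structural, non-whitespace characters.  Stage 2 then
# classifies each token string whole, so an invalid run like "01.2e+3"
# becomes one error token instead of stopping at the first bad character.

STRUCTURAL = set('[]{}:,')
WHITESPACE = set(' \t\n\r')

# Simplified valid-token pattern: JSON numbers and keywords only.
VALID_TOKEN = re.compile(
    r'-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?$'
    r'|true$|false$|null$'
)

def token_strings(s):
    """Stage 1: yield structural characters and maximal non-space runs."""
    i = 0
    while i < len(s):
        c = s[i]
        if c in WHITESPACE:
            i += 1
        elif c in STRUCTURAL:
            yield c
            i += 1
        else:
            j = i
            while j < len(s) and s[j] not in STRUCTURAL and s[j] not in WHITESPACE:
                j += 1
            yield s[i:j]
            i = j

def lex(s):
    """Stage 2: classify each token string, flagging invalid ones whole."""
    for tok in token_strings(s):
        if tok in STRUCTURAL:
            yield ('structural', tok)
        elif VALID_TOKEN.match(tok):
            yield ('token', tok)
        else:
            yield ('error', tok)   # whole digit salad in one error token

print(list(lex('[01.2e+3, 1]')))
```

With input `[01.2e+3, 1]`, the redundant-leading-zero number comes back as a single `('error', '01.2e+3')` rather than an error at the second character followed by spurious tokens for the rest.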
I think I'll drop this patch for now.  It's not useful enough to apply
now, only to revert it once we have the more general error recovery
improvement.