On Thu, 20 May 2010 10:50:41 -0500 Anthony Liguori <anth...@codemonkey.ws> wrote:
> On 05/20/2010 10:16 AM, Paolo Bonzini wrote: > > On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > >> I think there's another issue in the handling of strings. > >> > >> The spec says that valid unescaped chars are in the following range: > >> > >> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > > That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in > strings. Any parser that didn't accept that would be broken. Honestly, I had the impression this should be encoded as: %x5C %x74, but if you're right, wouldn't this be true for other sequences as well? > >> > >> But we do: > >> > >> [IN_DQ_STRING] = { > >> [1 ... 0xFF] = IN_DQ_STRING, > >> ['\\'] = IN_DQ_STRING_ESCAPE, > >> ['"'] = IN_DONE_STRING, > >> }, > >> > >> Shouldn't we cover 0x20 .. 0xFF instead? > > > > If it's the lexer, isn't just it being liberal in what it accepts? > > I believe the parser correctly rejects invalid UTF-8 sequences. Will check.