On Mon, 14 Mar 2011 15:18:57 -0500 Anthony Liguori <anth...@codemonkey.ws> wrote:
> On 03/14/2011 02:25 PM, Luiz Capitulino wrote:
> > On Fri, 11 Mar 2011 15:00:47 -0600
> > Anthony Liguori <aligu...@us.ibm.com> wrote:
> >
> >> This is a security consideration.  We don't want a client to cause an
> >> arbitrary amount of memory to be allocated in QEMU.  For now, we use a
> >> limit of 64MB which should be large enough for any reasonably sized token.
> >>
> >> This is important for parsing JSON from untrusted sources.
> >>
> >> Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>
> >>
> >> diff --git a/json-lexer.c b/json-lexer.c
> >> index 834d7af..3462c89 100644
> >> --- a/json-lexer.c
> >> +++ b/json-lexer.c
> >> @@ -18,6 +18,8 @@
> >>  #include "qemu-common.h"
> >>  #include "json-lexer.h"
> >>
> >> +#define MAX_TOKEN_SIZE (64ULL << 20)
> >> +
> >>  /*
> >>   * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
> >>   * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
> >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> >>          }
> >>          lexer->state = new_state;
> >>      } while (!char_consumed);
> >> +
> >> +    /* Do not let a single token grow to an arbitrarily large size,
> >> +     * this is a security consideration.
> >> +     */
> >> +    if (lexer->token->length > MAX_TOKEN_SIZE) {
> >> +        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
> >> +        QDECREF(lexer->token);
> >> +        lexer->token = qstring_new();
> >> +        lexer->state = IN_START;
> >> +    }
> >
> > Entering an invalid token is an error, we should fail here.
>
> It's not so clear to me.
>
> I think of it like GCC.  GCC doesn't bail out on the first invalid
> character the lexer encounters.  Instead, it records that an error has
> occurred and tries its best to recover.
>
> The result is that instead of getting an error message about the first
> error in your code, you'll get a long listing of all the mistakes you
> made (usually).

I'm not a compiler expert, but I think compilers do this because reporting
one error at a time would be very problematic from a usability perspective:
by reporting as many errors as possible per run, the compiler lets the user
fix most of their mistakes without having to re-run it.  Our target is not
humans, though.

> One thing that makes this more difficult in our case is that when you're
> testing, we don't have a clear EOI to flush things out.  So a bad
> sequence of inputs might make the message parser wait for a character
> that you're not necessarily inputting, which makes the session appear
> hung.  Usually, if you throw a couple extra brackets around, you'll get
> back to valid input.

Not sure if this is easy to do, but shouldn't we fail if we don't get the
expected character?

> > Which brings two features:
> >
> >  1. Test code could trigger this condition and check for the specific
> >     error code
> >
> >  2. Developers will know when they hit the limit.  Although I don't
> >     expect this to happen, there was talk about adding base64 support
> >     to transfer something (I can't remember what, but we never know how
> >     the protocol will evolve).
> >
> > Also, by testing this I found that the parser seems to get confused when
> > the limit is reached: it stops responding.
>
> Actually, it does respond.  The lexer just takes an incredibly long time
> to process a large token because qstring_append_ch is incredibly slow
> :-)  If you drop the token size down to 64k instead of 64mb, it'll seem
> a lot more reasonable.
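For what it's worth, that kind of cost is easy to reproduce outside of
qemu.  The snippet below is only a self-contained illustration of why
appending one character at a time can look like a hang on a 64MB token;
it is not the actual qstring code, and the real growth policy in
qstring.c may well differ:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char *str;
    size_t length;
    size_t capacity;
} buf_t;

static void append_ch_slow(buf_t *b, char ch)
{
    /* exact-fit growth: one realloc, and potentially one full copy,
     * per appended character, so O(n^2) bytes copied in the worst case */
    b->str = realloc(b->str, b->length + 2);
    b->str[b->length++] = ch;
    b->str[b->length] = '\0';
}

static void append_ch_amortized(buf_t *b, char ch)
{
    /* geometric growth: reallocations become exponentially rarer,
     * so O(n) bytes copied in total */
    if (b->length + 2 > b->capacity) {
        b->capacity = b->capacity ? b->capacity * 2 : 16;
        b->str = realloc(b->str, b->capacity);
    }
    b->str[b->length++] = ch;
    b->str[b->length] = '\0';
}

int main(void)
{
    buf_t b = { NULL, 0, 0 };
    size_t i;

    /* build a 64MB "token" one character at a time; switch the call to
     * append_ch_slow to feel the difference (error checking omitted) */
    for (i = 0; i < (64ULL << 20); i++) {
        append_ch_amortized(&b, 'a');
    }
    printf("token length: %zu\n", b.length);
    free(b.str);
    return 0;
}

Either way the lexer does eventually answer; with a per-character append
and a 64MB limit it just takes long enough to look stuck.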
Actually, I dropped it down to 20 :)  This is supposed to work like your
64k suggestion, right?  (A rough sketch of the kind of check I had in mind
is below, after the quote.)

> Regards,
>
> Anthony Liguori
>
> >> +
> >>      return 0;
> >>  }
> >>
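In case it is useful, here is roughly the check I mean.  This is only a
sketch: it assumes the json_lexer_init/json_lexer_feed/json_lexer_destroy
entry points from json-lexer.h and the emitter signature used in the patch
above, assumes MAX_TOKEN_SIZE has been lowered to something tiny (say, 64)
for the test build, and the callback and function names are made up:

#include <assert.h>

#include "qemu-common.h"
#include "json-lexer.h"

static int emit_count;

/* Count how many times the lexer hands us a token (or a truncated one). */
static void test_emit(JSONLexer *lexer, QString *token,
                      JSONTokenType type, int x, int y)
{
    emit_count++;
}

static void test_token_limit(void)
{
    JSONLexer lexer;
    int i;

    emit_count = 0;
    json_lexer_init(&lexer, test_emit);

    json_lexer_feed(&lexer, "\"", 1);      /* open a string token ... */
    for (i = 0; i < 1024; i++) {           /* ... and never close it  */
        json_lexer_feed(&lexer, "a", 1);
    }

    /* with the limit in place we should have seen at least one emit by
     * now; without it, emit_count would still be 0 */
    assert(emit_count > 0);

    json_lexer_destroy(&lexer);
}

Hooked into something like check-qjson, that would at least document the
intended behaviour when the limit is hit.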