On Mon, 14 Mar 2011 15:18:57 -0500 Anthony Liguori <anth...@codemonkey.ws> wrote:
> On 03/14/2011 02:25 PM, Luiz Capitulino wrote:
> > On Fri, 11 Mar 2011 15:00:47 -0600
> > Anthony Liguori <aligu...@us.ibm.com> wrote:
> >
> >> This is a security consideration.  We don't want a client to cause an
> >> arbitrary amount of memory to be allocated in QEMU.  For now, we use a
> >> limit of 64MB which should be large enough for any reasonably sized token.
> >>
> >> This is important for parsing JSON from untrusted sources.
> >>
> >> Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>
> >>
> >> diff --git a/json-lexer.c b/json-lexer.c
> >> index 834d7af..3462c89 100644
> >> --- a/json-lexer.c
> >> +++ b/json-lexer.c
> >> @@ -18,6 +18,8 @@
> >>  #include "qemu-common.h"
> >>  #include "json-lexer.h"
> >>
> >> +#define MAX_TOKEN_SIZE (64ULL << 20)
> >> +
> >>  /*
> >>   * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
> >>   * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
> >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> >>          }
> >>          lexer->state = new_state;
> >>      } while (!char_consumed);
> >> +
> >> +    /* Do not let a single token grow to an arbitrarily large size,
> >> +     * this is a security consideration.
> >> +     */
> >> +    if (lexer->token->length > MAX_TOKEN_SIZE) {
> >> +        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
> >> +        QDECREF(lexer->token);
> >> +        lexer->token = qstring_new();
> >> +        lexer->state = IN_START;
> >> +    }
> >
> > Entering an invalid token is an error, we should fail here.
>
> It's not so clear to me.
>
> I think of it like GCC.  GCC doesn't bail out on the first invalid
> character the lexer encounters.  Instead, it records that an error has
> occurred and tries its best to recover.
>
> The result is that instead of getting an error message about the first
> error in your code, you'll get a long listing of all the mistakes you
> made (usually).

I'm not a compiler expert, but I think compilers do this because reporting
one error at a time would be very problematic from a usability perspective:
by reporting as many errors as possible per run, the compiler lets the user
fix most of their mistakes without having to re-run it.  Our target is not
humans, though.

> One thing that makes this more difficult in our case is that when you're
> testing, we don't have a clear EOI to flush things out.  So a bad
> sequence of inputs might make the message parser wait for a character
> that you're not necessarily inputting, which makes the session appear
> hung.  Usually, if you throw a couple extra brackets around, you'll get
> back to valid input.

Not sure if this is easy to do, but shouldn't we fail if we don't get the
expected character?

> > Which brings two features:
> >
> >  1. Test code could trigger this condition and check for the specific
> >     error code
> >
> >  2. Developers will know when they hit the limit.  Although I don't
> >     expect this to happen, there was talk about adding base64 support
> >     to transfer something (I can't remember what, but we never know how
> >     the protocol will evolve).
> >
> > Also, by testing this I found that the parser seems to get confused when
> > the limit is reached: it stops responding.
>
> Actually, it does respond.  The lexer just takes an incredibly long time
> to process a large token because qstring_append_ch is incredibly slow
> :-)  If you drop the token size down to 64k instead of 64mb, it'll seem
> a lot more reasonable.
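For what it's worth, that kind of cost is easy to reproduce outside of
qemu.  The snippet below is only a self-contained illustration of why
appending one character at a time can look like a hang on a 64MB token;
it is not the actual qstring code, and the real growth policy in
qstring.c may well differ:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char *str;
    size_t length;
    size_t capacity;
} buf_t;

static void append_ch_slow(buf_t *b, char ch)
{
    /* exact-fit growth: one realloc, and potentially one full copy,
     * per appended character, so O(n^2) bytes copied in the worst case */
    b->str = realloc(b->str, b->length + 2);
    b->str[b->length++] = ch;
    b->str[b->length] = '\0';
}

static void append_ch_amortized(buf_t *b, char ch)
{
    /* geometric growth: reallocations become exponentially rarer,
     * so O(n) bytes copied in total */
    if (b->length + 2 > b->capacity) {
        b->capacity = b->capacity ? b->capacity * 2 : 16;
        b->str = realloc(b->str, b->capacity);
    }
    b->str[b->length++] = ch;
    b->str[b->length] = '\0';
}

int main(void)
{
    buf_t b = { NULL, 0, 0 };
    size_t i;

    /* build a 64MB "token" one character at a time; switch the call to
     * append_ch_slow to feel the difference (error checking omitted) */
    for (i = 0; i < (64ULL << 20); i++) {
        append_ch_amortized(&b, 'a');
    }
    printf("token length: %zu\n", b.length);
    free(b.str);
    return 0;
}

Either way the lexer does eventually answer; with a per-character append
and a 64MB limit it just takes long enough to look stuck.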
Actually, I dropped it down to 20 :)  This is supposed to work like your
64k suggestion, right?  (A rough sketch of the kind of check I had in mind
is below, after the quote.)

> Regards,
>
> Anthony Liguori
>
> >> +
> >>      return 0;
> >>  }
> >>
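In case it is useful, here is roughly the check I mean.  This is only a
sketch: it assumes the json_lexer_init/json_lexer_feed/json_lexer_destroy
entry points from json-lexer.h and the emitter signature used in the patch
above, assumes MAX_TOKEN_SIZE has been lowered to something tiny (say, 64)
for the test build, and the callback and function names are made up:

#include <assert.h>

#include "qemu-common.h"
#include "json-lexer.h"

static int emit_count;

/* Count how many times the lexer hands us a token (or a truncated one). */
static void test_emit(JSONLexer *lexer, QString *token,
                      JSONTokenType type, int x, int y)
{
    emit_count++;
}

static void test_token_limit(void)
{
    JSONLexer lexer;
    int i;

    emit_count = 0;
    json_lexer_init(&lexer, test_emit);

    json_lexer_feed(&lexer, "\"", 1);      /* open a string token ... */
    for (i = 0; i < 1024; i++) {           /* ... and never close it  */
        json_lexer_feed(&lexer, "a", 1);
    }

    /* with the limit in place we should have seen at least one emit by
     * now; without it, emit_count would still be 0 */
    assert(emit_count > 0);

    json_lexer_destroy(&lexer);
}

Hooked into something like check-qjson, that would at least document the
intended behaviour when the limit is hit.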