Paolo Bonzini <pbonz...@redhat.com> writes:

> On 19/11/2015 16:29, Markus Armbruster wrote:
>> Commit 29c75dd "json-streamer: limit the maximum recursion depth and
>> maximum token count" attempts to guard against excessive heap usage by
>> limiting total token size (it says "token count", but that's a lie).
>>
>> Total token size is a rather imprecise predictor of heap usage: many
>> small tokens use more space than few large tokens with the same input
>> size, because there's a constant per-token overhead.
>>
>> Tighten this up: limit the token count to 128Ki.
>>
>> If you think 128Ki is too stingy: check-qjson's large_dict test eats a
>> sweet 500MiB on my machine to parse ~100K tokens.
>
> How much of this is freed before the start of the parse?
>
>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>> Reviewed-by: Eric Blake <ebl...@redhat.com>
>> ---
>>  qobject/json-streamer.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
>> index 2bd22a7..8752834 100644
>> --- a/qobject/json-streamer.c
>> +++ b/qobject/json-streamer.c
>> @@ -19,6 +19,7 @@
>>  #include "qapi/qmp/json-streamer.h"
>>
>>  #define MAX_TOKEN_SIZE (64ULL << 20)
>> +#define MAX_TOKEN_COUNT (128ULL << 10)
>>  #define MAX_NESTING (1ULL << 10)
>>
>>  static void json_message_process_token(JSONLexer *lexer, QString *token,
>>                                         JSONTokenType type, int x, int y)
>> @@ -64,6 +65,7 @@ static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTok
>>                 parser->bracket_count == 0)) {
>>          goto out_emit;
>>      } else if (parser->token_size > MAX_TOKEN_SIZE ||
>> +               qlist_size(parser->tokens) > MAX_TOKEN_COUNT ||
>
> This is O(n^2).  I'd rather skip this patch, fix the memory hog and
> possibly decrease MAX_TOKEN_SIZE a bit.
I can't see the square right now.  Anyway, O() is unchanged by my patch,
only n is more limited.

See also commit 65c0f1e, which claims to have fixed the quadratic
behavior.

Regardless, the memory consumption is still ridiculous, and we certainly
need to fix that.  Whether we can have a fix for 2.5 I don't know.

Even with a fix, this patch has some value.  As explained in the commit
message, "total token size is a rather imprecise predictor of heap
usage: many small tokens use more space than few large tokens with the
same input size, because there's a constant per-token overhead."

As long as we limit input to limit memory use, we better use a limit
that's connected to actual memory use with reasonable accuracy.  This
patch improves accuracy.

>>                 parser->bracket_count + parser->brace_count > MAX_NESTING) {
>>          /* Security consideration, we limit total memory allocated per object
>>           * and the maximum recursion depth that a message can force.
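[Editor's sketch, not part of the posted patch or the thread.]  The O(n^2)
concern presumably comes from qlist_size() walking the whole token list once
per incoming token.  One way to keep the count-based limit while staying O(1)
per token would be a running counter in the parser.  The token_count field and
the elided surrounding code are assumptions; only the names visible in the
quoted diff (token_size, bracket_count, brace_count, the MAX_* macros,
out_emit) are taken from the source.

    /* Sketch only: assumes a new token_count field in JSONMessageParser,
     * reset to 0 wherever token_size is reset, so the limit check below
     * is a constant-time comparison instead of an O(n) qlist_size() walk.
     */

    #define MAX_TOKEN_SIZE  (64ULL << 20)
    #define MAX_TOKEN_COUNT (128ULL << 10)
    #define MAX_NESTING     (1ULL << 10)

    static void json_message_process_token(JSONLexer *lexer, QString *token,
                                           JSONTokenType type, int x, int y)
    {
        JSONMessageParser *parser
            = container_of(lexer, JSONMessageParser, lexer);

        /* ... existing token bookkeeping elided ... */

        parser->token_size += qstring_get_length(token);
        parser->token_count++;                     /* O(1), no list walk */

        /* ... brace/bracket counting and early-emit checks elided ... */

        if (parser->token_size > MAX_TOKEN_SIZE ||
            parser->token_count > MAX_TOKEN_COUNT ||   /* was qlist_size() */
            parser->bracket_count + parser->brace_count > MAX_NESTING) {
            /* Security consideration, we limit total memory allocated per
             * object and the maximum recursion depth that a message can
             * force.
             */
            goto out_emit;
        }

        return;

    out_emit:
        /* ... existing emit/reset path elided ... */
        ;
    }

Whether such a counter is worth carrying around, versus simply fixing the
per-token memory hog and lowering MAX_TOKEN_SIZE as Paolo suggests, is exactly
the trade-off under discussion above.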