On Fri, Mar 16, 2012 at 03:37:10PM +0100, Laszlo Ersek wrote: > Hi, > > we seem to have found a double free in qmp_output_visitor_cleanup(). > Please read the analysis below (that is based on commit e4e6aa14) and > please tell me if you'd like me to write a patch for solution (a) or > solution (b), as described at the bottom. > > Paolo wrote a test case to trigger the problem, I'm attaching it. > > Thank you, > Laszlo > > -------- Original Message -------- > Date: Fri, 16 Mar 2012 13:55:48 +0100 > From: Laszlo Ersek <ler...@redhat.com> > > [...] > > Consider the following example: we want to put an int (I0) in a dict > (D1) in a dict (D0). D0 is the root element. > > When we start D0 with qmp_output_start_struct(), the stack is empty, so > > qmp_output_start_struct() > -> qmp_output_add_obj(D0) > -> qmp_output_push_obj(D0) [1] > -> qmp_output_push(D0) [2] > > [1] After this step completes, we have a single entry (entry#0) in the > stack, and it points to D0. refcnt(D0) equals 1 (still from > qdict_new()). Entry#0 is an *owning* reference to D0 ("aggregation", > contributes to refcounts). > > [2] In this step we push D0 again, so entry#1 will point at D0 too. > refcnt(D0) still equals 1. Entry#1 is only a *navigation* reference > (weak reference etc.), it's only here so that we can continue adding > elements to D0 after we finished building a subtree below -- see step [6]. > > Then we start D1: > > qmp_output_start_struct() > -> qmp_output_add(D1) > -> qdict_put_obj(D0, D1) [3] > -> qmp_output_push(D1) [4] > > [3] refcnt(D1) == 1, from qdict_new(). qdict_put_obj() transfers > ownership of D1 to D0, without changing refcnt(D1), so that still equals > 1. At the end of this step, D1 is *owned* by D0. > > [4] Here we push D1 to the stack, without changing refcounts. Entry#2 is > only a *navigation* reference to D1. > > Then we add I0: > qmp_output_type_int() > -> qmp_output_add(I0) > -> qdict_put_obj(D1, I0) [5] > > [5] In this step we transfer ownership of I0 to D1, utilizing the > entry#2 *navigation* link to find D1. refcnt(I0) == 1, from > qint_from_int(). The status quo is as follows: > > {Figure-1} > > entry#0 --- owns [1] --------------+ > | > v > > entry#1 --- navigates-to [2] ---> D0 (refcnt=1) > > | > owns [3] > | > v > > entry#2 --- navigates-to [4] ---> D1 (refcnt=1) > > | > owns [5] > | > v > > I0 (refcnt=1) > > Then we close D1 and return to D0: > > qmp_output_end_struct() > -> qmp_output_pop() [6] > > [6] In this step we simply throw away (release) entry#2. We could > continue adding elements to D0 via the entry#1 navigation link, that was > set up in step [2]. > > Then we close D0 too: > qmp_output_end_struct() > -> qmp_output_pop() [7] > > We drop entry#1. > > We *never* drop entry #0 during this. After we issued an equal number of > pushes and pops, that is, when "we're done", this is how it looks: > > {Figure-2} > > entry#0 --- owns [1] --------------+ > | > v > > D0 (refcnt=1) > > | > owns [3] > | > v > > D1 (refcnt=1) > > | > owns [5] > | > v > > I0 (refcnt=1) > > > Now, consider running qmp_output_visitor_cleanup() on {Figure-2}. Since > there's only entry#0, we remove it, and issue > > qobject_decref(D0) > -> qdict_destroy_obj(D0) > -> qentry_destroy() > -> qobject_decref(D1) > -> qdict_destroy_obj(D1) > -> qentry_destroy() > -> qobject_decref(I0) > -> qint_destroy_obj(I0) > -> qemu_free(I0) > -> qemu_free(D1) > -> qemu_free(D0) > > Everything's gone. > > Consider instead running qmp_output_visitor_cleanup() on {Figure-1}, > that is, in a state where there have been more qmp_output_start_struct() > calls than qmp_output_end_struct() calls: > > {Figure-1 again} > > entry#0 --- owns [1] --------------+ > | > v > > entry#1 --- navigates-to [2] ---> D0 (refcnt=1) > > | > owns [3] > | > v > > entry#2 --- navigates-to [4] ---> D1 (refcnt=1) > > | > owns [5] > | > v > > I0 (refcnt=1) > > qmp_output_visitor_cleanup()'s loop implements "popping order" (both > qmp_output_pop() and this loop starts removing entries at "tqh_first"). > > We issue a qobject_decref(D1) via entry#2. That kills I0 and D1 (in that > order) for good. Notice that the "owns [3]" link from D0 becomes a > dangling *owning* pointer! > > Then we invoke qobject_decref(D0) via entry#1: > > qobject_decref(D0) > -> qdict_destroy_obj(D0) > -> qentry_destroy() > -> qobject_decref(D1) > -> "--D1->refcnt" -- BOOM > > > -- How to fix this: > > (a) When pushing, increase the refcount (ie. turn stack entries #1, #2, > and so on from navigation links into contributing ownership links). > > (b) Alternatively, in qmp_output_visitor_cleanup(), throw away all > entries except entry#0, and issue a decref on that one alone (because > it's an owning link). > > Notice that qmp_ *input* _visitor_cleanup() does exactly (b): it throws > away the stack at once (the navigation links in the array), and issues a > decref on the separately maintained pointer that *owns* the root object. > > -- Also, you don't have to have "three levels" to trigger the problem. > As long as you have at least one "layer of bracketing imbalance" when > you call cleanup, that is, you have at least entry#0 (owner) and entry#1 > (navigator) on the stack, it will blow up. Entry#1 will kill completely > whatever entry#0 *thinks* it owns. > > -- Why would you call qmp_output_visitor_cleanup() halfway into building > the object tree? Because you run into an error. For example, > qmp_output_type_enum() sets an error when an invalid "int" value is > passed to it (ie. one without a corresponding symbolic name). At that > point you probably want to "abort" your half-done object tree. > > ... In fact this example should provide a good test case. Try to > serialize/send a native (ie. C) struct hierarchy with a deeply nested > *invalid* enum. (In C you can easily set it to any value.) > > > I hope I'm not hallucinating things... :)
Nope, it's definitely there, just triggered it easily with simple test case. I think we've been lucky so far with qmp commands not returning responses that trigger this, but it's definitely a bug. I'm inclined to go with option b), just need to look at it some more before sending out a patch. Thanks for catching this! > > Thanks, > Laszlo > diff --git a/qapi-schema-test.json b/qapi-schema-test.json > index 8c7f9f7..9eae350 100644 > --- a/qapi-schema-test.json > +++ b/qapi-schema-test.json > @@ -8,7 +8,7 @@ > > # for testing nested structs > { 'type': 'UserDefOne', > - 'data': { 'integer': 'int', 'string': 'str' } } > + 'data': { 'integer': 'int', 'string': 'str', '*enum1': 'EnumOne' } } > > { 'type': 'UserDefTwo', > 'data': { 'string': 'str', > diff --git a/test-qmp-output-visitor.c b/test-qmp-output-visitor.c > index 78e5f2d..4e0ecd7 100644 > --- a/test-qmp-output-visitor.c > +++ b/test-qmp-output-visitor.c > @@ -274,6 +274,24 @@ static void > test_visitor_out_struct_nested(TestOutputVisitorData *data, > qapi_free_UserDefNested(ud2); > } > > +static void test_visitor_out_struct_errors(TestOutputVisitorData *data, > + const void *unused) > +{ > + EnumOne bad_values[] = { ENUM_ONE_MAX, -1 }; > + UserDefOne u = { 0 }, *pu = &u; > + Error *errp; > + int i; > + > + for (i = 0; i < ARRAY_SIZE(bad_values) ; i++) { > + errp = NULL; > + u.has_enum1 = true; > + u.enum1 = bad_values[i]; > + visit_type_UserDefOne(data->ov, &pu, "unused", &errp); > + g_assert(error_is_set(&errp) == true); > + error_free(errp); > + } > +} > + > typedef struct TestStructList > { > TestStruct *value; > @@ -444,6 +462,8 @@ int main(int argc, char **argv) > &out_visitor_data, test_visitor_out_struct); > output_visitor_test_add("/visitor/output/struct-nested", > &out_visitor_data, > test_visitor_out_struct_nested); > + output_visitor_test_add("/visitor/output/struct-errors", > + &out_visitor_data, > test_visitor_out_struct_errors); > output_visitor_test_add("/visitor/output/list", > &out_visitor_data, test_visitor_out_list); > output_visitor_test_add("/visitor/output/list-qapi-free",