Thomas Huth <th...@redhat.com> writes:

> On 25/09/15 16:17, Markus Armbruster wrote:
>> Thomas Huth <th...@redhat.com> writes:
>>
>>> On 24/09/15 20:57, Markus Armbruster wrote:
>>>> Several devices don't survive object_unref(object_new(T)): they crash
>>>> or hang during cleanup, or they leave dangling pointers behind.
>>>>
>>>> This breaks at least device-list-properties, because
>>>> qmp_device_list_properties() needs to create a device to find its
>>>> properties. Broken in commit f4eb32b "qmp: show QOM properties in
>>>> device-list-properties", v2.1. Example reproducer:
>>>>
>>>> $ qemu-system-aarch64 -nodefaults -display none -machine none
>>>> -S -qmp stdio
>>>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4,
>>>> "major": 2}, "package": ""}, "capabilities": []}}
>>>> { "execute": "qmp_capabilities" }
>>>> {"return": {}}
>>>> { "execute": "device-list-properties", "arguments": {
>>>> "typename": "pxa2xx-pcmcia" } }
>>>> qemu-system-aarch64: /home/armbru/work/qemu/memory.c:1307:
>>>> memory_region_finalize: Assertion `((&mr->subregions)->tqh_first
>>>> == ((void *)0))' failed.
>>>> Aborted (core dumped)
>>>> [Exit 134 (SIGABRT)]
>>>>
>>>> Unfortunately, I can't fix the problems in these devices right now.
>>>> Instead, add DeviceClass member cannot_even_create_with_object_new_yet
>>>> to mark them:
> ...
>>>> static void pxa2xx_pcmcia_register_types(void)
>>>> diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
>>>> index ed43d5e..e1b115d 100644
>>>> --- a/hw/ppc/spapr_rng.c
>>>> +++ b/hw/ppc/spapr_rng.c
>>>> @@ -169,6 +169,11 @@ static void spapr_rng_class_init(ObjectClass *oc,
>>>> void *data)
>>>> dc->realize = spapr_rng_realize;
>>>> set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>> dc->props = spapr_rng_properties;
>>>> +
>>>> + /*
>>>> + * Reason: crashes device-introspect-test for unknown reason.
>>>> + */
>>>> + dc->cannot_even_create_with_object_new_yet = true;
>>>> }
>>>
>>> Please don't do that! That breaks the help output from
>>> "-device spapr-rng,?" which should help the user to see how to use this
>>> device!
>>
>> Well, device-introspection-test makes qemu crash, with the backtrace
>> pointing squarely to this device. Stands to reason that device
>> introspection could crash in normal usage, too. Until the crash is
>> debugged, we better disable introspection of this device.
>>
>> I quite agree that disabling introspection hurts users. Just not as
>> much as crashes :)
>>
>>> I tried to debug why this device breaks the test, but the test
>>> environment is giving me a hard time ... how do you best hook a gdb into
>>> that framework, so you can trace such problems?
>>> Anyway, with some trial and error, I found out that it seems like the
>>>
>>> object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)
>>>
>>> in spapr_rng_instance_init() is causing the problems. Could it be that
>>> object_resolve_path_type is not working with the test environment?
>>
>> I tried to figure out why this device breaks under this test, but
>> couldn't, so I posted with the "for unknown reason" comment.
>
> I've debugged this now for a while (thanks for the tip with
> MALLOC_PERTURB, by the way!) and it seems to me that the problem is in
> the macio object than in spapr-rng - the latter is just the victim of
> some memory corruption caused by the first one: The
> object_resolve_path_type() crashes while trying to go through the macio
> object.
>
> So could you please add the "dc->cannot_even_create_with_object_new_yet
> = true;" to macio_class_init() instead? ... that seems to fix the crash
> for me, too, and is likely the better place.

Hmm. For most of the devices my patch marks, we have a pretty good idea
of what's wrong with them. spapr-rng is among the exceptions.

You believe it's actually "the macio object". Which one? "macio" is
abstract...

You report that introspecting "spapr-rng" crashes "while trying to go
through the macio object". I wonder how omitting introspection of macio
objects (that's what marking them does to this test) could affect the
object we're going through when we crash.

> Or maybe we could get this also fixed? The problem could be the
> memory_region_init(&s->bar, NULL, "macio", 0x80000) in
> macio_instance_init() ... is this ok here? Or does this rather have to
> go to the realize() function instead?

Hmm, does creating and destroying a macio object leave the memory region
behind?

Paolo, is calling memory_region_init() in an instance_init() method
okay? If yes, where should the regions be destroyed, and how? If no, we
should search for the erroneous pattern and mark the offenders.

Some more evidence for macio's culpability: valgrind lets me happily
introspect spapr-rng as often as I want, but once I've introspected
macio-newworld, further introspection of spapr-rng throws "Invalid read"
errors.
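
To make the suspected failure mode concrete, here is a toy model in plain C. It is not QEMU code: ToyDevice, the registry, and every function name are made up for illustration, and whether macio actually behaves this way is exactly the open question above. The sketch only shows how an init-time registration that the destructor fails to undo leaves a stale pointer behind for the next, unrelated lookup to trip over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical global registry, standing in for any structure that
 * outlives a single device (not QEMU's real data structures). */
#define MAX_REGIONS 16
static const void *registry[MAX_REGIONS];
static size_t registry_len;

static void registry_add(const void *region)
{
    assert(registry_len < MAX_REGIONS);
    registry[registry_len++] = region;
}

static void registry_remove(const void *region)
{
    for (size_t i = 0; i < registry_len; i++) {
        if (registry[i] == region) {
            /* Swap the last entry into the freed slot. */
            registry[i] = registry[--registry_len];
            return;
        }
    }
}

typedef struct ToyDevice {
    int bar;                  /* stands in for an embedded region */
    int cleanup_in_finalize;  /* 1: destructor undoes the registration */
} ToyDevice;

/* Models object_new(): the init step has a global side effect. */
static ToyDevice *toy_new(int cleanup_in_finalize)
{
    ToyDevice *d = calloc(1, sizeof(*d));
    if (!d) {
        return NULL;
    }
    d->cleanup_in_finalize = cleanup_in_finalize;
    registry_add(&d->bar);
    return d;
}

/* Models the final object_unref(): frees the device, and unregisters
 * its embedded field only if the destructor was written to do so. */
static void toy_unref(ToyDevice *d)
{
    if (!d) {
        return;
    }
    if (d->cleanup_in_finalize) {
        registry_remove(&d->bar);
    }
    free(d);
}

/* Runs n create/destroy cycles and returns how many registry entries
 * are left over. The entries are only counted, never dereferenced. */
static size_t stale_after_cycles(int cleanup_in_finalize, int n)
{
    registry_len = 0;
    for (int i = 0; i < n; i++) {
        toy_unref(toy_new(cleanup_in_finalize));
    }
    return registry_len;
}
```

In this model, stale_after_cycles(1, 3) leaves the registry empty, while stale_after_cycles(0, 3) leaves three entries pointing into freed memory. Anything that later walks the registry would read through them, which would look like a crash in whichever innocent device triggered the walk.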