CCing Paolo, the Memory API maintainer. On Tue, Nov 10, 2020 at 11:14:39AM +0000, Mark Cave-Ayland wrote: > Hi all, > > This email follows on from my investigation of intermittent Travis-CI > failures in make check's device-introspect test when trying to add the patch > at https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg06093.html to my > last qemu-sparc pull request. > > The patch itself seems fairly harmless: moving the sun4u-iommu device as a > QOM child of the sabre PCI host bridge device. So why was "make check" > randomly segfaulting on Travis-CI? > > The hardest part was trying to reproduce the issue to debug it: eventually > after a number of Travis-CI runs I discovered I could generate the same > problem locally if I ran "make check" around 15-20 times in a row, and that > gave me a backtrace that looked like this: > > 0x0000000000614b69 in address_space_init (as=0x16f684d8, > root=0x16f68530, name=0x9a1db2 "iommu-as") at ../softmmu/memory.c:2780 > 2780 QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link); > (gdb) bt > #0 0x0000000000614b69 in address_space_init (as=0x16f684d8, > root=0x16f68530, name=0x9a1db2 "iommu-as") at > ../softmmu/memory.c:2780 > #1 0x00000000005b8f6a in iommu_init (obj=0x16f681c0) at > ../hw/sparc64/sun4u_iommu.c:301 > #2 0x000000000070a997 in object_init_with_type (obj=0x16f681c0, > ti=0x1629fac0) at ../qom/object.c:375 > > With the debugger attached I was able to figure out what was happening: the > sun4u-iommu device creates the iommu-as address space during instance init, > but doesn't have a corresponding instance finalize to remove it which leaves > a dangling pointer in the address_spaces QTAILQ. > > Normally this doesn't matter because IOMMUs are created once during machine > init, but device-introspect-test instantiates sun4u-iommu (and with the > patch sabre also adds it as a child object during instance init) which adds > more dangling pointers to the address_spaces list. Every so often the > dangling pointers end up pointing to memory that gets reused by another QOM > object, eventually causing random segfaults during instance finalize and/or > property iteration. > > There are 2 possible solutions here: 1) ensure QOM objects that add address > spaces during instance init have a corresponding instance finalize function > to remove them or 2) move the creation of address spaces from instance init > to realize. > > Does anyone have any arguments for which solution is preferred?
I'd say (2) is preferred, as we don't expect object_new(T) to have any side effects outside the object instance state. Most address_space_init() calls in the code today seem to be in realize functions. However, I wonder if we could make this simpler (and mistakes less fatal) if we make AddressSpace a QOM child of the device. Paolo, would it be too much overhead to make AddressSpace a QOM object? > As part of this work I hacked up an address_space_count() function in > memory.c that returns the size of the address_spaces QTAILQ and added a > printf() to display the value during instance init and finalize which > demonstrates the problem nicely. This means it should be possible to add a > similar to check to device-introspect-test in future to prevent similar > errors from happening again. > > > ATB, > > Mark. > -- Eduardo