On Mon, 05 Dec 2022 14:59:39 +0000 Alex Bennée <alex.ben...@linaro.org> wrote:
> Jonathan Cameron via <qemu-devel@nongnu.org> writes: > > > On Mon, 5 Dec 2022 10:54:03 +0000 > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote: > > > >> On Sun, 4 Dec 2022 08:23:55 +0100 > >> Thomas Huth <th...@redhat.com> wrote: > >> > >> > On 04/11/2022 07.47, Thomas Huth wrote: > >> > > On 16/06/2022 18.57, Michael S. Tsirkin wrote: > >> > >> From: Jonathan Cameron <jonathan.came...@huawei.com> > >> > >> > >> > >> Emulation of a simple CXL Switch downstream port. > >> > >> The Device ID has been allocated for this use. > >> > >> > >> > >> Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> > >> > >> Message-Id: <20220616145126.8002-3-jonathan.came...@huawei.com> > >> > >> Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > >> > >> Reviewed-by: Michael S. Tsirkin <m...@redhat.com> > >> > >> Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > >> > >> --- > >> > >> hw/cxl/cxl-host.c | 43 +++++- > >> > >> hw/pci-bridge/cxl_downstream.c | 249 > >> > >> +++++++++++++++++++++++++++++++++ > >> > >> hw/pci-bridge/meson.build | 2 +- > >> > >> 3 files changed, 291 insertions(+), 3 deletions(-) > >> > >> create mode 100644 hw/pci-bridge/cxl_downstream.c > >> > > > >> > > Hi! > >> > > > >> > > There is a memory problem somewhere in this new device. I can make > >> > > QEMU > >> > > crash by running something like this: > >> > > > >> > > $ MALLOC_PERTURB_=59 ./qemu-system-x86_64 -M x-remote \ > >> > > -display none -monitor stdio > >> > > QEMU 7.1.50 monitor - type 'help' for more information > >> > > (qemu) device_add cxl-downstream > >> > > ./qemu/qom/object.c:1188:5: runtime error: member access within > >> > > misaligned > >> > > address 0x3b3b3b3b3b3b3b3b for type 'struct Object', which requires 8 > >> > > byte > >> > > alignment > >> > > 0x3b3b3b3b3b3b3b3b: note: pointer points here > >> > > <memory cannot be printed> > >> > > Bus error (core dumped) > >> > > > >> > > Could you have a look if you've got some spare minutes? > >> > > >> > Ping! Jonathan, Michael, any news on this bug? > >> > > >> > (this breaks one of my local tests, that's why it's annoying for me) > >> Sorry, my email filters ate your earlier message. > >> > >> Looking into this now. I'll note that it also happens on > >> device_add xio3130-downstream so not specific to this new device. > >> > >> So far all I've managed to do is track it to something rcu related > >> as failing in a call to drain_call_rcu() in qmp_device_add() > >> > >> Will continue digging. > > > > Assuming I'm seeing the same thing... > > > > Problem is g_free() on the PCIBridge windows: > > https://elixir.bootlin.com/qemu/latest/source/hw/pci/pci_bridge.c#L235 > > > > Is called before we get an rcu_call() to flatview_destroy() as a > > result of the final call of flatview_unref() in address_space_set_flatview() > > so we get a use after free. > > > > As to what the fix is... Suggestions welcome! > > It sounds like this is the wrong place to free the value then. I guess > the PCI aliases into &w->alias_io() don't get dealt with until RCU > clean-up time. > > I *think* using g_free_rcu() should be enough to ensure the free occurs > after the rest of the RCU cleanups but maybe you should only be cleaning > up the windows at device unrealize time? Is this a dynamic piece of > memory which gets updated during the lifetime of the device? There is unfortunately code that swaps it for an updated structure in pci_bridge_update_mappings() > > If the memory is being cleared with RCU then the access to the base > pointer should be done with the appropriate qatomic_rcu_[set|read] > functions. > I'm annoyingly snowed under this week with other things, but hopefully can get to in a few days. Note we are looking at an old problem here, just one that's happening for an additional device, not sure if that really affects urgency of fix though. Jonathan