On 2025/05/22 15:45, Alex Bennée wrote:
Akihiko Odaki <akihiko.od...@daynix.com> writes:

On 2025/05/22 1:42, Alex Bennée wrote:
From: Manos Pitsidianakis <manos.pitsidiana...@linaro.org>
This commit fixes an indefinite hang when using VIRTIO GPU blob
objects
under TCG in certain conditions.
The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
MemoryRegion and attaches it to an offset on a PCI BAR of the
VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
it.
Because virglrenderer commands are not thread-safe they are only
called on the main context and QEMU performs the cleanup in three steps
to prevent a use-after-free scenario where the guest can access the
region after it’s unmapped:
1. From the main context, the region’s field finish_unmapping is
false
     by default, so it sets a variable cmd_suspended, increases the
     renderer_blocked variable, deletes the blob subregion, and unparents
     the blob subregion causing its reference count to decrement.
2. From an RCU context, the MemoryView gets freed, the FlatView gets
     recalculated, the free callback of the blob region
     virtio_gpu_virgl_hostmem_region_free is called which sets the
     region’s field finish_unmapping to true, allowing the main thread
     context to finish replying to the command
3. From the main context, the command is processed again, but this
time
     finish_unmapping is true, so virgl_renderer_resource_unmap can be
     called and a response is sent to the guest.
It happens so that under TCG, if the guest has no timers configured
(and
thus no interrupt will cause the CPU to exit), the RCU thread does not
have enough time to grab the locks and recalculate the FlatView.
That’s not a big problem in practice since most guests will assume a
response will happen later in time and go on to do different things,
potentially triggering interrupts and allowing the RCU context to run.
If the guest waits for the unmap command to complete though, it blocks
indefinitely. Attaching to the QEMU monitor and force quitting the guest
allows the cleanup to continue.
There's no reason why the FlatView recalculation can't occur right
away
when we delete the blob subregion, however. It does not, because when we
create the subregion we set the object as its own parent:
      memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
The extra reference is what prevents freeing the memory region
object in
the memory transaction of deleting the subregion.
This commit changes the owner object to the device, which removes
the
extra owner reference in the memory region and causes the MR to be
freed right away in the main context.
Acked-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Manos Pitsidianakis <manos.pitsidiana...@linaro.org>
Reviewed-by: Alex Bennée <alex.ben...@linaro.org>
Tested-by: Alex Bennée <alex.ben...@linaro.org>
Message-Id: <20250410122643.1747913-3-manos.pitsidiana...@linaro.org>
Cc: qemu-sta...@nongnu.org
---
   hw/display/virtio-gpu-virgl.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/display/virtio-gpu-virgl.c
b/hw/display/virtio-gpu-virgl.c
index 71a7500de9..8fbe4e70cc 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -112,7 +112,7 @@ virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
       vmr->g = g;
       mr = g_new0(MemoryRegion, 1);
   -    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size,
data);
+    memory_region_init_ram_ptr(mr, OBJECT(g), "blob", size, data);
       memory_region_add_subregion(&b->hostmem, offset, mr);
       memory_region_set_enabled(mr, true);

I suggest dropping this patch for now due to the reason I pointed out
for the first version of this series.

This fixes an actual bug - without it we get a hang.


I understand that but it also introduces a regression; "[PATCH v3 14/20] ui/gtk-gl-area: Remove extra draw call in refresh" is also a similar case.

Ideally such a bug should be fixed without regression, but I understand it is sometimes difficult to do that and postponing the bug resolution until figuring out the correct way does not make sense.

In such a case, a bug should be fixed minimizing the regression and the documentation of the regression should be left in the code.

In particular, this patch can cause use-after-free whether TCG is used or not. Instead, I suggest to avoid freeing memory regions at all on TCG. It will surely leak memory, but won't result in use-after-free at least and the other accelerators will be unaffected.

Regards,
Akihiko Odaki

Reply via email to