On 4/24/25 8:32 PM, Andrey Drobyshev wrote:
> Hi all,
> 
> There's a bug in block layer which leads to block graph deadlock.
> Notably, it takes place when blockdev IO is processed within a separate
> iothread.
> 
> This was initially caught by our tests, and I was able to reduce it to a
> relatively simple reproducer.  Such deadlocks are probably supposed to
> be covered in iotests/graph-changes-while-io, but this deadlock isn't.
> 
> Basically what the reproducer does is launches QEMU with a drive having
> 'iothread' option set, creates a chain of 2 snapshots, launches
> block-commit job for a snapshot and then dismisses the job, starting
> from the lower snapshot.  If the guest is issuing IO at the same time,
> there's a race in acquiring block graph lock and a potential deadlock.
> 
> Here's how it can be reproduced:
> 
> [...]
> 

I took a closer look at iotests/graph-changes-while-io, and have managed
to reproduce the same deadlock in a much simpler setup, without a guest.

1. Run QSD:

    ./build/storage-daemon/qemu-storage-daemon --object iothread,id=iothread0 \
        --blockdev null-co,node-name=node0,read-zeroes=true \
        --nbd-server addr.type=unix,addr.path=/var/run/qsd_nbd.sock \
        --export nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true \
        --chardev socket,id=qmp-sock,path=/var/run/qsd_qmp.sock,server=on,wait=off \
        --monitor chardev=qmp-sock

2. Launch IO:

    qemu-img bench -f raw -c 2000000 'nbd+unix:///node0?socket=/var/run/qsd_nbd.sock'

3. Add 2 snapshots and remove the lower one (script attached):

    while /bin/true ; do ./rls_qsd.sh ; done

And then it hangs.
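
For readers without the attachment, here is a rough Python sketch of what one iteration of step 3 might look like over QMP. This is not the attached rls_qsd.sh; it is only my guess at an equivalent sequence, pieced together from the description above (two snapshots on top of node0, block-commit of the lower one, then job-dismiss). The node names snap1/snap2, the job id commit-snap1, the image file names and the helper names are all made up for illustration, and the 1G overlay size assumes null-co's default device size.

```python
import json
import socket
import subprocess

QMP_SOCK = "/var/run/qsd_qmp.sock"  # QMP socket path from step 1


def snapshot_cycle_commands(mid_img, top_img, fmt="qcow2"):
    """QMP commands that add two snapshots and commit the lower one."""
    def add(node, filename):
        return {"execute": "blockdev-add",
                "arguments": {"driver": fmt, "node-name": node,
                              "file": {"driver": "file",
                                       "filename": filename}}}
    return [
        add("snap1", mid_img),
        {"execute": "blockdev-snapshot",
         "arguments": {"node": "node0", "overlay": "snap1"}},
        add("snap2", top_img),
        {"execute": "blockdev-snapshot",
         "arguments": {"node": "snap1", "overlay": "snap2"}},
        # Commit the lower snapshot (snap1) into node0; keep the job around
        # so that it has to be dismissed explicitly.
        {"execute": "block-commit",
         "arguments": {"job-id": "commit-snap1", "device": "snap2",
                       "top-node": "snap1", "base-node": "node0",
                       "auto-dismiss": False}},
    ]


class QMP:
    """Tiny line-based QMP client; events seen early are kept, not dropped."""
    def __init__(self, path):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(path)
        self.rfile = self.sock.makefile("r")
        self.events = []
        self.recv()  # consume the QMP greeting
        self.cmd({"execute": "qmp_capabilities"})

    def recv(self):
        return json.loads(self.rfile.readline())

    def cmd(self, command):
        self.sock.sendall(json.dumps(command).encode() + b"\n")
        while True:
            msg = self.recv()
            if "return" in msg or "error" in msg:
                return msg
            self.events.append(msg)  # an event that arrived before the reply

    def wait_event(self, match):
        while True:
            for ev in self.events:
                if match(ev):
                    self.events.remove(ev)
                    return ev
            self.events.append(self.recv())


def run_cycle(mid_img="mid.qcow2", top_img="top.qcow2"):
    # Assumption: 1G matches the default size of the null-co node.
    for img in (mid_img, top_img):
        subprocess.run(["qemu-img", "create", "-f", "qcow2", img, "1G"],
                       check=True)
    qmp = QMP(QMP_SOCK)
    for command in snapshot_cycle_commands(mid_img, top_img):
        reply = qmp.cmd(command)
        if "error" in reply:
            raise RuntimeError(reply["error"])
    qmp.wait_event(lambda ev: ev.get("event") == "JOB_STATUS_CHANGE"
                   and ev["data"]["id"] == "commit-snap1"
                   and ev["data"]["status"] == "concluded")
    return qmp.cmd({"execute": "job-dismiss",
                    "arguments": {"id": "commit-snap1"}})
```

Calling run_cycle() once, with the QSD from step 1 and the bench from step 2 running, performs a single cycle. Cleanup of snap2 and of the leftover nodes is deliberately left out, so running it in a loop would need that added first.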

I'll also send a patch that adds a corresponding test case directly to
iotests.

This reproducer seems to hang starting from Fiona's commit
67446e605dc ("blockjob: drop AioContext lock before calling
bdrv_graph_wrlock()").  AioContext locks were dropped entirely later on
in Stefan's commit b49f4755c7 ("block: remove AioContext locking"), but
the problem remains.

Andrey

Attachment: rls_qsd.sh
Description: application/shellscript
