On 03.06.24 at 18:21, Kevin Wolf wrote:
> On 03.06.2024 at 16:17, Fiona Ebner wrote:
>> On 26.03.24 at 13:44, Kevin Wolf wrote:
>>>
>>> The fix for bdrv_flush_all() is probably to make it bdrv_co_flush_all()
>>> with a coroutine wrapper so that the graph lock is held for the whole
>>> function. Then calling bdrv_co_flush() while iterating the list is safe
>>> and doesn't allow concurrent graph modifications.
>>
>> The second is that iotest 255 ran into an assertion failure upon QMP 'quit':
>>
>>> ../block/graph-lock.c:113: bdrv_graph_wrlock: Assertion
>>> `!qemu_in_coroutine()' failed.
>>
>> Looking at the backtrace:
>>
>>> #5  0x0000762a90cc3eb2 in __GI___assert_fail
>>>     (assertion=0x5afb07991e7d "!qemu_in_coroutine()", file=0x5afb07991e00
>>>     "../block/graph-lock.c", line=113, function=0x5afb07991f20
>>>     <__PRETTY_FUNCTION__.4> "bdrv_graph_wrlock")
>>>     at ./assert/assert.c:101
>>> #6  0x00005afb07585311 in bdrv_graph_wrlock () at ../block/graph-lock.c:113
>>> #7  0x00005afb07573a36 in blk_remove_bs (blk=0x5afb0af99420) at
>>>     ../block/block-backend.c:901
>>> #8  0x00005afb075729a7 in blk_delete (blk=0x5afb0af99420) at
>>>     ../block/block-backend.c:487
>>> #9  0x00005afb07572d88 in blk_unref (blk=0x5afb0af99420) at
>>>     ../block/block-backend.c:547
>>> #10 0x00005afb07572fe8 in bdrv_next (it=0x762a852fef00) at
>>>     ../block/block-backend.c:618
>>> #11 0x00005afb0758cd65 in bdrv_co_flush_all () at ../block/io.c:2347
>>> #12 0x00005afb0753ba37 in bdrv_co_flush_all_entry (opaque=0x7ffff12c6050)
>>>     at block/block-gen.c:1391
>>> #13 0x00005afb0773bf41 in coroutine_trampoline (i0=168365184, i1=23291)
>>
>> So I guess calling bdrv_next() is not safe from a coroutine, because
>> the function doing the iteration could end up being the last thing to
>> have a reference to the BB.
>
> Does your bdrv_co_flush_all() take the graph (reader) lock? If so, this
> is surprising, because while we hold the graph lock, no reference should
> be able to go away - you need the writer lock for that and you won't get
> it as long as bdrv_co_flush_all() locks the graph. So whatever had a
> reference before the bdrv_next() loop must still have it now. Do you
> know where it gets dropped?
>
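For context, the conversion in question declares bdrv_co_flush_all() as a
coroutine function and lets scripts/block-coroutine-wrapper.py generate the
bdrv_flush_all() wrapper that takes the graph reader lock around the whole
call. A rough sketch of the declarations (the exact annotation names and
placement are an assumption based on other generated-wrapper users, not the
actual patch being tested here):

    /* Sketch only; annotation names are assumptions, not the tested patch. */

    /* Coroutine version, meant to run entirely under the graph reader lock. */
    int coroutine_fn GRAPH_RDLOCK bdrv_co_flush_all(void);

    /*
     * Generated wrapper: the _bdrv_rdlock variant makes the generated
     * coroutine entry take bdrv_graph_co_rdlock() around the call, as seen
     * in the generated code quoted below.
     */
    int co_wrapper_mixed_bdrv_rdlock bdrv_flush_all(void);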
AFAICT, yes, it does hold the graph reader lock. The generated code is:

> static void coroutine_fn bdrv_co_flush_all_entry(void *opaque)
> {
>     BdrvFlushAll *s = opaque;
>
>     bdrv_graph_co_rdlock();
>     s->ret = bdrv_co_flush_all();
>     bdrv_graph_co_rdunlock();
>     s->poll_state.in_progress = false;
>
>     aio_wait_kick();
> }

Apparently, when the mirror job is aborted/exits, which can happen during the
polling for bdrv_co_flush_all_entry(), a reference can go away without the
write lock being held (at least my breakpoints didn't trigger):

> #0  blk_unref (blk=0x5cdefe943d20) at ../block/block-backend.c:537
> #1  0x00005cdefb26697e in mirror_exit_common (job=0x5cdefeb53000) at
>     ../block/mirror.c:710
> #2  0x00005cdefb263575 in mirror_abort (job=0x5cdefeb53000) at
>     ../block/mirror.c:823
> #3  0x00005cdefb2248a6 in job_abort (job=0x5cdefeb53000) at ../job.c:825
> #4  0x00005cdefb2245f2 in job_finalize_single_locked (job=0x5cdefeb53000) at
>     ../job.c:855
> #5  0x00005cdefb223852 in job_completed_txn_abort_locked (job=0x5cdefeb53000)
>     at ../job.c:958
> #6  0x00005cdefb223714 in job_completed_locked (job=0x5cdefeb53000) at
>     ../job.c:1065
> #7  0x00005cdefb224a8b in job_exit (opaque=0x5cdefeb53000) at ../job.c:1088
> #8  0x00005cdefb4134fc in aio_bh_call (bh=0x5cdefe7487c0) at
>     ../util/async.c:171
> #9  0x00005cdefb4136ce in aio_bh_poll (ctx=0x5cdefd9cd750) at
>     ../util/async.c:218
> #10 0x00005cdefb3efdfd in aio_poll (ctx=0x5cdefd9cd750, blocking=true) at
>     ../util/aio-posix.c:722
> #11 0x00005cdefb20435e in bdrv_poll_co (s=0x7ffe491621d8) at
>     ../block/block-gen.h:43
> #12 0x00005cdefb206a33 in bdrv_flush_all () at block/block-gen.c:1410
> #13 0x00005cdefae5c8ed in do_vm_stop (state=RUN_STATE_SHUTDOWN,
>     send_stop=false) at ../system/cpus.c:297
> #14 0x00005cdefae5c850 in vm_shutdown () at ../system/cpus.c:308
> #15 0x00005cdefae6d892 in qemu_cleanup (status=0) at ../system/runstate.c:871
> #16 0x00005cdefb1a7e78 in qemu_default_main () at ../system/main.c:38
> #17 0x00005cdefb1a7eb8 in main (argc=34, argv=0x7ffe491623a8) at
>     ../system/main.c:48

Looking at the code in mirror_exit_common(), it doesn't seem to acquire a
write lock:

>     bdrv_graph_rdunlock_main_loop();
>
>     /*
>      * Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
>      * inserting target_bs at s->to_replace, where we might not be able to get
>      * these permissions.
>      */
>     blk_unref(s->target);
>     s->target = NULL;

The write lock is only taken in blk_remove_bs(), i.e. once the refcount drops
to 0 and the BB is actually removed:

>     bdrv_graph_wrlock();
>     bdrv_root_unref_child(root);
>     bdrv_graph_wrunlock();

Best Regards,
Fiona
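P.S.: To make the distinction explicit: the reference count itself is dropped
without any graph lock; the writer lock only comes into play once the refcount
reaches 0 and blk_remove_bs() actually detaches the node. A heavily simplified
sketch of the chain visible in the backtraces above (illustrative only, not
verbatim QEMU code; draining, assertions and error handling omitted):

    /* Illustrative sketch of the call chain; not verbatim QEMU code. */
    void blk_unref(BlockBackend *blk)
    {
        if (--blk->refcnt > 0) {
            return;                      /* no graph lock involved so far */
        }
        blk_delete(blk);                 /* last reference just went away */
    }

    static void blk_delete(BlockBackend *blk)
    {
        if (blk->root) {
            blk_remove_bs(blk);          /* only now does the graph change */
        }
        /* ... free the BlockBackend ... */
    }

    void blk_remove_bs(BlockBackend *blk)
    {
        BdrvChild *root = blk->root;
        /* ... */
        bdrv_graph_wrlock();             /* asserts !qemu_in_coroutine() */
        bdrv_root_unref_child(root);
        bdrv_graph_wrunlock();
    }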