Am 24.01.2019 um 08:48 hat Denis Plotnikov geschrieben: > When there is a Backup Block Job running and shutdown command is sent to > a guest, the guest crushes due to assert(!bs->walking_aio_notifiers). > > Call stack: > > 0 __GI_raise > 1 __GI_abort > 2 __assert_fail_base > 3 __GI___assert_fail > 4 bdrv_detach_aio_context (bs=0x55f54d65c000) <<< > 5 bdrv_detach_aio_context (bs=0x55f54fc8a800) > 6 bdrv_set_aio_context (bs=0x55f54fc8a800, ...) > 7 block_job_attached_aio_context > 8 bdrv_attach_aio_context (bs=0x55f54d65c000, ...) <<< > 9 bdrv_set_aio_context (bs=0x55f54d65c000) > 10 blk_set_aio_context > 11 virtio_blk_data_plane_stop > 12 virtio_bus_stop_ioeventfd > 13 virtio_vmstate_change > 14 vm_state_notify (running=0, state=RUN_STATE_SHUTDOWN) > 15 do_vm_stop (state=RUN_STATE_SHUTDOWN, send_stop=true) > 16 vm_stop (state=RUN_STATE_SHUTDOWN) > 17 main_loop_should_exit > 18 main_loop > 19 main > > This happens because of "new" context attachment to VM disk bds. > When attaching a new context the corresponding aio context handler is > called for each of aio_notifiers registered on the VM disk bds context. > Among those handlers there is the block_job_attached_aio_context handler > which sets a new aio context for the block job bds. When doing so, > the old context is detached from all the block job bds children and one of > them is the VM disk bds, serving as backing store for the blockjob bds, > although the VM disk bds is actually the initializer of that process. > Since the VM disk bds is protected with walking_aio_notifiers flag > from double processing in recursive calls, the assert fires. > > The patch fixes the problem by skipping the bds-es in recursive calls > which have started attachment/detachment already. > > Signed-off-by: Denis Plotnikov <dplotni...@virtuozzo.com>
I'm not sure if this fix is actually correct in more than just your special case. It assumes without checking that the nested attach is for the same context as the outer attach. (Of course, I'm not sure what you would do if they were for different contexts! It's a case to avoid, and it should fail very visibly instead of moving nodes to unexpected contexts.) bdrv_set_aio_context() is pretty messy, so it may still turn out that leaving the greater problem unadressed and just fixing what breaks in practice is what we need to do. But did you consider just forbidding nested bdrv_set_aio_context() instead? Essentially, instead of your checks you would assert !walking_aio_notifiers. This would simplify the interface instead of adding more complexity. If I understand the code correctly, at the time of the nested call, the respective nodes have already been moved to the new context, so avoiding the nested calls might be as easy as adding a shortcut to bdrv_set_aio_context() for this case: if (ctx == new_context) { return; } Most importantly, though, please write a test case for this. We will probably need to rework bdrv_set_aio_context() fundamentally (as Berto recently noticed), and we will want to avoid regressing on this bug. Kevin