I came across the following AB-BA deadlock:

    vCPU thread                             main thread
    -----------                             -----------
async_safe_run_on_cpu(self,
                      async_synic_update)
...                                         [cpu hot-add]
process_queued_cpu_work()
  qemu_mutex_unlock_iothread()
                                            [grab BQL]
  start_exclusive()                         cpu_list_add()
  async_synic_update()                        finish_safe_work()
    qemu_mutex_lock_iothread()                  cpu_exec_start()
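
For illustration only, here is a toy model in C of the wait-for cycle in the trace above. It is not QEMU code: the enum names, struct thread_state, owner_of(), is_deadlocked() and trace_deadlocks() are all hypothetical. It assumes the vCPU thread owns the exclusive section while the main thread owns the BQL and blocks in cpu_exec_start(), and it only checks whether the two threads end up waiting on each other.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not QEMU code: two resources and two threads. */
enum resource { RES_NONE = -1, RES_BQL = 0, RES_EXCLUSIVE = 1, RES_COUNT = 2 };
enum thread_id { T_VCPU = 0, T_MAIN = 1, T_COUNT = 2 };

struct thread_state {
    bool holds[RES_COUNT];   /* resources this thread currently owns */
    enum resource waits_for; /* resource it is blocked on, or RES_NONE */
};

/* Return the index of the thread that owns res, or -1 if nobody does. */
static int owner_of(const struct thread_state *t, enum resource res)
{
    for (int i = 0; i < T_COUNT; i++) {
        if (t[i].holds[res]) {
            return i;
        }
    }
    return -1;
}

/* Follow the wait-for chain from start; coming back to start is a deadlock. */
static bool is_deadlocked(const struct thread_state *t, int start)
{
    int cur = start;

    assert(start >= 0 && start < T_COUNT);
    for (int steps = 0; steps < T_COUNT; steps++) {
        if (t[cur].waits_for == RES_NONE) {
            return false;           /* this thread can still make progress */
        }
        int next = owner_of(t, t[cur].waits_for);
        if (next < 0) {
            return false;           /* the resource is free */
        }
        if (next == start) {
            return true;            /* cycle closed: AB-BA */
        }
        cur = next;
    }
    return false;
}

/*
 * Model the trace: the vCPU thread is inside the exclusive section and
 * runs a safe work item; the main thread holds the BQL and waits for the
 * exclusive section to end.
 */
static bool trace_deadlocks(bool work_item_grabs_bql)
{
    struct thread_state t[T_COUNT] = {
        [T_VCPU] = {
            .holds = { [RES_EXCLUSIVE] = true },
            .waits_for = work_item_grabs_bql ? RES_BQL : RES_NONE,
        },
        [T_MAIN] = {
            .holds = { [RES_BQL] = true },
            .waits_for = RES_EXCLUSIVE,
        },
    };

    return is_deadlocked(t, T_MAIN);
}
```

In this model trace_deadlocks(true) reports the cycle, while trace_deadlocks(false), i.e. a work item that never touches the BQL, does not.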

ATM async_synic_update seems to be the only async safe work item that
grabs the BQL.  However, it isn't quite obvious that it shouldn't; in the
past there were more examples of this (e.g.
memory_region_do_invalidate_mmio_ptr).

It looks like the problem is, more generally, the lack of a nesting rule
for cpu-exclusive sections against the BQL, so I thought I would try to
address that.

This patchset is my feeble attempt at this; I'm not sure I fully
comprehend all the consequences (rather, I'm sure I don't), hence RFC.

Roman Kagan (2):
  cpus-common: nuke finish_safe_work
  cpus-common: assert BQL nesting within cpu-exclusive sections

 cpus-common.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

-- 
2.21.0