On Thu, May 23, 2019 at 12:31:16PM +0100, Alex Bennée wrote:
>
> Roman Kagan <rka...@virtuozzo.com> writes:
>
> > I came across the following AB-BA deadlock:
> >
> >     vCPU thread                         main thread
> >     -----------                         -----------
> >     async_safe_run_on_cpu(self,
> >                           async_synic_update)
> >     ...                                 [cpu hot-add]
> >     process_queued_cpu_work()
> >       qemu_mutex_unlock_iothread()
> >                                         [grab BQL]
> >       start_exclusive()                 cpu_list_add()
> >       async_synic_update()                finish_safe_work()
> >         qemu_mutex_lock_iothread()          cpu_exec_start()
> >
> > ATM async_synic_update seems to be the only async safe work item that
> > grabs BQL. However it isn't quite obvious that it shouldn't; in the
> > past there were more examples of this (e.g.
> > memory_region_do_invalidate_mmio_ptr).
> >
> > It looks like the problem is generally in the lack of the nesting rule
> > for cpu-exclusive sections against BQL, so I thought I would try to
> > address that. This patchset is my feeble attempt at this; I'm not sure
> > I fully comprehend all the consequences (rather, I'm sure I don't) hence
> > RFC.
>
> Hmm I think this is an area touched by:
>
>   Subject: [PATCH v7 00/73] per-CPU locks
>   Date: Mon, 4 Mar 2019 13:17:00 -0500
>   Message-Id: <20190304181813.8075-1-c...@braap.org>
>
> which has stalled on it's path into the tree. Last time I checked it
> explicitly handled the concept of work that needed the BQL and work that
> didn't.
I'm still trying to get my head around that patchset, but it looks like
it changes nothing with regard to cpu-exclusive sections and safe work,
so it doesn't make the problem go away.

> How do you trigger your deadlock? Just hot-pluging CPUs?

Yes. The window is pretty narrow, so I've only seen it once, although
this test (where the VMs are started and stopped and the CPUs are
plugged in and out) has been in our test loop for quite a while
(probably 2+ years).

Roman.