On Mon, 08/21 18:28, Dr. David Alan Gilbert wrote:
> > It's not much more than asserting qemu_mutex_iothread_locked(), the problem is
> > the new monitor thread breaks certain assumptions that were true.
> >
> > What is interesting in this is that block layer's nested aio_poll() now not only
> > runs in the main thread but also in the monitor thread. Bugs may hide there. :)
> >
> > That's why I suggested a "safe by default" strategy.
>
> OK, that's going to need some more flags somewhere; we've now
> effectively got three types of command:
>    a) Commands that can only run in the main thread
>    b) Commands that can run in other monitor threads, but must have the bql
>    c) Commands that can run in other monitor threads but don't take the
>       bql
>
> The class (a) that you point out are a pain; arguably if we have to
> split them up then perhaps we should initially only allow (c).
>
> > One step back, is it possible to "unblock" main thread even upon network issue?
> > What is the scenario that causes main thread hang? Is there a backtrace?
>
> There are at least 3 scenarios I know of:
>
>    a) Postcopy: An IO operation takes the lock and accesses guest memory;
>       the guest memory is missing due to userfault'd memory.
>       Unfortunately the network connection to the source happens to fail;
>       so we never receive that page and the thread stays stuck in the
>       userfault.
>       We can't issue a recovery command to reopen a network connection
>       because the monitor is blocked.
>    b) Postcopy: A monitor command either accesses guest memory or has
>       to wait on another thread that is doing so; e.g. info cpu waits
>       for the CPU threads to exit the loop, but they might be blocked
>       waiting on userfault.
>    c) COLO or migration: The network fails during the critical bit
>       at the end of migration when we have the bql held. You can't
>       issue a migration_cancel or a colo-failover via the monitor
>       because it's blocked.

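To make the three command classes quoted above concrete, here is a rough
sketch of how a per-command flag and a "safe by default" check might look.
Everything in it (CmdClass, cmd_may_run and the enum names) is hypothetical
and only for illustration; none of it is existing QEMU code. It follows the
suggestion of initially allowing only class (c) outside the main thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of monitor commands (not a QEMU API). */
typedef enum {
    CMD_MAIN_THREAD_ONLY = 0, /* (a) may only run in the main thread */
    CMD_NEEDS_BQL,            /* (b) other monitor threads, must hold the bql */
    CMD_NO_BQL,               /* (c) other monitor threads, never takes the bql */
} CmdClass;

/*
 * "Safe by default": the zero value is class (a), so a command that was
 * never annotated stays restricted to the main thread.
 *
 * Assumed policy for this sketch: outside the main thread only class (c)
 * is allowed, because a class (b) command could block forever if the bql
 * holder is stuck (for example on a userfault).
 */
bool cmd_may_run(CmdClass cls, bool in_main_thread)
{
    assert(cls >= CMD_MAIN_THREAD_ONLY && cls <= CMD_NO_BQL);

    if (in_main_thread) {
        return true;            /* the main thread may run every class */
    }
    return cls == CMD_NO_BQL;   /* other threads: bql-free commands only */
}
```

With such a check, a recovery command would only help in the three
scenarios if it can be put in class (c).
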
Thanks for explaining! What commands are in class (c)? From the cover letter it
seems migrate-incoming is the only one in mind, and I'm not sure how it resolves
any of the three scenarios.

>
> There are other advantages of being able to do bql'less commands;
> things like an 'info status' or the like should be doable without bql,
> so just avoiding taking the bql when the management layer is doing
> stuff (or alternatively getting faster replies on management)
> are both useful.

Agreed. It is very useful not just for migration.

Fam