On 24.04.2017 at 21:10, Markus Armbruster wrote:
> With 2.9 out of the way, how can we make progress on this one?
>
> I can see two ways to get asynchronous QMP commands accepted:
>
> 1. We break QMP compatibility in QEMU 3.0 and convert all long-running
>    tasks from "synchronous command + event" to "asynchronous command".
>
>    This is design option 1 quoted below.  *If* we decide to leave
>    compatibility behind for 3.0, *and* we decide we like the
>    asynchronous sufficiently better to put in the work, we can do it.
>
>    I guess there's nothing to do here until we decide on breaking
>    compatibility in 3.0.
>
> 2. We don't break QMP compatibility, but we add asynchronous commands
>    anyway, because we decide that's how we want to do "jobs".
>
>    This is design option 3 quoted below.  As I said, I dislike its lack
>    of orthogonality.  But if asynchronous commands help us get jobs
>    done, I can bury my dislike.

I don't think async commands are attractive at all for doing jobs. I
feel they bring up more questions than they answer. For example, what
happens if libvirt crashes and then reconnects? Which monitor
connection gets the reply for an async command sent on the now
disconnected one?

We already have a model for doing long-running jobs, and as far as I'm
aware, it's working and we're not fighting limitations of the design.
So what are we even trying to solve here? In the context of jobs, async
commands feel like a solution in need of a problem to me.

Things may look a bit different for commands that are typically quick,
but potentially long-running. That is, anything that we currently
execute synchronously while holding the BQL, but that involves I/O and
could therefore take a while (impacting the performance of the VM) or
even block indefinitely.

The first problem (we're holding the lock too long) can be addressed by
making things async just inside qemu, and we don't need to expose the
change on the QMP level. The second one (blocking indefinitely)
requires being async on the QMP level if we want the monitor to be
responsive even if we're using an image on an NFS server that went
down.

On the other hand, using the traditional job infrastructure is way over
the top if all you want to do is 'query-block', so we need something
different for making it async. And if a client disconnects, the
'query-block' result can just be thrown away; it's much simpler than
actual jobs.

So where I can see advantages for a new async command type is not for
converting real long-running commands like block jobs, but only for the
typically, but not necessarily, quick operations. At the same time, this
is where you're rightfully afraid that the less common case might not
receive much testing in management tools.

In the end, I'm unsure whether async commands are a good idea; I can
see good arguments for both stances. But I'm almost certain that they
are the wrong tool for jobs.

Kevin