On 2011-10-06 16:27, Avi Kivity wrote:
> On 10/05/2011 08:02 PM, Jan Kiszka wrote:
>> >
>> > Let's examine a concrete example: a user is debugging a guest, which
>> > stops at a breakpoint. Meanwhile a live migration is going on,
>> > involving internal stops. When the guest does manage to run for a bit,
>> > it runs out of disk space, generating a stop, which the management agent
>> > resolves by allocating more space and issuing a cont.
>> >
>> > With a counting cont, no matter in what order these events happen,
>> > things work out fine. How do they work out with your proposal?
>>
>> We can enforce stop for temporal reasons (migration/savevm), something
>> that overrules user/management initiated stops.
>
> Migration resume shouldn't overrule user stop.
That's not what I had in mind. Migration stop could overrule user resume.
But that discussion is moot, as there is no time span in which this could
happen. Migration just needs to re-enter the original state on error, and
savevm/loadvm need to restore what they found on entry. All of this is
atomic w.r.t. other agents.

>
> It's really simple. If any agent wants the system stopped, it's
> stopped. Only when no one wants it stopped, it may run.
>
>>
>> BTW, does stop due to migration actually have a window where it accepts
>> other commands? I thought that phase is synchronous. Then we would just
>> have to implement proper state saving/restoring.
>
> Save: ++stop_count, restore: --stop_count.
>
>>
>> Anyway, there is no point in lock counting for stop reasons that require
>> external synchronization anyway. gdb vs. management stack vs. human
>> monitor - nothing is solved by counting the stops, they all can step on
>> each other's shoes.
>
> Please elaborate.

Every agent can issue every monitor command. If you have a gdb session
running, you don't want the management stack to migrate your VM away or
otherwise mess with it. If you try to migrate a machine, you don't want
any other agent to change its configuration beforehand, e.g. by adding a
device that is not present on the target.

>
>> Even worse, exposing a counting stop via the user
>> interface requires additional interfaces to recover lost or forgotten
>> locks. We've discussed this in the past IIRC.
>>
>
> Agree with that. So there's the second proposal:
>
> vm_stop(unsigned reason)
> {
>     if (!stop_state) {
>         do_vm_stop();
>     }
>     stop_state |= 1 << reason;
> }
>
> vm_resume(unsigned reason)
> {
>     stop_state &= ~(1 << reason);
>     if (!stop_state) {
>         do_vm_resume();
>     }
> }
>
> so now each agent is separated from the other.
>

Stop reasons are orthogonal to agents. BTW, the above model would still
require extending the user interface to report pending stop reasons and to
allow specifying resume reasons.

Jan