On Thu, 19 Apr 2012 13:57:48 -0500 Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> On Thu, Apr 19, 2012 at 02:36:53PM -0300, Luiz Capitulino wrote: > > Michael, > > > > I'm going to revive this topic one more time. I was working on v2 of my > > fixes > > to the suspend race bugs and really thought that the real problem is that > > the > > code shouldn't be that complex. > > > > Basically, this series drops the automatic reaper and adds waitpid() calls & > > related logic to the suspend and shutdown functions. > > > > My objective with this RFC is to understand why this can't be done. > > Hi Luiz, > > CC'ing Michael as some of these suggestions might affect his work on > libvirt integration. I forgot to CC him and Eric. > Thanks for taking the time to implement what this would look like > this. I think this is a sensible approach in general, the main issue > for me is the specific commands currently impacted by such a change: > suspend and shutdown. > > While we do document that these commands may not return a response, > the new behavior actually introduces a pathological assurance that > they *won't* return a response: > shutdown/pm-suspend/echo mem >/sys/power/state never return, or don't > return till the guest wakes up. Yes, but that can be viewed as an implementation detail because this is also possible with the current code (ie. if the suspend process manages to suspend the guest before qemu-ga is able to emit a response). > So *every* suspend/shutdown command will result in a timeout or > indefinite hang to the user. Honestly I just find the behavior > completely unexpected/unintuitive and would prefer to reduce it to > being a corner case. As the code stands currently it's actually > extremely rare that a response won't be returned before the commands > are executed. I don't think we can assume this is a corner case. It all depends on how the two processes are scheduled. On an idle VM it's more likely that qemu-ga will emit the response first because the suspend process does more I/O, but that really depends on the scheduler/system load. This is also true for the shutdown command. As we have discussed with Anthony on irc months ago, the client can't expect a response from this command and should reset the connection before sending new commands when the VM resumes. If this turns out to be a problem for libvirt, I'd say it's a libvirt bug. > So that's why we're jumping through hoops atm. It's more a > philosophical reason than a technical one. > > But if we were to do things as you proposed and document things as > "this command will *only* show a response on error", I guess maybe I > can live with that... in the case of shutdown older clients will > start seeing timeouts where they previously expected a "nothing" > response. The nothing response never entailed success or failure so > this shouldn't break anything that wasn't already broken however... Yes, and again, depending on the VM load it's possible that the VM vanishes before qemu-ga emits its response. > But, I think if we tell users we'll *only* send response on error, > we should do our part to *not* send the responses, rather than relying > on them having implemented the reset mechanism to throw them away after > guest wake-up. What we could do is allow a command to set a flag, after > it reaps it's child (in the case of suspend this would be after > wake-up, for shutdown it'd basically be a no-op, but worth adding > for readability sake), to have qemu-ga not send a response. We'd > implement it similarly to how we did ga_set_response_delimited(). Fine with me. Stating that "do not wait for an OK response, because none will be sent" sounds clearer than "an OK response may, or may not be emitted. Or it may be emitted when the VM resumes".