On Thu, Apr 19, 2012 at 02:36:53PM -0300, Luiz Capitulino wrote: > Michael, > > I'm going to revive this topic one more time. I was working on v2 of my fixes > to the suspend race bugs and really thought that the real problem is that the > code shouldn't be that complex. > > Basically, this series drops the automatic reaper and adds waitpid() calls & > related logic to the suspend and shutdown functions. > > My objective with this RFC is to understand why this can't be done.
Hi Luiz, CC'ing Michael as some of these suggestions might affect his work on libvirt integration. Thanks for taking the time to implement what this would look like this. I think this is a sensible approach in general, the main issue for me is the specific commands currently impacted by such a change: suspend and shutdown. While we do document that these commands may not return a response, the new behavior actually introduces a pathological assurance that they *won't* return a response: shutdown/pm-suspend/echo mem >/sys/power/state never return, or don't return till the guest wakes up. So *every* suspend/shutdown command will result in a timeout or indefinite hang to the user. Honestly I just find the behavior completely unexpected/unintuitive and would prefer to reduce it to being a corner case. As the code stands currently it's actually extremely rare that a response won't be returned before the commands are executed. So that's why we're jumping through hoops atm. It's more a philosophical reason than a technical one. But if we were to do things as you proposed and document things as "this command will *only* show a response on error", I guess maybe I can live with that... in the case of shutdown older clients will start seeing timeouts where they previously expected a "nothing" response. The nothing response never entailed success or failure so this shouldn't break anything that wasn't already broken however... But, I think if we tell users we'll *only* send response on error, we should do our part to *not* send the responses, rather than relying on them having implemented the reset mechanism to throw them away after guest wake-up. What we could do is allow a command to set a flag, after it reaps it's child (in the case of suspend this would be after wake-up, for shutdown it'd basically be a no-op, but worth adding for readability sake), to have qemu-ga not send a response. We'd implement it similarly to how we did ga_set_response_delimited(). Thoughts? > > Thanks. > > qemu-ga.c | 17 +------ > qga/commands-posix.c | 136 > ++++++++++++++++++++++---------------------------- > 2 files changed, 60 insertions(+), 93 deletions(-) >