On Tue, 10 Apr 2012 16:27:45 -0300 Luiz Capitulino <lcapitul...@redhat.com> wrote:
> On Tue, 10 Apr 2012 13:50:28 -0500 > Michael Roth <mdr...@linux.vnet.ibm.com> wrote: > > > On Tue, Apr 10, 2012 at 01:45:15PM -0300, Luiz Capitulino wrote: > > > On Mon, 9 Apr 2012 17:29:51 -0500 > > > Michael Roth <mdr...@linux.vnet.ibm.com> wrote: > > > > > > > On Mon, Apr 09, 2012 at 04:57:48PM -0300, Luiz Capitulino wrote: > > > > > Hi Michael, > > > > > > > > > > There's a possible race condition in the bios_supports_mode() > > > > > function used > > > > > by all suspend commands in qemu-ga: if the parent process receives a > > > > > SIGCHLD > > > > > while executing: > > > > > > > > > > close(pipefds[1]); > > > > > g_free(pmutils_path); > > > > > > > > > > Or: > > > > > > > > > > ret = read(pipefds[0], &status, sizeof(status)); > > > > > > > > > > Then bad things can happen, like reporting that the guest doesn't > > > > > support > > > > > suspend or a resource leak (a segfault may be possible too). > > > > > > > > > > The easy & obvious way to fix this is just to block SIGCLHD while in > > > > > close() > > > > > and g_free() and to loop read() on EINTR. > > > > > > > > > > However, the real problem here is that the double fork schema used in > > > > > the > > > > > suspend commands got complex. It's this way because I was trying to > > > > > avoid > > > > > causing a conflict with your series that adds support for running > > > > > commands > > > > > in the guest. > > > > > > > > > > The ideal solution is to add an internal API to qemu-ga to create & > > > > > wait for > > > > > child processes, like we discussed during the suspend commands review. > > > > > > > > > > Now, what it's best way to go with this bug? Should I fix it the easy > > > > > way > > > > > or should I wait for the new API (which I believe you're going to > > > > > work on)? > > > > > > > > Was initially planning on using qemu_add_child_watch(), but this may end > > > > up not being workable for w32 so I'm prefer to hold off on trying to > > > > come up with a general solution till I start looking at that. I'm hoping > > > > g_spawn* can save the day there but haven't looked at it too much > > > > detail. > > > > > > > > In the meantime let's go with the "easy way". We should take care to > > > > restore the default SIGCHLD in the 2nd child though to avoid causing > > > > anything we exec() run to with SIGCHLD blocked though, since that may > > > > hang us in certain situations. > > > > > > Another option would be to drop the double fork. This would require us to > > > remove the SIGCHLD handler and call waitpid() in the suspend functions. > > > This > > > would simplify the code a lot and would make the bug go away, but any new > > > command that executes new processes would have to change that. > > > > Unfortunately shutdown relies on it too. It's not so much an issue > > there given qga tends not to exist for much longer afterward, but > > technically the command can fail and lead to accumulated zombies. > > Right. I've considered shutdown too but thought that it would be so unlikely > that it wouldn't matter. But maybe it's better to keep it complex but correct > (instead of simple with the possibility of a bug). > > Actually, it's not so unlikely: suspend wouldn't work correctly after a > failed shutdown. I'm wrong here, we obviously can wait for specific children. So the worse case would be what you said above: zombies.