On Wed, 14 Dec 2011 12:06:13 -0600 Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> On 12/14/2011 10:38 AM, Luiz Capitulino wrote: > > On Wed, 14 Dec 2011 09:54:29 -0600 > > Michael Roth<mdr...@linux.vnet.ibm.com> wrote: > > > >> On 12/14/2011 07:00 AM, Luiz Capitulino wrote: > >>> On Tue, 13 Dec 2011 14:03:08 -0600 > >>> Michael Roth<mdr...@linux.vnet.ibm.com> wrote: > >>> > >>>> On 12/13/2011 12:28 PM, Luiz Capitulino wrote: > >>>>> It supports two modes: "hibernate" (which corresponds to S4) and > >>>>> "sleep" (which corresponds to S3). It will try to execute the > >>>>> pm-hibernate or pm-suspend scripts, if the scripts don't exist > >>>>> the command will try to suspend by directly writing to the > >>>>> "/sys/power/state" file. > >>>>> > >>>>> An interesting implementation detail is how to cleanup the child's > >>>>> status on termination, so that we don't create zombies. I've > >>>>> choosen to ignore the SIGCHLD signal. This will cause the kernel to > >>>>> automatically cleanup the child's status on its termination. > >>>> > >>>> One downside to blocking SIGCHLD is it can screw with child processes > >>>> that utilize it. `sudo` for instance will just hang indefinitely because > >>>> it handles it's own cleanup via SIGCHLD, we might run into similar cases > >>>> with various pm-hibernate/pm-suspend implementations as well. > >>>> > >>>> This will also screw with anything we launch via guest-exec as well, > >>>> once that goes in. > >>>> > >>>> I wonder if we can tie the qemu_add_child_watch() stuff into qemu-ga's > >>>> main loop, or maybe just implement something similar... > >>>> > >>>> Basically: > >>>> > >>>> - add a qemu-ga.c:signal_channel_add() that creates a non-blocking > >>>> pipe and associate the read-side with a GIOChannel, then ties the > >>>> channel into the main loop via g_io_add_watch() on qemu-ga startup, with > >>>> an associated callback that reads everything off the pipe, then iterates > >>>> through a list of registered pids and does a non-blocking wait(pid, ...) > >>>> on each. > >>>> > >>>> - add a SIGCHLD handler that writes a 1 to the write side of the pipe > >>>> whenever it gets called > >>> > >>> Is the pipe really needed? Is there any side effect if we call waitpid() > >>> from the signal handler considering it won't block? > >> > >> In the current state of things, I believe that might actually be > >> sufficient. I was thinking of the eventual guest-exec case though: we > >> need to be able to poll for a guest-exec'd process' return status > >> asynchronously by calling a guest-exec-status command that does the > >> wait(), so if we use a waitpid(-1, ...) SIGCHLD handler, or block > >> SIGCHLD, the return status would be intercepted/lost. > > > > What I had in mind was to do what qemu_add_child_watch() does, ie. it > > iterates through a list of child pids calling waitpid() on them instead > > of doing waitpid(-1,...). The only difference is that we would do it from > > a signal handler. > > Ah, yah that might work. I'm not sure how well our list functions will > behave when accessed via an interrupt handler. We'll have race > conditions at least: > > #define QTAILQ_INSERT_TAIL(head, elm, field) do { \ > (elm)->field.tqe_next = NULL; \ > (elm)->field.tqe_prev = (head)->tqh_last; \ > /* interrupt */ > *(head)->tqh_last = (elm); \ > (head)->tqh_last = &(elm)->field.tqe_next; > > So if we fork() and the process completes and triggers the interrupt > before we add the pid to the list, it will remain a zombie until > something else triggers the handler. It's not a huge deal...we at least > avoid accumulating zombies, but I'd be weary of other side-effects as > well, especially if we also remove entries from the list if they've been > waited on successfully. We can block SIGCHLD temporarily while inserting. If a new signal arrives while it's blocked, it will be queued and will be delivered when unblocked. > > Maybe that could work for guest-exec too: we could let the signal handler > > collect the exit status and store them. Then guest-status could retrieve > > the status from there. > > Yah, I think we can tie all this together rather nicely with a little work. > > > > > We have several options here. Maybe I should do the simplest solution for > > the guest-suspend command and you work on improving it for guest-exec. > > That's fine, I need to look at this more. But if we're gonna take the > simplest approach for now, let's not even bother with the pid > registration (and potential races), and just do a waitpid(-1, ...) in a > SIGCHLD handler like QEMU did before the child watch stuff was added, > and I'll make the changes necessary for the guest-exec stuff before that > gets added. Okay, that works for me. > > > > >>> > >>> I'm also wondering if we could use g_child_watch_add(), but it's not clear > >>> to me if it works with processes not created with g_spawn_*() functions. > >> > >> GPid's map to something other than PIDs on Windows, so I think we'd have > >> issues there. But our fork() approach probably wouldn't work at all on > >> Windows except maybe under cygwin, so at some point we'd probably want > >> to switch over to g_spawn for this kind of stuff anyway... > >> > >> So this might be a good point to switch over to using the glib functions. > >> > >> Would you mind trying to do the hibernate/zombie reaping stuff using > >> g_spawn+g_child_watch_add()? It might end up being the easiest route. > >> Otherwise I can take a look at it later today. > > > > Well, there are two problems with g_spawn wrt to the manual method of > > writing to the sysfs file. The first one is that I'm not sure if g_spawn() > > reports the file not found error synchronously. The other problem is that, > > "error can be NULL to ignore errors, or non-NULL to report errors. If an > error is set, the function returns FALSE. Errors are reported even if > they occur in the child (for example if the executable in argv[0] is not > found). Typically the message field of returned errors should be > displayed to users. Possible errors are those from the G_SPAWN_ERROR > domain." > > So I'd think we'd be able to report the FNF synchronously even if we > launch into the background. Yeah, from my testing looks like it does. > > > I'd have to fork() anyway to write to the sysfs file (unless we decide that > > it's ok to do this synchronously, which seems ok to me). > > I think what we'd have to do is open the file, then pass it in as stdout > for a g_spawn'd `echo "disk"` command. I'm not sure how we'd manage > cleaning up the FD though, if we're doing it asynchronously. And > synchronous means every guest-suspend call will eventually time out, so > I don't think that's doable right now. > > So I'd say keep it the way it is for now. Probably not worth spending > too much time on until we see what the Windows stuff looks like anyway. Ok. > > > > >> > >>> > >>>> > >>>> Then, when creating a child, you just register the child by adding the > >>>> pid to the list that the signal channel callback checks. > >>>> > >>>>> > >>>>> Signed-off-by: Luiz Capitulino<lcapitul...@redhat.com> > >>>>> --- > >>>>> > >>>>> I've tested this w/o any virtio driver, as they don't support S4 yet. > >>>>> For > >>>>> S4 it seems to work ok. I couldn't fully test S3 because we lack a way > >>>>> to > >>>>> resume from it, but by checking the logs it seems to work fine. > >>>>> > >>>>> changelog > >>>>> --------- > >>>>> > >>>>> v2 > >>>>> > >>>>> o Rename the command to 'guest-suspend' > >>>>> o Add 'mode' parameter > >>>>> o Use pm-utils scripts > >>>>> o Cleanup child termination status > >>>>> > >>>>> qapi-schema-guest.json | 17 +++++++++++ > >>>>> qemu-ga.c | 11 +++++++- > >>>>> qga/guest-agent-commands.c | 64 > >>>>> ++++++++++++++++++++++++++++++++++++++++++++ > >>>>> 3 files changed, 91 insertions(+), 1 deletions(-) > >>>>> > >>>>> diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json > >>>>> index 29989fe..656bde9 100644 > >>>>> --- a/qapi-schema-guest.json > >>>>> +++ b/qapi-schema-guest.json > >>>>> @@ -219,3 +219,20 @@ > >>>>> ## > >>>>> { 'command': 'guest-fsfreeze-thaw', > >>>>> 'returns': 'int' } > >>>>> + > >>>>> +## > >>>>> +# @guest-suspend > >>>>> +# > >>>>> +# Suspend guest execution by entering ACPI power state S3 or S4. > >>>>> +# > >>>>> +# @mode: 'hibernate' RAM content is saved in the disk and the guest is > >>>>> +# powered down (this corresponds to ACPI S4) > >>>>> +# 'sleep' execution is suspended but the RAM retains its > >>>>> contents > >>>>> +# (this corresponds to ACPI S3) > >>>>> +# > >>>>> +# Notes: This is an asynchronous request. There's no guarantee it will > >>>>> +# succeed. Errors will be logged to guest's syslog. > >>>>> +# > >>>>> +# Since: 1.1 > >>>>> +## > >>>>> +{ 'command': 'guest-suspend', 'data': { 'mode': 'str' } } > >>>>> diff --git a/qemu-ga.c b/qemu-ga.c > >>>>> index 60d4972..b32e96c 100644 > >>>>> --- a/qemu-ga.c > >>>>> +++ b/qemu-ga.c > >>>>> @@ -61,7 +61,7 @@ static void quit_handler(int sig) > >>>>> > >>>>> static void register_signal_handlers(void) > >>>>> { > >>>>> - struct sigaction sigact; > >>>>> + struct sigaction sigact, sigact_chld; > >>>>> int ret; > >>>>> > >>>>> memset(&sigact, 0, sizeof(struct sigaction)); > >>>>> @@ -76,6 +76,15 @@ static void register_signal_handlers(void) > >>>>> if (ret == -1) { > >>>>> g_error("error configuring signal handler: %s", > >>>>> strerror(errno)); > >>>>> } > >>>>> + > >>>>> + /* This should cause the kernel to automatically cleanup child > >>>>> + termination status */ > >>>>> + memset(&sigact_chld, 0, sizeof(struct sigaction)); > >>>>> + sigact_chld.sa_handler = SIG_IGN; > >>>>> + ret = sigaction(SIGCHLD,&sigact_chld, NULL); > >>>>> + if (ret == -1) { > >>>>> + g_error("error configuring signal handler: %s", > >>>>> strerror(errno)); > >>>>> + } > >>>>> } > >>>>> > >>>>> static void usage(const char *cmd) > >>>>> diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c > >>>>> index a09c8ca..4799638 100644 > >>>>> --- a/qga/guest-agent-commands.c > >>>>> +++ b/qga/guest-agent-commands.c > >>>>> @@ -574,6 +574,70 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err) > >>>>> } > >>>>> #endif > >>>>> > >>>>> +#define LINUX_PM_UTILS_PATH "/usr/sbin" > >>>>> +#define LINUX_SYS_STATE_FILE "/sys/power/state" > >>>>> + > >>>>> +void qmp_guest_suspend(const char *mode, Error **err) > >>>>> +{ > >>>>> + int ret, fd = -1; > >>>>> + const char *pmutils_bin; > >>>>> + char pmutils_bin_path[PATH_MAX]; > >>>>> + > >>>>> + if (strcmp(mode, "hibernate") == 0) { > >>>>> + pmutils_bin = "pm-hibernate"; > >>>>> + } else if (strcmp(mode, "sleep") == 0) { > >>>>> + pmutils_bin = "pm-suspend"; > >>>>> + } else { > >>>>> + error_set(err, QERR_INVALID_PARAMETER, "mode"); > >>>>> + return; > >>>>> + } > >>>>> + > >>>>> + snprintf(pmutils_bin_path, sizeof(pmutils_bin_path), "%s/%s", > >>>>> + LINUX_PM_UTILS_PATH, pmutils_bin); > >>>>> + > >>>>> + if (access(pmutils_bin_path, X_OK) != 0) { > >>>>> + pmutils_bin = NULL; > >>>>> + fd = open(LINUX_SYS_STATE_FILE, O_WRONLY); > >>>>> + if (fd< 0) { > >>>>> + error_set(err, QERR_OPEN_FILE_FAILED, > >>>>> LINUX_SYS_STATE_FILE); > >>>>> + return; > >>>>> + } > >>>>> + } > >>>>> + > >>>>> + ret = fork(); > >>>>> + if (ret == 0) { > >>>>> + /* child */ > >>>>> + setsid(); > >>>>> + fclose(stdin); > >>>>> + fclose(stdout); > >>>>> + fclose(stderr); > >>>>> + > >>>>> + if (pmutils_bin) { > >>>>> + ret = execl(pmutils_bin_path, pmutils_bin, NULL); > >>>>> + if (ret) { > >>>>> + slog("%s failed: %s", pmutils_bin_path, > >>>>> strerror(errno)); > >>>>> + } > >>>>> + } else { > >>>>> + const char *cmd = strcmp(mode, "sleep") == 0 ? "mem" : > >>>>> "disk"; > >>>>> + ret = write(fd, cmd, strlen(cmd)); > >>>>> + if (ret< 0) { > >>>>> + slog("can't write to %s: %s\n", LINUX_SYS_STATE_FILE, > >>>>> + strerror(errno)); > >>>>> + } > >>>>> + close(fd); > >>>>> + } > >>>>> + > >>>>> + exit(!!ret); > >>>>> + } else if (ret< 0) { > >>>>> + error_set(err, QERR_UNDEFINED_ERROR); > >>>>> + return; > >>>>> + } > >>>>> + > >>>>> + if (!pmutils_bin) { > >>>>> + close(fd); > >>>>> + } > >>>>> +} > >>>>> + > >>>>> /* register init/cleanup routines for stateful command groups */ > >>>>> void ga_command_state_init(GAState *s, GACommandState *cs) > >>>>> { > >>>> > >>> > >> > > >