On Thu, Sep 22, 2022 at 04:19:33PM +0200, Dominik Csapak wrote: > currently, the 'forced_cleanup' (sending SIGKILL to the qemu process), > is intended to be triggered 5 seconds after sending the initial shutdown > signal (SIGTERM) which is sadly not enough for some setups. > > Accidentally, it could be triggered earlier than 5 seconds, if a > SIGALRM triggers in the timespan directly before setting it again. > > Also, this approach means that depending on when machines are shutdown > their forced cleanup may happen after 5 seconds, or any time after, if > new vms are shut off in the meantime. > > Improve this situation by reworking the way we deal with this cleanup. > We save the time incl. timeout in the CleanupData, and set a timeout > to 'epoll_wait' of 10 seconds, which will then trigger a forced_cleanup. > Remove entries from the forced_cleanup list when that entry is killed, > or when the normal cleanup took place. > > To improve the shutdown behaviour, increase the default timeout to 60 > seconds, which should be enough, but add a commandline toggle where > users can set it to a different value. > > Signed-off-by: Dominik Csapak <d.csa...@proxmox.com> > --- > qmeventd/qmeventd.c | 73 +++++++++++++++++++++++---------------------- > qmeventd/qmeventd.h | 2 ++ > 2 files changed, 39 insertions(+), 36 deletions(-) > > diff --git a/qmeventd/qmeventd.c b/qmeventd/qmeventd.c > index 8d32827..46bc7eb 100644 > --- a/qmeventd/qmeventd.c > +++ b/qmeventd/qmeventd.c > @@ -551,27 +558,16 @@ handle_client(struct Client *client) > json_tokener_free(tok); > } > > - > -/* > - * SIGALRM and cleanup handling > - * > - * terminate_client will set an alarm for 5 seconds and add its client's PID > to > - * the forced_cleanups list - when the timer expires, we iterate the list and > - * attempt to issue SIGKILL to all processes which haven't yet stopped. > - */ > - > -static void > -alarm_handler(__attribute__((unused)) int signum) > -{ > - alarm_triggered = 1; > -} > - > static void > -sigkill(void *ptr, __attribute__((unused)) void *unused)
If you change the style here... (which I'm not a fan of btw.) > +sigkill(struct CleanupData *ptr, time_t *cur_time) > { > - struct CleanupData data = *((struct CleanupData *)ptr); > + struct CleanupData data = *ptr; ...at least get rid of this line completely ^ and just use `ptr->` instea of `data.`, I see no reason to keep copying the data onto the stack? (or with the old style, make `data` a pointer and skip the cast) > int err; > > + if (data.timeout > *cur_time) { > + return; > + } > + > if (data.pidfd > 0) { > err = pidfd_send_signal(data.pidfd, SIGKILL, NULL, 0); > (void)close(data.pidfd); _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel