Package: monit Version: 1:5.27.2-1 Severity: serious Forwarded: https://bitbucket.org/tildeslash/monit/pull-requests/110/use-an-epsilon-when-doing-the-reboot-boot
Hi! On Linux the current method to retrieve the boot timestamp is racy, which means that the reboot checks (the daemon start delay), and the state machinery can be affected. The former is an annoyance as monit will not respond to commands during the set amount of time. But the latter means that services set to «onreboot nostart» and managed f.ex. by a HA system will lose their state on «monit restart», which can be rather bad. I notice that Hurd patch modifies the Linux _getStartTime() implementation making it need way more syscalls, which can exacerbate this problem, and IMO should be either reverted for bullseye, or modified so that the Hurd has its own sysdep file with that change. I'm attaching the patch I've submitted upstream, which fixes the problem for us. To reproduce I added a Log_debug() entry to see the exact timestamps, but this can be easily seen anyway by adding a start delay and checking whether the delay gets skipped or taken into account after each «service monit restart». Thanks, Guillem
From dc50d6cca9a69922f75f98fe73d95bb2f1067cfa Mon Sep 17 00:00:00 2001 From: Guillem Jover <gjo...@sipwise.com> Date: Tue, 20 Jul 2021 20:40:54 +0200 Subject: [PATCH] Use an epsilon when doing the reboot boot timestamp comparisons At least on Linux, the current method to retrieve the boot timestamp is racy, as we first fetch the system uptime (seconds since the epoch), then subtract that from the current time. But if between these two syscalls a new or more seconds elapse, then the result can change depending on the invocation compared to the stored state. This is trivial to trigger at least on a VirtualBox machine. To avoid this problem we refactor the reboot checks to use State_reboot(), and change it to use a small epsilon value, either positive or negative, to account for a possible mismatch in the current timestamp or the state one. --- src/state.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/state.c b/src/state.c index 05c3b64e..986202b7 100644 --- a/src/state.c +++ b/src/state.c @@ -243,7 +243,7 @@ static void _updateStart(Service_T S, int nstart, int ncycle) { static void _updateMonitor(Service_T S, Monitor_State monitor) { - if (systeminfo.booted == booted || S->onreboot == Onreboot_Laststate) { + if (!State_reboot() || S->onreboot == Onreboot_Laststate) { // Monit reload or restart within the same boot session OR persistent state => restore the monitoring state if (monitor == Monitor_Not) S->monitor = Monitor_Not; @@ -664,6 +664,9 @@ void State_restore() { bool State_reboot() { - return systeminfo.booted == booted ? false : true; + /* We need to use a small epsilon when comparing the boot timestamps, + * as at least on Linux its gathering is racy and can diverge from the + * real boot timestamp. */ + return abs(systeminfo.booted - booted) < 2 ? false : true; } -- 2.32.0