bug#76516: [shepherd] Timer not executed

Tomas Volf Mon, 24 Feb 2025 11:25:57 -0800

Ludovic Courtès <l...@gnu.org> writes:

> Ludovic Courtès <l...@gnu.org> skribis:
>
>>> 2025-02-23 12:00:02 Waiting anew for timer 'kerberos-log-in-refresh' 
>>> (resuming from sleep state?).
>
> The “Waiting anew” message happens when the timer fires 2 seconds or
> more later than expected (see ‘sleep-operation/check’), which is indeed
> the case here.
>
> It’s not supposed to happen normally.  Before we bump that to 10
> seconds, say, it would be good to understand why the timer got late
> here.


I definitely agree on this.

(I wonder if there is a better way to detect the sleep.  I feel like *any*
number will be wrong for someone.  Do we know how, for example, systemd's
timers handle this?)
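
To make the heuristic under discussion concrete, here is a minimal sketch
of a fixed-threshold check as I understand it from the description above.
It is not the actual ‘sleep-operation/check’ code: the names
‘%late-threshold’ and ‘fired-late?’ are made up for illustration, and
times are plain seconds since midnight.

```scheme
;; Hypothetical sketch, not Shepherd's implementation.
;; Threshold in seconds; 2 is the value mentioned in this thread.
(define %late-threshold 2)

(define* (fired-late? expected actual #:optional (threshold %late-threshold))
  "Return #t if ACTUAL is THRESHOLD seconds or more past EXPECTED."
  (>= (- actual expected) threshold))

;; 12:00:00 is 43200 seconds after midnight.
(fired-late? 43200 43202)      ; woke at 12:00:02 => #t
(fired-late? 43200 43201)      ; woke at 12:00:01 => #f
(fired-late? 43200 43202 10)   ; same delay, 10-second threshold => #f
```

With a threshold of 2, a wake-up at 12:00:02 counts as late while one at
12:00:01 does not, so a single extra second of delay is enough to flip
the outcome.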

>
> Are there services that could block shepherd somehow, for instance by
> calling ‘waitpid’, or running computations at 12:00pm?

Not really (I think).  This is the full shepherd status output:

--8<---------------cut here---------------start------------->8---
$ herd status
Started:
 + dbus
 + pulseaudio
 + root
 + timer
 + transient
Running timers:
 + kerberos-log-in-refresh
 + log-rotation
One-shot:
 * kerberos-log-in
 * kerberos-reachable?
--8<---------------cut here---------------end--------------->8---

I have already shared the definition of kerberos-log-in-refresh.  There
is no other timer scheduled (except for log rotation).  Other services
are from Guix, with the exception of pulseaudio:

--8<---------------cut here---------------start------------->8---
(define (home-pulseaudio-shepherd-services _)
  "Return a shepherd service to run a pulseaudio daemon.

Currently no configuration is supported."
  (list
   (shepherd-service
    (documentation "Run a pulseaudio daemon.")
    (provision '(pulseaudio))
    (start #~(make-forkexec-constructor
              '(#$(file-append pulseaudio "/bin/pulseaudio")
                "--daemonize=false")))
    (stop #~(make-kill-destructor)))))
--8<---------------cut here---------------end--------------->8---

There is a timer scheduled to run every 15 minutes in the system
shepherd, but it is not compute-heavy (it just checks error counts from
the root filesystem).  The machine has 12 cores, each at ~3GHz, 32GB of
RAM and an SSD for /.  I am not aware of any significant resource use that
should happen at noon, but even if there were one, it is hard to
believe shepherd would not get a time slice on *any* core for 2 seconds.

For what it is worth, today the timer worked fine; however, even today
it was executed at :01, so a second later than it should have been.

--8<---------------cut here---------------start------------->8---
2025-02-24 12:00:01 Timer 'kerberos-log-in-refresh' spawned process 24129.
2025-02-24 12:00:01 Registering new logger for kerberos-log-in-refresh.
--8<---------------cut here---------------end--------------->8---

If you have any idea what additional information would be useful, I have
no problem deploying a patched shepherd with extra logging to this machine
(assuming you know what extra logs we need).

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
