Ludovic Courtès <l...@gnu.org> writes:

> Ludovic Courtès <l...@gnu.org> skribis:
>
>>> 2025-02-23 12:00:02 Waiting anew for timer 'kerberos-log-in-refresh'
>>> (resuming from sleep state?).
>
> The “Waiting anew” message happens when the timer fires 2 seconds or
> more later than expected (see ‘sleep-operation/check’), which is indeed
> the case here.
>
> It’s not supposed to happen normally.  Before we bump that to 10
> seconds, say, it would be good to understand why the timer got late
> here.
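
To check that I read the quoted explanation correctly, here is the rule as I understand it, written as a small Guile sketch. This is not shepherd's actual code: `%late-threshold`, `seconds-late` and `classify-wakeup` are names I made up, and the real ‘sleep-operation/check’ may well work differently. It only encodes "2 seconds or more later than expected" and applies it to the two timestamps from the logs in this thread, which are truncated to whole seconds.

```scheme
;; Hypothetical sketch, NOT shepherd's actual code: all names are made up.
(define %late-threshold 2)              ;seconds, per the quoted explanation

(define (seconds-late expected actual)
  "Return how many seconds after EXPECTED the timer fired at ACTUAL."
  (- actual expected))

(define (classify-wakeup expected actual)
  "Return 'wait-anew when the timer fired %LATE-THRESHOLD seconds or more
after EXPECTED, and 'fire otherwise."
  (if (>= (seconds-late expected actual) %late-threshold)
      'wait-anew
      'fire))

;; Seconds since midnight for the scheduled time (12:00:00).
(define noon (* 12 60 60))

(classify-wakeup noon (+ noon 2))       ;2025-02-23, logged at 12:00:02
(classify-wakeup noon (+ noon 1))       ;2025-02-24, logged at 12:00:01
```

Under that reading, the first call yields `wait-anew` and the second `fire`, which matches what the logs show. It also shows how close to the threshold a one-second delay already is.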

I definitely agree that we should understand this first.  (I wonder if
there is a better way to detect the sleep.  I feel like *any* number will
be wrong for someone.  Do we know how, for example, systemd's timers
handle this?)

>
> Are there services that could block shepherd somehow, for instance by
> calling ‘waitpid’, or running computations at 12:00pm?

Not really (I think).  This is the full shepherd status output:

--8<---------------cut here---------------start------------->8---
$ herd status
Started:
 + dbus
 + pulseaudio
 + root
 + timer
 + transient
Running timers:
 + kerberos-log-in-refresh
 + log-rotation
One-shot:
 * kerberos-log-in
 * kerberos-reachable?
--8<---------------cut here---------------end--------------->8---

I have already shared the definition of kerberos-log-in-refresh.  There
is no other timer scheduled (except for log rotation).  The other services
are from Guix, with the exception of pulseaudio:

--8<---------------cut here---------------start------------->8---
(define (home-pulseaudio-shepherd-services _)
  "Return a shepherd service to run a pulseaudio daemon. Currently no configuration is supported."
  (list
   (shepherd-service
    (documentation "Run a pulseaudio daemon.")
    (provision '(pulseaudio))
    (start #~(make-forkexec-constructor
              '(#$(file-append pulseaudio "/bin/pulseaudio")
                "--daemonize=false")))
    (stop #~(make-kill-destructor)))))
--8<---------------cut here---------------end--------------->8---

There is a timer scheduled to run every 15 minutes in the system
shepherd, but it is not compute heavy (it just checks error counts from
the root filesystem).

The machine has 12 cores, each at ~3GHz, 32GB of RAM and an SSD for /.  I
am not aware of any significant resource use that should happen at noon,
but even if there were one, it is hard to believe shepherd would not get
a time slice on *any* core for 2 seconds.

For what it is worth, today the cronjob worked fine; however, even today
it was executed at :01, so a second later than it should have been.

--8<---------------cut here---------------start------------->8---
2025-02-24 12:00:01 Timer 'kerberos-log-in-refresh' spawned process 24129.
2025-02-24 12:00:01 Registering new logger for kerberos-log-in-refresh.
--8<---------------cut here---------------end--------------->8---

If you have any idea what additional information would be useful, I have
no problem deploying a patched shepherd with extra logging to this machine
(assuming you know what extra logs we need).

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.