Ludovic Courtès <l...@gnu.org> skribis: > So why is ‘wpa-supplicant’ marked as failing to start on the first > attempt? > > The only reason I can think of is if ‘read-pid-file’ from (shepherd > service) returns immediately and returns #f instead of a number. That > can actually happen if the PID file exists but is empty (or contains > garbage).
I’ve produced an ISO with the patch below and ran it on the bare metal to get confirmation (too bad the bug doesn’t show in QEMU :-/). Indeed, ‘read-pid-file’ for /var/run/wpa_supplicant.pid systematically reads the empty string the first time the ‘wpa-supplicant’ service is started. (After that, if we kill the process and try to restart the service, the problem doesn’t show up.)
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm index 53437b6..e21492e 100644 --- a/modules/shepherd/service.scm +++ b/modules/shepherd/service.scm @@ -717,9 +717,12 @@ otherwise return the number that was read (a PID)." (let loop () (catch 'system-error (lambda () - (string->number - (string-trim-both - (call-with-input-file file get-string-all)))) + (define str + (call-with-input-file file get-string-all)) + + (local-output (l10n "read-pid-file ~s -> ~s") + file str) + (string->number (string-trim-both str))) (lambda args (let ((errno (system-error-errno args))) (if (= ENOENT errno)
With the second patch below, I confirm that the ‘wpa-supplicant’ starts correctly. We can see in /var/log/messages that ‘read-pid-file’ first reads the empty string from /var/run/wpa_supplicant.pid, then tries again, and gets a valid PID on the second attempt. It’s surprising that the timing is always like that, and only on the bare metal, but that’s the way it is. It’d be great if you could do some testing with the patch below. Then I guess we’ll push a Shepherd release with this fix. I wonder if this could also explain <https://issues.guix.info/issue/30993>. Thanks, Ludo’.
diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm index dfc3467bf8..e1dd248679 100644 --- a/gnu/packages/admin.scm +++ b/gnu/packages/admin.scm @@ -188,7 +188,8 @@ and provides a \"top-like\" mode (monitoring).") version ".tar.gz")) (sha256 (base32 - "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i")))) + "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i")) + (patches (search-patches "shepherd-debug.patch")))) (build-system gnu-build-system) (arguments '(#:configure-flags '("--localstatedir=/var"))) diff --git a/gnu/packages/patches/shepherd-debug.patch b/gnu/packages/patches/shepherd-debug.patch new file mode 100644 index 0000000000..2fd97cc578 --- /dev/null +++ b/gnu/packages/patches/shepherd-debug.patch @@ -0,0 +1,43 @@ +diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm +index 53437b6..bef8f42 100644 +--- a/modules/shepherd/service.scm ++++ b/modules/shepherd/service.scm +@@ -715,21 +715,28 @@ number. Return #f if FILE was not created or does not contain a number; + otherwise return the number that was read (a PID)." + (define start (current-time)) + (let loop () ++ (define (retry) ++ (and (< (current-time) (+ start max-delay)) ++ (begin ++ ;; FILE does not exist yet, so wait and try again. ++ ;; XXX: Ideally we would yield to the main event loop ++ ;; and/or use inotify. ++ (sleep 1) ++ (loop)))) ++ + (catch 'system-error + (lambda () +- (string->number +- (string-trim-both +- (call-with-input-file file get-string-all)))) ++ (define str ++ (call-with-input-file file get-string-all)) ++ ++ (local-output (l10n "read-pid-file ~s -> ~s") ++ file str) ++ (or (string->number (string-trim-both str)) ++ (retry))) + (lambda args + (let ((errno (system-error-errno args))) + (if (= ENOENT errno) +- (and (< (current-time) (+ start max-delay)) +- (begin +- ;; FILE does not exist yet, so wait and try again. +- ;; XXX: Ideally we would yield to the main event loop +- ;; and/or use inotify. +- (sleep 1) +- (loop))) ++ (retry) + (apply throw args))))))) + + (define* (exec-command command