Hi Ludo,

About my patch here: the store is not present at the time when fuser would be invoked, so the patch won't work. This is a problem for most of the approaches, including the load-initramfs-again approach (the initrd image is in the store).
Frankly, now that I understand the problem, I'm not rebooting needlessly at all until some mitigation is in guix. It's just too dangerous. I don't say that to pressure anyone--but I'm using the laptop for work--and spending >20 h restoring 1 TB of data via a transatlantic network connection is not fun. I'm not going to cause this problem again on purpose if there's no chance of the outcome being any different, or of any debugging tools being in place (I think even hanging indefinitely once the umount fails would be better than just rebooting anyway), or of "sudo halt" not turning off my screen (I guess I can work around that last thing...).

I've now implemented the relevant parts of fuser in guile, see <https://issues.guix.gnu.org/78051>. I've tested it as a standalone module and it seems to work fine. However, I need help integrating something like that into guix (and/or shepherd, hmm). If you have time, let's work together on issue #78051.

Does shepherd use threads, too? I ask because I currently exempt the result of (getpid) from the killing, for obvious reasons. Are there any other kernel "tasks" we need to worry about?

Also, I have questions on how to test that:

1. Running

     sudo ./pre-inst-env guix system reconfigure /etc/config.scm

   doesn't use my channels, so I can't actually use my config.scm. How do I make that work? I've changed gnu/services/base.scm, so that change needs to be preserved (before I commit it to guix master). We should create a system test that invokes a program that writes to a file and keeps it open--and then we should forget to kill it.

2. How do I test mount, umount, ioctl, etc.? It seems the regular guile doesn't have them. How do I get a repl to the irregular guile? I put (shepherd service repl) into my system shepherd a long time ago, now what? /var/run/shepherd/repl, aha.

3. How do I replace the stop method of an existing shepherd service on the fly using the shepherd repl?
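To make the fuser idea mentioned above concrete, here is a minimal sketch of the kind of /proc scan involved. It is an illustration only, not the code attached to issue 78051: the procedure names (below-mount-point?, pid-open-files, pids-using) are made up for this example, and it only looks at /proc/PID/fd, whereas the real fuser also considers things like the current directory, the root directory and memory-mapped files. It does show the (getpid) exemption.

```scheme
(use-modules (ice-9 ftw)      ; scandir
             (srfi srfi-1))   ; filter-map, any

;; Hypothetical helper: is FILE equal to MOUNT-POINT or located below it?
;; The trailing slash avoids matching "/mnt/database" against "/mnt/data".
(define (below-mount-point? file mount-point)
  (let ((prefix (if (string-suffix? "/" mount-point)
                    mount-point
                    (string-append mount-point "/"))))
    (or (string=? file mount-point)
        (string-prefix? prefix file))))

;; Return the link targets of PID's open file descriptors.  Entries that
;; vanish meanwhile, or that we may not read, are silently skipped.
(define (pid-open-files pid)
  (let* ((fd-dir (format #f "/proc/~a/fd" pid))
         (fds (or (false-if-exception
                   (scandir fd-dir string->number))
                  '())))
    (filter-map (lambda (fd)
                  (false-if-exception
                   (readlink (string-append fd-dir "/" fd))))
                fds)))

;; Return the PIDs (integers) that hold a file open at or below
;; MOUNT-POINT, leaving out EXEMPT-PIDS (by default the calling process).
(define* (pids-using mount-point #:key (exempt-pids (list (getpid))))
  (let ((pids (filter-map string->number
                          (or (scandir "/proc") '()))))
    (filter (lambda (pid)
              (and (not (memv pid exempt-pids))
                   (any (lambda (file)
                          (below-mount-point? file mount-point))
                        (pid-open-files pid))))
            pids)))

;; Example: which processes (other than this one) keep files open
;; under /home?  Without root, only our own processes are inspectable.
(format #t "PIDs using /home: ~s~%" (pids-using "/home"))
```

The candidates for killing would then be whatever pids-using returns for the file system that refuses to unmount. Whether PID 1 needs a wider exemption than a single (getpid) value is exactly the open question about threads above.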
Shepherd REPL woes:

$ sudo guile /var/run/shepherd/repl
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;; or pass the --no-auto-compile argument to disable.
;;; compiling /var/run/shepherd/repl
;;; WARNING: compilation of /var/run/shepherd/repl failed:
;;; In procedure open-file: No such device or address: "/var/run/shepherd/repl"
Backtrace:
0 (primitive-load "/var/run/shepherd/repl")

ERROR: In procedure primitive-load:
In procedure open-file: No such device or address: "/var/run/shepherd/repl"

/etc/config.scm has:

(simple-service 'shepherd-repl shepherd-root-service-type
                (list (shepherd-service
                       (provision '(repl))
                       (modules '((shepherd service repl)))
                       (free-form #~(repl-service)))))

dannym@nova ~$ sudo lsof -p 1 |grep repl
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
      Output information may be incomplete.
shepherd 1 root mem REG 253,0 80053 261309955 /gnu/store/ckghf0bkrj9qrybp1c64q2irv6vx728k-shepherd-1.0.4/lib/guile/3.0/site-ccache/shepherd/service/repl.go
shepherd 1 root mem REG 253,0 69413 271659842 /gnu/store/h7bgg78xc14994qknp2xgqwcry4dixkp-shepherd-repl.go
shepherd 1 root mem REG 253,0 81829 261115824 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/error-handling.go
shepherd 1 root mem REG 253,0 84013 261115823 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/debug.go
shepherd 1 root mem REG 253,0 332773 261115820 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/command.go
shepherd 1 root mem REG 253,0 91917 261115821 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/common.go
shepherd 1 root mem REG 253,0 84605 261297511 /gnu/store/npxvddsza0hgix6am143ij8ivm3xp97g-guile-fibers-1.3.1/lib/guile/3.0/site-ccache/fibers/repl.go
shepherd 1 root 29u unix 0x00000000a03db542 0t0 17506 /var/run/shepherd/repl type=STREAM (LISTEN)
...

So? What's up?
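A note on the error in that transcript: "guile FILE" tries to open FILE as a script, and on Linux open(2) on a Unix-domain socket fails with ENXIO ("No such device or address"). The lsof line shows /var/run/shepherd/repl is a listening STREAM socket, so it has to be connected to rather than opened. The sketch below reproduces both behaviours on a throwaway socket under /tmp, so it can be run without touching shepherd. The helper name connect-to-unix-socket and the demo socket path are made up for this example.

```scheme
(use-modules (ice-9 rdelim))   ; read-line

;; Hypothetical helper: connect to the Unix-domain stream socket at FILE
;; and return the connected port.
(define (connect-to-unix-socket file)
  (let ((sock (socket PF_UNIX SOCK_STREAM 0)))
    (connect sock AF_UNIX file)
    sock))

;; Throwaway listening socket standing in for /var/run/shepherd/repl.
(define demo-file
  (string-append "/tmp/demo-repl-" (number->string (getpid)) ".sock"))
(define server (socket PF_UNIX SOCK_STREAM 0))
(bind server AF_UNIX demo-file)
(listen server 1)

;; Opening the socket like a regular file fails, as in the transcript.
(define open-errno
  (catch 'system-error
    (lambda () (open-input-file demo-file) #f)
    (lambda args (system-error-errno args))))
(format #t "open-file: ~a~%" (strerror open-errno))

;; Connecting to it works: send one line from the server side and
;; read it back on the client side.
(define client (connect-to-unix-socket demo-file))
(define peer (car (accept server)))
(display "hello from the server side\n" peer)
(force-output peer)
(define greeting (read-line client))
(format #t "read over the socket: ~a~%" greeting)

;; Clean up the demo socket.
(close-port client)
(close-port peer)
(close-port server)
(delete-file demo-file)
```

Against the real socket, the same helper would presumably be called as (connect-to-unix-socket "/var/run/shepherd/repl") from a guile running as root. For interactive use, a generic client is more convenient; assuming socat is installed, something like "sudo socat STDIO UNIX-CONNECT:/var/run/shepherd/repl" should attach a terminal to the REPL.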
"sudo herd status" works.

$ sudo herd status repl
● Status of repl:
  It is running since Mon 21 Apr 2025 03:05:00 PM CEST (4 days ago).
  Running value is "#<input-output: socket 29>".
  It is enabled.
  Provides: repl
  Will not be respawned.

root@nova /proc/1/fd# ls -lrt 29
lrwx------ 1 root root 64 Apr 25 13:14 29 -> 'socket:[17506]'

17506, you say? Linux, never change /s