Hi Ludo,

About my patch here: The store is not present at the time when fuser
would be invoked, so that patch won't work.  This is a problem for
most of the approaches, including the load-initramfs-again approach
(the initrd image is in the store).

Frankly, now that I understand the problem, I'm not going to reboot
needlessly at all until some mitigation is in guix.  It's just too
dangerous.

I don't say that to pressure anyone--but I'm using the laptop for
work--and spending >20 h restoring 1 TB of data via a transatlantic
network connection is not fun.  I'm not going to cause this problem
again on purpose as long as there's no chance of it turning out any
differently,
or of any future debugging tools being in place (I think even hanging
indefinitely once the umount fails would be better than just rebooting
anyway),
or of "sudo halt" not turning off my screen (I guess I can work around
that last thing...).

I've now implemented the relevant parts of fuser in guile, see
<https://issues.guix.gnu.org/78051>.  I've tested it as a standalone
module and it seems to work fine.  However, I need help to integrate
something like that into guix (and/or shepherd, hmm).

If you have time, let's work together on issue #78051.

Does shepherd use threads, too?  I ask because I currently exempt the
result of (getpid) from the killing, for obvious reasons.  Any other
kernel "tasks" we need to worry about?
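
For illustration only (this is not the code attached to the issue): a
minimal sketch of a fuser-like scan with the (getpid) exemption.  It
assumes a Linux /proc, looks only at /proc/PID/fd (a real fuser also
checks cwd, root, exe and memory maps), and the names path-under?,
open-file-targets and pids-using are made up for this example.

```scheme
(use-modules (ice-9 ftw)      ; scandir
             (srfi srfi-1))   ; filter-map, any

;; Hypothetical helper: is PATH equal to MOUNT-POINT or located below it?
(define (path-under? path mount-point)
  (or (string=? path mount-point)
      (string-prefix? (if (string-suffix? "/" mount-point)
                          mount-point
                          (string-append mount-point "/"))
                      path)))

;; Targets of all fd symlinks of PID.  Returns '() if the process
;; vanished in the meantime or we are not allowed to look at it.
(define (open-file-targets pid)
  (let* ((dir (string-append "/proc/" (number->string pid) "/fd"))
         (fds (or (false-if-exception (scandir dir string->number))
                  '())))
    (filter-map (lambda (fd)
                  (false-if-exception
                   (readlink (string-append dir "/" fd))))
                fds)))

;; PIDs that hold something open under MOUNT-POINT, minus EXEMPT.
;; By default only the calling process is exempt.
(define* (pids-using mount-point #:key (exempt (list (getpid))))
  (let ((pids (filter-map string->number
                          (or (scandir "/proc") '()))))
    (filter (lambda (pid)
              (and (not (memv pid exempt))
                   (any (lambda (target)
                          (path-under? target mount-point))
                        (open-file-targets pid))))
            pids)))
```

Whether exempting a single PID is enough is exactly the open question
above; the #:exempt list is there so that more PIDs could be added.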

Also, I have questions on how to test that:

1. sudo ./pre-inst-env guix system reconfigure /etc/config.scm
   doesn't use my channels, so I can't actually use my config.scm.
   How do I make that work?
   I've changed gnu/services/base.scm, so that change needs to be
   preserved (before I commit it to guix master).
   We should create a system test that invokes a program that writes to
   a file and keeps it open--and then the test should deliberately
   "forget" to kill it.

2. How do I test mount, umount, ioctl etc.?  It seems the regular guile
   doesn't have them.
   How do I get a repl to the irregular guile?
   I have put (shepherd service repl) into my system shepherd a long
   time ago, now what?
   /var/run/shepherd/repl, aha

3. How do I replace the stop method of an existing shepherd service
   on-the-fly using the shepherd repl?
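
For the test program mentioned in point 1, a minimal sketch, assuming
plain Guile is acceptable for it.  The helper name hold-file-open and
the path in the usage comment are hypothetical.

```scheme
;; Hypothetical test helper: open PATH for writing, write one line,
;; flush it to disk, and return the still-open port.  The caller must
;; keep a reference to the port, otherwise it may get closed when it
;; is garbage-collected.
(define (hold-file-open path)
  (let ((port (open-file path "w")))
    (display "still here\n" port)
    (force-output port)
    port))

;; In the actual test program one would then block forever, e.g.:
;;
;;   (define port (hold-file-open "/mnt/test/busy"))  ; made-up path
;;   (let loop () (sleep 3600) (loop))
```

The file system holding that file then stays busy until something
kills the process, which is the situation the test is meant to create.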

Shepherd REPL woes:

$ sudo guile /var/run/shepherd/repl
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /var/run/shepherd/repl
;;; WARNING: compilation of /var/run/shepherd/repl failed:
;;; In procedure open-file: No such device or address: "/var/run/shepherd/repl"
Backtrace:
           0 (primitive-load "/var/run/shepherd/repl")

ERROR: In procedure primitive-load:
In procedure open-file: No such device or address: "/var/run/shepherd/repl"

/etc/config.scm has:

(simple-service 'shepherd-repl
                     shepherd-root-service-type
                     (list (shepherd-service
                            (provision '(repl))
                            (modules '((shepherd service repl)))
                            (free-form #~(repl-service)))))

dannym@nova ~$ sudo lsof -p 1 |grep repl
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
      Output information may be incomplete.
shepherd   1 root mem       REG              253,0    80053  261309955 /gnu/store/ckghf0bkrj9qrybp1c64q2irv6vx728k-shepherd-1.0.4/lib/guile/3.0/site-ccache/shepherd/service/repl.go
shepherd   1 root mem       REG              253,0    69413  271659842 /gnu/store/h7bgg78xc14994qknp2xgqwcry4dixkp-shepherd-repl.go
shepherd   1 root mem       REG              253,0    81829  261115824 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/error-handling.go
shepherd   1 root mem       REG              253,0    84013  261115823 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/debug.go
shepherd   1 root mem       REG              253,0   332773  261115820 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/command.go
shepherd   1 root mem       REG              253,0    91917  261115821 /gnu/store/3zdc679dcs33yaljrjrkaq1fm7w3sjpy-guile-3.0.9/lib/guile/3.0/ccache/system/repl/common.go
shepherd   1 root mem       REG              253,0    84605  261297511 /gnu/store/npxvddsza0hgix6am143ij8ivm3xp97g-guile-fibers-1.3.1/lib/guile/3.0/site-ccache/fibers/repl.go
shepherd   1 root  29u     unix 0x00000000a03db542      0t0      17506 /var/run/shepherd/repl type=STREAM (LISTEN)

... so?  What's up?

"sudo herd status" works.

$ sudo herd status repl
● Status of repl:
  It is running since Mon 21 Apr 2025 03:05:00 PM CEST (4 days ago).
  Running value is "#<input-output: socket 29>".
  It is enabled.
  Provides: repl
  Will not be respawned.

root@nova /proc/1/fd# ls -lrt 29  
lrwx------ 1 root root 64 Apr 25 13:14 29 -> 'socket:[17506]'

17506, you say?  Linux, never change /s
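
The lsof line shows /var/run/shepherd/repl as a listening Unix stream
socket, and "guile FILE" tries to load FILE as source code; open(2) on
a Unix-domain socket fails with ENXIO ("No such device or address").
A minimal sketch of checking for that and connecting as a client
instead, using only core Guile.  The names socket-file? and
connect-unix are made up, and what the service sends back over the
connection is not shown here.

```scheme
;; Is PATH a socket rather than a regular file?
(define (socket-file? path)
  (eq? 'socket (stat:type (stat path))))

;; Connect to the Unix-domain stream socket at PATH and return the
;; connected socket, which is an ordinary input/output port.
(define (connect-unix path)
  (let ((sock (socket PF_UNIX SOCK_STREAM 0)))
    (connect sock AF_UNIX path)
    sock))

;; Possible use (as root, untested against a real shepherd):
;;
;;   (socket-file? "/var/run/shepherd/repl")
;;   (define repl (connect-unix "/var/run/shepherd/repl"))
```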
