Hello,

The ‘guix offload’ processes on berlin regularly hang while calling ‘channel-get-exit-status’:
--8<---------------cut here---------------start------------->8---
(gdb) bt
#0 0x00007f299fb330f1 in __GI___poll (fds=0x1dd58c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007f2994287577 in ssh_poll_ctx_dopoll () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#2 0x00007f29942884d9 in ssh_handle_packets () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#3 0x00007f29942885ad in ssh_handle_packets_termination () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#4 0x00007f2994275080 in ssh_channel_get_exit_status () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#5 0x00007f29946dd11a in guile_ssh_channel_get_exit_status () from target:/gnu/store/i3nfl17wfx7sryq6w15r9wxl7ilmq4rb-guile-ssh-0.11.3/lib/libguile-ssh.so.11
#6 0x00007f29a1765965 in vm_regular_engine (thread=0x1dd58c0, vp=0x1d4df30, registers=0xffffffff, resume=-1615646479) at vm-engine.c:786
#7 0x00007f29a1768fba in scm_call_n (proc=#<program 7f29a1be0030>, argv=argv@entry=0x7ffc76b1ece8, nargs=nargs@entry=1) at vm.c:1257
#8 0x00007f29a16ecff7 in scm_primitive_eval (
    exp=exp@entry=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/gnu/store/zz3b7j4iv6v143v7cqyr77k83zc5n3zw-guix-0.15.0-6.f9a8fce/bin/.guix-real") (main (command-line)) (quit)))) at eval.c:662
#9 0x00007f29a16ed053 in scm_eval (
    exp=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/gnu/store/zz3b7j4iv6v143v7cqyr77k83zc5n3zw-guix-0.15.0-6.f9a8fce/bin/.guix-real") (main (command-line)) (quit))), module_or_state=module_or_state@entry="#<struct module>" = {...}) at eval.c:696
#10 0x00007f29a1738220 in scm_shell (argc=11, argv=0x1dd5280) at script.c:454
(gdb) frame 0
#0 0x00007f299fb330f1 in __GI___poll (fds=0x1dd58c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29 in ../sysdeps/unix/sysv/linux/poll.c
(gdb) p *fds
$1 = {fd = 14, events = 1, revents = 0}
(gdb) shell ls -l /proc/12605/fd
total 0
lr-x------ 1 root root 64 Nov 2 11:20 0 -> 'pipe:[44413497]'
l-wx------ 1 root root 64 Nov 2 11:33 1 -> 'pipe:[44413496]'
lr-x------ 1 root root 64 Nov 2 11:33 10 -> 'pipe:[44459532]'
l-wx------ 1 root root 64 Nov 2 11:33 11 -> 'pipe:[44459532]'
lr-x------ 1 root root 64 Nov 2 11:33 12 -> 'pipe:[44429590]'
l-wx------ 1 root root 64 Nov 2 11:33 13 -> 'pipe:[44429590]'
lrwx------ 1 root root 64 Nov 2 11:33 14 -> 'socket:[44444783]'
lrwx------ 1 root root 64 Nov 2 11:33 15 -> 'socket:[44444784]'
l-wx------ 1 root root 64 Nov 2 11:33 16 -> /var/guix/offload/141.80.167.140/0
l-wx------ 1 root root 64 Nov 2 11:33 2 -> 'pipe:[44413496]'
lr-x------ 1 root root 64 Nov 2 11:33 3 -> 'pipe:[44459528]'
lr-x------ 1 root root 64 Nov 2 11:33 33 -> /dev/urandom
l-wx------ 1 root root 64 Nov 2 11:33 4 -> 'pipe:[44413498]'
l-wx------ 1 root root 64 Nov 2 11:33 5 -> 'pipe:[44459528]'
lr-x------ 1 root root 64 Nov 2 11:33 6 -> 'pipe:[44459531]'
l-wx------ 1 root root 64 Nov 2 11:33 7 -> 'pipe:[44459531]'
lr-x------ 1 root root 64 Nov 2 11:33 8 -> 'pipe:[44453928]'
l-wx------ 1 root root 64 Nov 2 11:33 9 -> 'pipe:[44453928]'
--8<---------------cut here---------------end--------------->8---

I believe this is because in (guix ssh) we don’t ensure that the remote process has terminated by the time we call ‘channel-get-exit-status’, as in this example:

--8<---------------cut here---------------start------------->8---
scheme@(guix ssh)> (define s (open-ssh-session "localhost" #:user "ludo" #:port 22))
scheme@(guix ssh)> (define c (open-remote-pipe* s OPEN_BOTH "sleep 1000"))
scheme@(guix ssh)> (channel-send-eof c)
$4 = #<undefined>
scheme@(guix ssh)> (channel-get-exit-status c)
;; hangs
--8<---------------cut here---------------end--------------->8---

The problem is that calling ‘channel-get-exit-status’ on a closed port doesn’t work, so forcing the port closed isn’t really an option:

--8<---------------cut here---------------start------------->8---
scheme@(guix ssh)> (define c (open-remote-pipe* s OPEN_BOTH "sleep 100"))
scheme@(guix ssh)> (close-port c)
$4 = #t
scheme@(guix ssh)> (channel-get-exit-status c)
ERROR: In procedure channel-get-exit-status:
In procedure channel-get-exit-status: Wrong type argument in position 1 (expecting open channel): #<unknown channel (freed) 221d5c0>
--8<---------------cut here---------------end--------------->8---

To be continued…

Ludo’.
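If the explanation above is right, the hang has a local analogue that involves no SSH at all. An exit status only exists once a process has terminated, and ‘sleep’ never reads its standard input, so end-of-file does not make it exit. The sketch below is written in plain Guile and assumes a POSIX system with ‘sleep’ on PATH. It uses ‘primitive-fork’, ‘execlp’ and ‘waitpid’ as stand-ins for the guile-ssh channel. ‘spawn-sleep’ is a hypothetical helper written for illustration and is not part of (guix ssh).

```scheme
;; Hypothetical helper, for illustration only; not part of (guix ssh).
(define (spawn-sleep seconds)
  "Fork a 'sleep SECONDS' child whose stdin is a pipe; return a pair
(PID . PORT), where PORT is the write end of the child's stdin."
  (let* ((ports (pipe))
         (read-end (car ports))
         (write-end (cdr ports))
         (pid (primitive-fork)))
    (if (zero? pid)
        (begin
          ;; Child: make the pipe's read end its stdin, then exec 'sleep'.
          (close-port write-end)
          (dup2 (fileno read-end) 0)
          (catch #t
            (lambda ()
              (execlp "sleep" "sleep" (number->string seconds)))
            ;; Exec failed: exit the child without running parent code.
            (lambda _ (primitive-_exit 127))))
        (begin
          ;; Parent: keep only the write end of the child's stdin.
          (close-port read-end)
          (cons pid write-end)))))

(define child (spawn-sleep 1000))
(define pid (car child))

;; Local counterpart of 'channel-send-eof': close the child's stdin.
(close-port (cdr child))

;; 'sleep' ignores stdin, so EOF does not stop it and there is no exit
;; status yet.  A plain (waitpid pid) would block here for 1000 seconds;
;; with WNOHANG, 'waitpid' returns (0 . 0) instead of blocking.
(define running-after-eof?
  (zero? (car (waitpid pid WNOHANG))))

;; Once the process is actually dead, its status can be collected.
(kill pid SIGTERM)
(define status (cdr (waitpid pid)))
(define killed-by-sigterm?
  (eqv? (status:term-sig status) SIGTERM))
```

If you replace the ‘WNOHANG’ call with a plain ‘(waitpid pid)’, the script blocks for the full 1000 seconds. That is the local counterpart of the ‘channel-get-exit-status’ hang in the REPL session above.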