Hi, Maxim Cournoyer <maxim.courno...@gmail.com> skribis:
> The build failures are not captured in the logs, which appear truncated, > e.g. from <https://ci.guix.gnu.org/build/2294052/details> (log empty -- > builder: sjd-p9, worker N38vG7OW) On sjd-p9, I get: --8<---------------cut here---------------start------------->8--- root@sjd-p9:~# journalctl -u cuirass-remote-worker.service |grep -C2 s167lh0w0azfxx83ri8k0mn2gj57i5rl Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: derivation /gnu/store/kcv6hfqz94f1vdw24viw7d9ih6v9zi1j-libkmahjongg-23.04.3.drv build succeeded. Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: request work. Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: building derivation `/gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv' (system: powerpc64le-linux) Oct 23 00:06:32 sjd-p9 cuirass[531]: C2q41fts: ping tcp://10.0.0.1:5555. Oct 23 00:06:32 sjd-p9 cuirass[531]: BrnR2Sju: ping tcp://10.0.0.1:5555. -- Oct 23 00:34:16 sjd-p9 cuirass[531]: vA6cYbb5: request work. Oct 23 00:34:16 sjd-p9 cuirass[531]: vA6cYbb5: building derivation `/gnu/store/lj4h1g1zq235l6i8mjh88s1ylvj7ksxd-u-boot-pine64-plus-2023.07.02.drv' (system: powerpc64le-linux) Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: derivation /gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv build failed. Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: request work. Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: building derivation `/gnu/store/dy8g5pi1gv8dza1av9957ic6zzj4l88j-u-boot-tools-2023.07.02.drv' (system: powerpc64le-linux) root@sjd-p9:~# guix build --log-file /gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv https://ci.guix.gnu.org/log/0yzfwl2b017gsnf60wvr7dnn2g26wrj7-u-boot-rpi-4-32b-2023.07.02 --8<---------------cut here---------------end--------------->8--- So it looks as though the build was attempted and failed, but there’s no local build log in /var/log/guix/drvs, meaning that, most likely, no build process was even started. The log of guix-daemon provide few hints: --8<---------------cut here---------------start------------->8--- root@sjd-p9:~# journalctl -u guix-daemon |grep 'Oct 23 00:' [...] Oct 23 00:06:21 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:06:21 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:06:55 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:07:51 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:16:25 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:17:15 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:33:42 sjd-p9 guix-daemon[2097426]: SIGPOLL Oct 23 00:33:42 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:33:49 sjd-p9 guix-daemon[2097426]: unexpected build daemon error: interrupted by the user Oct 23 00:34:16 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:34:16 sjd-p9 guix-daemon[1990649]: SIGPOLL Oct 23 00:34:23 sjd-p9 guix-daemon[1990649]: unexpected build daemon error: interrupted by the user Oct 23 00:34:29 sjd-p9 guix-daemon[2052532]: SIGPOLL Oct 23 00:34:29 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:34:30 sjd-p9 guix-daemon[2052532]: unexpected build daemon error: interrupted by the user Oct 23 00:34:38 sjd-p9 guix-daemon[1996946]: SIGPOLL Oct 23 00:34:38 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:34:38 sjd-p9 guix-daemon[1996946]: unexpected build daemon error: interrupted by the user Oct 23 00:34:45 sjd-p9 guix-daemon[2248556]: SIGPOLL Oct 23 00:34:45 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:34:45 sjd-p9 guix-daemon[2248556]: unexpected build daemon error: interrupted by the user Oct 23 00:35:33 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root Oct 23 00:36:46 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root --8<---------------cut here---------------end--------------->8--- One possibility would have been that ‘guix gc’ was running, preventing the build process from starting (“big GC lock” taken); however, if this were the case, we’d see a connection from a different process in the guix-daemon log. > or from <https://ci.guix.gnu.org/build/2294218/details> (builder: > hydra-guix-107, worker 4CWieLpQ) which ends with: Here we get: --8<---------------cut here---------------start------------->8--- root@hydra-guix-107 ~# grep -C2 jy463789qf42dn4zk5i5jzry2lf82ypy /var/log/cuirass-remote-worker.log 2023-10-23 00:17:34 vDKeRcHy: no available build. 2023-10-23 00:17:46 4CWieLpQ: request work. 2023-10-23 00:17:46 4CWieLpQ: building derivation `/gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv' (system: x86_64-linux) 2023-10-23 00:18:03 zARGXQMa: request work. 2023-10-23 00:18:03 zARGXQMa: no available build. -- 2023-10-23 00:27:35 vDKeRcHy: request work. 2023-10-23 00:27:35 vDKeRcHy: no available build. 2023-10-23 00:27:42 4CWieLpQ: derivation /gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv build failed. 2023-10-23 00:27:42 4CWieLpQ: request work. 2023-10-23 00:27:42 4CWieLpQ: no available build. root@hydra-guix-107 ~# ls /var/log/guix/drvs/*/*u-boot-cubietruck-2023* ls: cannot access '/var/log/guix/drvs/*/*u-boot-cubietruck-2023*': No such file or directory --8<---------------cut here---------------end--------------->8--- And likewise: --8<---------------cut here---------------start------------->8--- root@hydra-guix-107 ~# grep -C2 '10-23 00:27:' /var/log/guix-daemon.log 2023-10-23 00:17:46 accepted connection from pid 64022, user root 2023-10-23 00:19:34 accepted connection from pid 64022, user root 2023-10-23 00:27:42 SIGPOLL 2023-10-23 00:27:45 unexpected build daemon error: interrupted by the user 2023-10-23 00:33:33 accepted connection from pid 18016, user hydra 2023-10-23 00:33:34 accepted connection from pid 18035, user hydra --8<---------------cut here---------------end--------------->8--- But hey, as a matter of fact, ‘cuirass remote-server’ on berlin was restarted at around that time, as can be seen in /var/log/cuirass-remote-server.log: --8<---------------cut here---------------start------------->8--- 2023-10-23 00:26:06 0 items queued for eventual download 2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds. 2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument 2023-10-23 00:26:59 increased maximum number of open files from 1024 to 2048 2023-10-23 00:26:59 listening for build logs on port 5556 2023-10-23 00:26:59 deleting old build logs from '/var/cache/cuirass/remote/'... 2023-10-23 00:28:49 selected 1115 build logs to remove 2023-10-23 00:28:49 WARNING: (cuirass scripts remote-server): imported module (fibers) overrides core binding `sleep' 2023-10-23 00:28:49 starting fetch worker with up to 8 concurrent downloads 2023-10-23 00:28:49 build failed: '/gnu/store/qlmhcyk9ac30464v99i2ys0irs7jz3lg-u-boot-am335x-evm-2023.07.02.drv' 2023-10-23 00:28:49 build failed: '/gnu/store/dl3far2xa4paq6nl6pp9iklpxrwg1wy2-ungoogled-chromium-wayland-112.0.5615.165-1.drv' 2023-10-23 00:28:50 build failed: '/gnu/store/y9fwzzgvyhvrmf0avihm6944j8rvpg6m-u-boot-mx6cuboxi-2023.07.02.drv' 2023-10-23 00:28:50 build failed: '/gnu/store/5bhl5q86j6bkqx8zybq29khndhf050jm-u-boot-rpi-2-2023.07.02.drv' 2023-10-23 00:28:50 build failed: '/gnu/store/16d5vq3dlzy1f4s7vazlfg6358bl8dlc-u-boot-qemu-arm-2023.07.02.drv' 2023-10-23 00:28:51 build failed: '/gnu/store/i88vczhcnlz462cmpls8qgx5n3ssw7rc-u-boot-qemu-arm-2023.07.02.drv' 2023-10-23 00:28:51 build failed: '/gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv' 2023-10-23 00:28:51 build failed: '/gnu/store/i1ri2nkh1bcfzynmggbjg0rjlyi121y8-u-boot-rpi-3-32b-2023.07.02.drv' 2023-10-23 00:28:51 build failed: '/gnu/store/q08xw052r484c1c1nj31sx0fg5yf2py9-openjdk-18.0.2.drv' 2023-10-23 00:28:52 build failed: '/gnu/store/5j4pp2b5dx2ykw4na2wbmiwp84g0hi4i-u-boot-am335x-boneblack-2023.07.02.drv' 2023-10-23 00:28:52 build failed: '/gnu/store/nqp61id4bavvcsib307rw5234myl7ykr-u-boot-rpi-3-32b-efi-2023.07.02.drv' 2023-10-23 00:28:52 build failed: '/gnu/store/9h7x080cn44ik79mrzycbg2x0dvvgigc-u-boot-a20-olinuxino-lime-2023.07.02.drv' 2023-10-23 00:28:52 build failed: '/gnu/store/mkwy8ywa6mrw5zdxwr88skyw94afgaqv-u-boot-novena-2023.07.02.drv' 2023-10-23 00:28:53 build failed: '/gnu/store/lfjp146k0z0w2yqd3y6m2mv09fq01948-u-boot-sifive-unmatched-2023.07.02.drv' 2023-10-23 00:28:53 build failed: '/gnu/store/425ykg0ci4i612ryc7z79as94vipp9rb-u-boot-rpi-2-efi-2023.07.02.drv' 2023-10-23 00:28:57 build failed: '/gnu/store/7fyq24jsmvclnapjgf1li43w6dmm5n06-u-boot-mx6cuboxi-2023.07.02.drv' […] 2023-10-23 00:34:29 build failed: '/gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv' --8<---------------cut here---------------end--------------->8--- /var/log/messages shows that ‘remote-server’ failed and was respawned (the failure manifests as “failed to start worker/database proxy” in the cuirass-remote-server.log excerpt above): --8<---------------cut here---------------start------------->8--- Oct 23 00:24:33 localhost vmunix: [12534110.756389] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 2 wanted 61709 found 11 Oct 23 00:24:33 localhost vmunix: [12534110.768173] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 1 wanted 61709 found 11 Oct 23 00:24:34 localhost vmunix: [12534110.779454] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 2 wanted 61709 found 11 Oct 23 00:24:59 localhost ntpd[1953]: Soliciting pool server 81.169.208.5 Oct 23 00:26:04 localhost ntpd[1953]: Soliciting pool server 141.98.136.83 Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server (PID 92414) exited with 1. Oct 23 00:26:58 localhost shepherd[1]: Respawning cuirass-remote-server. Oct 23 00:26:58 localhost shepherd[1]: Starting service user-homes... Oct 23 00:26:58 localhost shepherd[1]: Service user-homes has been started. Oct 23 00:26:58 localhost shepherd[1]: Service user-homes started. Oct 23 00:26:58 localhost shepherd[1]: Service user-homes running with value #t. Oct 23 00:26:58 localhost shepherd[1]: Starting service postgres-roles... Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles has been started. Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles started. Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles running with value #t. Oct 23 00:26:58 localhost shepherd[1]: Starting service cuirass-remote-server... Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server has been started. Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server started. Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server running with value 63052. Oct 23 00:26:59 localhost vmunix: [12534255.969754] validate_extent_buffer: 58 callbacks suppressed Oct 23 00:26:59 localhost vmunix: [12534255.969764] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 1 wanted 61709 found 11 --8<---------------cut here---------------end--------------->8--- To summarize: 1. ‘cuirass remote-server’ failed for some unknown reason and was respawned. 2. All the ongoing and about-to-be-started builds carried out by worker processes at that point were marked as failed. We can improve on both parts. To be continued… Ludo’.