Hello,

Mathieu Othacehe <othac...@gnu.org> writes:

> Hello,
>
> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ‘/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv’"
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
> all of this 404 thing, pushing something like:
> https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
> closes the communication.
>
> Any other cause I could be missing?

Looking at several recent 'cannot build missing derivation' build
failures on Cuirass, I see for example:

--8<---------------cut here---------------start------------->8---
substitute:
substitute: [Kupdating substitutes from 'http://141.80.167.131'... 0.0%
substitute: [Kcould not fetch http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ‘/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv’
--8<---------------cut here---------------end--------------->8---

So it seems the error originated from guix-publish being under too heavy
a load to reply in time, so the nginx proxy in front of it answered with
a 504 (gateway timeout) error response.

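To collect such failures from more build logs without reading each one
by hand, something like the following could be used. This is only a
rough sketch: the names (NARINFO_FAILURE, NarinfoFailure,
failed_narinfo_fetches) are made up for illustration, and the pattern
assumes the "could not fetch URL STATUS" message format seen in the
excerpt above, which may not cover every variant the substituter prints.

```python
import re
from typing import NamedTuple

# Assumed message format, taken from the build log excerpt:
#   substitute: could not fetch http://HOST/HASH.narinfo 504
NARINFO_FAILURE = re.compile(
    r"could not fetch (?P<server>\S+)/(?P<hash>[0-9a-z]+)\.narinfo "
    r"(?P<status>\d{3})"
)


class NarinfoFailure(NamedTuple):
    """One failed narinfo request (hypothetical helper type)."""
    server: str
    store_hash: str
    status: int


def failed_narinfo_fetches(log_text: str) -> list[NarinfoFailure]:
    """Return every failed narinfo fetch found in a build log."""
    return [
        NarinfoFailure(m["server"], m["hash"], int(m["status"]))
        for m in NARINFO_FAILURE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "substitute: could not fetch "
        "http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504\n"
    )
    for failure in failed_narinfo_fetches(sample):
        print(failure.store_hash, failure.status)
```

The extracted hashes can then be searched for in the publish server's
log to see what happened on the serving side at the same time.
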
Looking into /var/log/guix-publish.log for a corresponding entry, I
found:

--8<---------------cut here---------------start------------->8---
2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7 2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector #<input-output: socket 42> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35 1685:16 0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

So the connection was apparently severed (?), resulting in the "broken
pipe" error.

Here's a different one:

--8<---------------cut here---------------start------------->8---
substitute:
substitute: [Kupdating substitutes from 'http://141.80.167.131'... 0.0%
substitute: [Kcould not fetch http://141.80.167.131/p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ‘/gnu/store/p2lfyvbxicjqsm4qp6368bx76gp0g948-python-astropy-healpix-0.7.drv’
--8<---------------cut here---------------end--------------->8---

It occurred around the same time, and the failure mode was the same, per
guix-publish.log:

--8<---------------cut here---------------start------------->8---
2023-08-21 23:59:35 GET /p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7 2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector #<input-output: socket 50> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35 1685:16 0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

I wonder if these could be related to the DDoS protection discovered on
the Berlin network.

I'll keep looking for other, potentially different occurrences.

-- 
Thanks,
Maxim