Hi all,

On Thu, Apr 04, 2024 at 11:33 PM, Ludovic Courtès wrote:
> Hello!
>
> News from the everlasting bug!
>
>   cannot build missing derivation
>   ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’
>
> (From <https://ci.guix.gnu.org/build/3861708/>.)
>
> Why was it missing this time?  /var/log/nginx/error.log:
>
>   2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out (110:
>   Connection timed out) while reading response header from upstream, client:
>   141.80.167.169, server: ci.guix.gnu.org, request: "GET
>   /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1", upstream:
>   "http://127.0.0.1:3000/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo", host:
>   "141.80.167.131"
>
> Oops!  (There are dozens of upstream timeouts logged on that minute.)
>
> /var/log/guix-publish.log:
>
>   2024-04-04 17:14:51 GET /nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
>   2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
>   2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
>   2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
>   2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
>   2024-04-04 17:15:33 GET /nar/zstd/i2hp3q2pfhsyl0al7z38am7cqpddi4qr-texlive-capt-of-66594-doc
>   2024-04-04 17:15:33 GET /hh0gdbljj3cjdnjbr88kfm21mhys5sy7.narinfo
>   2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
>   2024-04-04 17:15:33 GET /yj63wifalfr6sla42h7mkqg011qrl5d0.narinfo
>   2024-04-04 17:15:33 GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo
>   2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
>   2024-04-04 17:15:33 GET /nar/lzip/6zxlrw15b9dsv73s7v5fqabl7iv5v5il-python-exceptiongroup-1.1.1
>   2024-04-04 17:15:33 GET /nar/zstd/pychjd114abscbqlzcr3s7myf1497vw2-julia-compilersupportlibraries-jll-0.4.0%2B1
>
> ‘guix publish’ replied, but 40s too late (nginx has
> “proxy_connect_timeout 10s;” for .narinfo URLs¹).
>
> Notice the 40s pause time between 17:14:51 and 17:15:33.  Stop-the-world
> GC?  Unlikely, because ‘guix publish’ had been running for ~3h, so even
> with a leak², it’s hard to believe GC could take this long.
>
> Ludo’.
>
> ¹ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/berlin.scm#n103
> ² https://issues.guix.gnu.org/69596

I don't have any insight, but if anyone wants to see this in action at
a large scale, take a look at pretty much any red dot on
https://ci.guix.gnu.org/eval/1238471/dashboard?system=i686-linux

From my quick look, the CL and texlive failures were all missing
derivations.  I've tried restarting a bunch to get i686 coverage going,
so hopefully some will disappear.  But I can't/won't manually restart
the thousands(?) of failed builds.

I didn't see such issues on x86_64.  Other architectures take a really
long time to build on Berlin, so I haven't looked at those.

I don't know if this is helpful, but I thought I would chime in in case
anyone wants potentially a bunch of data.  And if there are good ideas
for recovering (just restart all builds?), that would be great, so that
mesa-updates can be built on i686, since otherwise it looks good.

Thanks!
John
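
P.S. In case it helps anyone hunting for more of these pauses, here is a
rough Python sketch that scans a log for gaps between consecutive
timestamps.  It assumes every entry starts with a "YYYY-MM-DD HH:MM:SS"
timestamp, as in the guix-publish.log excerpt quoted above.  The function
name and the 10 second default threshold (picked to match the nginx timeout
mentioned above) are my own choices, and I have only tried it on the sample
lines below, not on the real log.

```python
from datetime import datetime, timedelta


def find_stalls(lines, threshold=timedelta(seconds=10)):
    """Return (previous, current, gap) for consecutive timestamped
    entries that are further apart than `threshold`."""
    stalls = []
    previous = None
    for line in lines:
        try:
            # Assumed format: the first 19 characters look like
            # "2024-04-04 17:14:51", as in the quoted excerpt.
            current = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and current - previous > threshold:
            stalls.append((previous, current, current - previous))
        previous = current
    return stalls


# A few entries copied from the excerpt quoted above.
sample = [
    "2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo",
    "2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc",
    "2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo",
    "2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo",
]

for before, after, gap in find_stalls(sample):
    print(f"{before} -> {after}: {gap.total_seconds():.0f}s without log entries")
```

On those four lines it reports a single gap, the 42 seconds between
17:14:51 and 17:15:33.  To run it on a whole file, pass an open file
object instead of the list.  A gap only means nothing was logged, so a
quiet period with no requests would show up the same way as a stall.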