Hi,

Bichir has been stuck for the past month: it has been unable to run regression tests since commit 6a2a70a02018d6362f9841cc2f499cc45405e86b.
It is interesting that the commit is a month old and probably no other client has complained since, but digging in, I can see that bichir has been unable even to start the regression tests since that commit went in.

Note that bichir runs on WSL1 (not WSL2), i.e. Windows Subsystem for Linux inside Windows 10, and so isn't really a production use case. The only run that actually got submitted to the buildfarm was from a few days back, when I killed it after a long wait; see [1].

Since yesterday I have another run that is again stuck on CREATE DATABASE (see output below). pstack fails on the process, which may be a limitation of the architecture or installation (unsure), but a gdb backtrace shows psql blocked in poll(), waiting for the server's reply.

Tracing commits, it seems that commit 6a2a70a02018d6362f9841cc2f499cc45405e86b broke things, and I can confirm that 'make check' works if I roll back to the preceding commit (83709a0d5a46559db016c50ded1a95fd3b0d3be6).

Not sure if many agree, but two things stood out here:

1) The buildfarm never got the message that a commit broke an instance. Ideally I'd have expected the buildfarm to have an optimistic timeout that could have helped; for example, right now CREATE DATABASE has been stuck for 18 hours.

2) bichir is clearly not a production use case (it takes 5 hours to complete a HEAD run!), so let me know if this change is intentional (if so, I guess I'll stop maintaining it), but I thought I'd still put this out in case it interests someone.

- thanks
robins

Reference:
1) Last run that I had to kill - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&dt=2021-03-31%2012%3A00%3A05

#####################################################
The current run has been running since yesterday.
postgres@WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ tail -2 lastcommand.log
running on port 5678 with PID 8715
============== creating database "regression" ==============
postgres@WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ date
Wed Apr 7 12:48:26 AEST 2021
postgres@WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ ls -la
total 840
drwxrwxr-x 1 postgres postgres 4096 Apr 6 09:00 .
drwxrwxr-x 1 postgres postgres 4096 Apr 6 08:55 ..
-rw-rw-r-- 1 postgres postgres 1358 Apr 6 08:55 SCM-checkout.log
-rw-rw-r-- 1 postgres postgres 91546 Apr 6 08:56 configure.log
-rw-rw-r-- 1 postgres postgres 40 Apr 6 08:55 githead.log
-rw-rw-r-- 1 postgres postgres 2890 Apr 6 09:01 lastcommand.log
-rw-rw-r-- 1 postgres postgres 712306 Apr 6 09:00 make.log

root@WSLv1:~# pstack 8729
8729: psql -X -c CREATE DATABASE "regression" TEMPLATE=template0 LC_COLLATE='C' LC_CTYPE='C' postgres
pstack: Bad address
failed to read target.
root@WSLv1:~# gdb -batch -ex bt -p 8729
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f41a8ea4c84 in __GI___poll (fds=fds@entry=0x7fffe13d7be8, nfds=nfds@entry=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
#0 0x00007f41a8ea4c84 in __GI___poll (fds=fds@entry=0x7fffe13d7be8, nfds=nfds@entry=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007f41a9bc8eb1 in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7fffe13d7be8) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2 pqSocketPoll (end_time=-1, forWrite=0, forRead=1, sock=<optimized out>) at fe-misc.c:1133
#3 pqSocketCheck (conn=0x7fffd979a0b0, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1075
#4 0x00007f41a9bc8ff0 in pqWaitTimed (forRead=<optimized out>, forWrite=<optimized out>, conn=0x7fffd979a0b0, finish_time=<optimized out>) at fe-misc.c:1007
#5 0x00007f41a9bc5ac9 in PQgetResult (conn=0x7fffd979a0b0) at fe-exec.c:1963
#6 0x00007f41a9bc5ea3 in PQexecFinish (conn=0x7fffd979a0b0) at fe-exec.c:2306
#7 0x00007f41a9bc5ef2 in PQexec (conn=<optimized out>, query=query@entry=0x7fffd9799f70 "CREATE DATABASE \"regression\" TEMPLATE=template0 LC_COLLATE='C' LC_CTYPE='C'") at fe-exec.c:2148
#8 0x00007f41aa21e7a0 in SendQuery (query=0x7fffd9799f70 "CREATE DATABASE \"regression\" TEMPLATE=template0 LC_COLLATE='C' LC_CTYPE='C'") at common.c:1303
#9 0x00007f41aa2160a6 in main (argc=<optimized out>, argv=<optimized out>) at startup.c:369

#####################################################
Here we can see that 83709a0d5a46559db016c50ded1a95fd3b0d3be6 goes past 'CREATE DATABASE'
=======================
robins@WSLv1:~/proj/postgres/postgres$ git checkout 83709a0d5a46559db016c50ded1a95fd3b0d3be6
Previous HEAD position was 6a2a70a020 Use signalfd(2) for epoll latches.
HEAD is now at 83709a0d5a Use SIGURG rather than SIGUSR1 for latches.
robins@WSLv1:~/proj/postgres/postgres$ cd src/test/regress/
robins@WSLv1:~/proj/postgres/postgres/src/test/regress$ make -j4 NO_LOCALE=1 check
make -C ../../../src/backend generated-headers
rm -rf ./testtablespace
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/backend'
make -C catalog distprep generated-header-symlinks
make -C utils distprep generated-header-symlinks
mkdir ./testtablespace
make[2]: Entering directory '/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Entering directory '/home/robins/proj/postgres/postgres/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend/catalog'
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend'
make -C ../../../src/port all
rm -rf '/home/robins/proj/postgres/postgres'/tmp_install
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/port'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/port'
make -C ../../../src/common all
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/common'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/common'
make -C ../../../contrib/spi
make[1]: Entering directory '/home/robins/proj/postgres/postgres/contrib/spi'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/contrib/spi'
/bin/mkdir -p '/home/robins/proj/postgres/postgres'/tmp_install/log
make -C '../../..' DESTDIR='/home/robins/proj/postgres/postgres'/tmp_install install >'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
make -j1 checkprep >>'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/bin:$PATH" LD_LIBRARY_PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/lib" ../../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. --bindir= --no-locale --dlpath=. --max-concurrent-tests=20 --schedule=./parallel_schedule
============== removing existing temp instance ==============
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
running on port 58080 with PID 25879
============== creating database "regression" ==============
CREATE DATABASE
ALTER DATABASE
============== running regression test queries ==============
test tablespace ... ok 1239 ms
parallel group (20 tests): boolean char varchar name text int2 int4 int8 oid float4 float8 bit^C
GNUmakefile:132: recipe for target 'check' failed
make: *** [check] Interrupt

But checking out 6a2a70a02018d6362f9841cc2f499cc45405e86b we can see that it hangs at 'CREATE DATABASE'
=======================================
robins@WSLv1:~/proj/postgres/postgres/src/test/regress$ git checkout 6a2a70a02018d6362f9841cc2f499cc45405e86b
Previous HEAD position was 83709a0d5a Use SIGURG rather than SIGUSR1 for latches.
HEAD is now at 6a2a70a020 Use signalfd(2) for epoll latches.
robins@WSLv1:~/proj/postgres/postgres/src/test/regress$ make -j4 NO_LOCALE=1 check
make -C ../../../src/backend generated-headers
rm -rf ./testtablespace
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/backend'
make -C catalog distprep generated-header-symlinks
make -C utils distprep generated-header-symlinks
mkdir ./testtablespace
make[2]: Entering directory '/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend/utils'
make[2]: Entering directory '/home/robins/proj/postgres/postgres/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend/catalog'
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/backend'
make -C ../../../src/port all
rm -rf '/home/robins/proj/postgres/postgres'/tmp_install
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/port'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/port'
make -C ../../../src/common all
make[1]: Entering directory '/home/robins/proj/postgres/postgres/src/common'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/src/common'
make -C ../../../contrib/spi
make[1]: Entering directory '/home/robins/proj/postgres/postgres/contrib/spi'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/robins/proj/postgres/postgres/contrib/spi'
/bin/mkdir -p '/home/robins/proj/postgres/postgres'/tmp_install/log
make -C '../../..' DESTDIR='/home/robins/proj/postgres/postgres'/tmp_install install >'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
make -j1 checkprep >>'/home/robins/proj/postgres/postgres'/tmp_install/log/install.log 2>&1
PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/bin:$PATH" LD_LIBRARY_PATH="/home/robins/proj/postgres/postgres/tmp_install/opt/postgres/master/lib" ../../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. --bindir= --no-locale --dlpath=. --max-concurrent-tests=20 --schedule=./parallel_schedule
============== removing existing temp instance ==============
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
running on port 58080 with PID 26702
============== creating database "regression" ==============
stuck here ^^^
^CCancel request sent
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
connection to server was lost
command failed: "psql" -X -c "CREATE DATABASE \"regression\" TEMPLATE=template0 LC_COLLATE='C' LC_CTYPE='C'" "postgres"
pg_ctl: PID file "/home/robins/proj/postgres/postgres/src/test/regress/./tmp_check/data/postmaster.pid" does not exist
Is server running?
pg_regress: could not stop postmaster: exit code was 256
GNUmakefile:132: recipe for target 'check' failed
make: *** [check] Interrupt
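On point 1 (an optimistic timeout): until something like that exists on the buildfarm side, one possible stopgap on an animal like bichir might be to wrap the run in a wall-clock limit, so that a hang like the one above ends as a visible failure instead of sitting there for days. A minimal sketch, assuming GNU coreutils timeout(1) is installed; the helper name run_with_limit and the 60-second grace period are hypothetical choices, not part of the buildfarm client:

```shell
#!/bin/sh
# Hypothetical helper (not part of the buildfarm client): run a command under
# a wall-clock limit and report whether it finished, failed, or hit the limit.
# Assumes GNU coreutils timeout(1): when the limit expires it sends SIGTERM,
# sends SIGKILL 60s later if the command is still alive, and exits with 124
# (or 137 if SIGKILL was needed).
run_with_limit() {
    limit=$1
    shift
    timeout --kill-after=60 "$limit" "$@"
    status=$?
    case $status in
        0)       echo "OK: $*" ;;
        # Note: a command that dies from SIGKILL on its own also yields 137.
        124|137) echo "TIMEOUT after $limit: $*" ;;
        *)       echo "FAILED ($status): $*" ;;
    esac
    return "$status"
}
```

For example, running 'run_with_limit 6h make -j4 NO_LOCALE=1 check' in src/test/regress is intended to make a run stuck at 'CREATE DATABASE' end with a TIMEOUT line after six hours rather than hang indefinitely; the 6h value is only illustrative, given that a full HEAD run on bichir already takes about 5 hours.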