On Fri, May 07, 2021 at 01:18:19PM -0400, Tom Lane wrote:
> Realizing that 9989d37d prevents the assertion failure, I went
> to see if thorntail had shown EIO failures without assertions.
> Looking back 180 days, I found these:
>
>   sysname  |    branch     |      snapshot       |       stage        | l
> -----------+---------------+---------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------
>  thorntail | HEAD          | 2021-03-19 21:28:15 | recoveryCheck      | 2021-03-20 00:48:48.117 MSK [4089174:11] 008_fsm_truncation.pl PANIC: could not fdatasync file "000000010000000000000002": Input/output error
>  thorntail | HEAD          | 2021-04-06 16:08:10 | recoveryCheck      | 2021-04-06 19:30:54.103 MSK [3355008:11] 008_fsm_truncation.pl PANIC: could not fdatasync file "000000010000000000000002": Input/output error
>  thorntail | REL9_6_STABLE | 2021-04-12 02:38:04 | pg_basebackupCheck | pg_basebackup: could not fsync file "000000010000000000000013": Input/output error
>
> So indeed the kernel-or-hardware problem is affecting other branches.

Having a flaky buildfarm member is bad news. I'll LD_PRELOAD the attached library to prevent fsync and fdatasync calls from reaching the kernel. Hopefully, that will make the hardware-or-kernel trouble unreachable. (Changing 008_fsm_truncation.pl wouldn't avoid this, because fsync=off doesn't affect syncs outside the backend.)

/* gcc -fPIC -shared never_sync.c -o never_sync.so */

int
fsync(int fd)
{
	return 0;
}

int
fdatasync(int fd)
{
	return fsync(fd);
}
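
For anyone wanting to confirm that the preload actually took effect, here is a rough sketch of a check program. It is not part of the attachment; the file name check_sync.c, the CHECK_SYNC_MAIN macro, and both helper function names are made up for illustration. It relies on the fact that a real fsync(-1) fails with EBADF, while the functions in never_sync.so return 0 for any descriptor.

```c
/*
 * check_sync.c: hypothetical helper, not part of the attached shim.
 * gcc -DCHECK_SYNC_MAIN check_sync.c -o check_sync
 */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Returns 1 if fsync() looks like a no-op replacement, else 0.
 * The real fsync() rejects fd -1 with EBADF; the shim returns 0.
 */
int
fsync_is_interposed(void)
{
	return fsync(-1) == 0;
}

/* Same probe for fdatasync(). */
int
fdatasync_is_interposed(void)
{
	return fdatasync(-1) == 0;
}

#ifdef CHECK_SYNC_MAIN
int
main(void)
{
	int		f = fsync_is_interposed();
	int		d = fdatasync_is_interposed();

	/* never_sync.so replaces both calls, so the two probes should agree. */
	assert(f == d);

	printf("fsync: %s\n", f ? "no-op shim" : "real");
	printf("fdatasync: %s\n", d ? "no-op shim" : "real");

	/* Exit 0 only when the shim is active, so scripts can test for it. */
	return f ? 0 : 1;
}
#endif
```

Running it as "LD_PRELOAD=./never_sync.so ./check_sync" should report the shim for both calls; running it without LD_PRELOAD should report the real functions and exit nonzero. This only shows that the dynamic linker picked up the library for that one process. It says nothing about statically linked binaries or about sync paths other than these two functions.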