In a moment of idleness I tried to run the TAP tests on pademelon, which is a mighty old and slow machine. Behold, src/test/recovery/t/010_logical_decoding_timelines.pl fell over, with the relevant section of its log contents being:

# testing logical timeline following with a filesystem-level copy
# Taking filesystem backup b1 from node "master"
# pg_start_backup: 0/2000028
could not opendir(/home/postgres/pgsql/src/test/recovery/tmp_check/t_010_logical_decoding_timelines_master_data/pgdata/pg_wal/archive_status/000000010000000000000001.ready): No such file or directory at ../../../src/test/perl//RecursiveCopy.pm line 115.
### Stopping node "master" using mode immediate

The postmaster log has this relevant entry:

2017-09-08 22:03:22.917 EDT [19160] DEBUG:  archived write-ahead log file "000000010000000000000001"

It looks to me like the archiver removed 000000010000000000000001.ready between where RecursiveCopy.pm checks that $srcpath is a regular file or directory (line 95) and where it rechecks whether it's a regular file (line 105). Then the "-f" test on line 105 fails, allowing it to fall through to the its-a-directory path, and unsurprisingly the opendir at line 115 fails with the above symptom.

In short, RecursiveCopy.pm is woefully unprepared for the idea that the source directory tree might be changing underneath it.

I'm not real sure if the appropriate answer to this is "we need to fix RecursiveCopy" or "we need to fix the callers to not call RecursiveCopy until the source directory is stable". Thoughts?

(I do kinda wonder why we rolled our own RecursiveCopy; surely there's a better implementation in CPAN?)

			regards, tom lane
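
For illustration only, here is a minimal sketch of one way a recursive copy could tolerate source entries vanishing underneath it, which is the "fix RecursiveCopy" option raised above. It is written in Python, not the Perl of RecursiveCopy.pm, and is not the actual or proposed patch. The function name copy_path and its return convention are hypothetical. The idea is to decide the file type from a single lstat() call rather than from two separate tests that can disagree, and to treat "no such file" on the source side as "skip it" rather than as a fatal error.

```python
import os
import shutil
import stat


def copy_path(src, dst):
    """Copy src to dst recursively (hypothetical helper).

    Returns False if src disappeared before it could be copied,
    True otherwise. Only regular files and directories are handled.
    """
    try:
        # A single lstat() decides the type, so a file that is removed
        # later cannot be mistaken for a directory by a second check.
        st = os.lstat(src)
    except FileNotFoundError:
        return False  # already gone: nothing to copy

    if stat.S_ISREG(st.st_mode):
        try:
            fsrc = open(src, "rb")
        except FileNotFoundError:
            return False  # removed between lstat() and open()
        # Errors on the destination side are still raised normally.
        with fsrc, open(dst, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst)
        return True

    if stat.S_ISDIR(st.st_mode):
        try:
            names = os.listdir(src)
        except FileNotFoundError:
            return False  # directory removed between lstat() and listdir()
        os.mkdir(dst)
        for name in names:
            # A child that vanishes mid-copy is skipped, not fatal.
            copy_path(os.path.join(src, name), os.path.join(dst, name))
        return True

    raise ValueError(f"unsupported file type: {src}")
```

Whether silently skipping vanished files is acceptable depends on the caller. For something like archive_status/*.ready it seems harmless, but that is an assumption of this sketch and not something established above.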