Re: To Thomas Munro
> 603 iterations later it hit again, but didn't log anything. (I believe
> I did run "make" in the right directory.)

This time it took 3086 iterations to hit the problem.

Running c27f8621eedf7 + Debian patches + v8 +
pgstat-report-conflicts-immediately.patch + the XXX logging.

> [22:20:24.714](3.145s) # issuing query via background psql:
> # BEGIN;
> # DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM
> test_recovery_conflict_table1;
> # FETCH FORWARD FROM test_recovery_conflict_cursor;
> [22:20:24.745](0.031s) ok 1 - buffer pin conflict: cursor with conflicting
> pin established
> Waiting for replication conn standby's replay_lsn to pass 0/3430000 on primary
> done
> timed out waiting for match: (?^:User was holding shared buffer pin for too
> long) at t/031_recovery_conflict.pl line 318.

No XXX lines this time either, but I've seen them in logfiles that went
through successfully.

> Perhaps this can simply be attributed to the machine being too busy.

With the patches, the problem of dying so often that builds targeting
several distributions in parallel will usually fail is gone.

Christoph
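
For anyone wanting to reproduce the iteration counts above, here is a minimal sketch of a loop that reruns a test until it first fails and reports the iteration number. It is not the script actually used for these runs; the helper names `iterations_until_failure` and `run_recovery_conflict_test` are hypothetical, and the `make` invocation is an assumption about how the test is started from the top of a configured build tree.

```python
import subprocess
from typing import Callable, Optional


def iterations_until_failure(run_once: Callable[[], int],
                             max_iterations: int) -> Optional[int]:
    """Call run_once() repeatedly and return the 1-based iteration at which
    it first returns a non-zero exit status, or None if it never fails
    within max_iterations."""
    for iteration in range(1, max_iterations + 1):
        if run_once() != 0:
            return iteration
    return None


def run_recovery_conflict_test() -> int:
    """Run the TAP test once and return its exit status.

    Assumption: invoked from the top of a configured PostgreSQL build tree.
    Adjust the command to match the local setup.
    """
    cmd = [
        "make", "-C", "src/test/recovery", "check",
        "PROVE_TESTS=t/031_recovery_conflict.pl",
    ]
    return subprocess.run(cmd).returncode
```

Calling `iterations_until_failure(run_recovery_conflict_test, 5000)` would then return the iteration of the first failure, leaving the logs of that run in `tmp_check/log` for inspection.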
2023-08-10 01:00:48.765 UTC [892297] LOG: starting PostgreSQL 17devel (Debian 17~~devel-1) on s390x-ibm-linux-gnu, compiled by gcc (Debian 13.2.0-1) 13.2.0, 64-bit
2023-08-10 01:00:48.765 UTC [892297] LOG: listening on Unix socket "/tmp/5M7Y6uSm5n/.s.PGSQL.63013"
2023-08-10 01:00:48.769 UTC [892300] LOG: database system was shut down at 2023-08-10 01:00:48 UTC
2023-08-10 01:00:48.772 UTC [892297] LOG: database system is ready to accept connections
2023-08-10 01:00:48.867 UTC [892305] 031_recovery_conflict.pl LOG: statement: CREATE TABLESPACE test_recovery_conflict_tblspc LOCATION ''
2023-08-10 01:00:48.889 UTC [892307] 031_recovery_conflict.pl LOG: received replication command: SHOW data_directory_mode
2023-08-10 01:00:48.889 UTC [892307] 031_recovery_conflict.pl STATEMENT: SHOW data_directory_mode
2023-08-10 01:00:48.896 UTC [892307] 031_recovery_conflict.pl LOG: received replication command: SHOW wal_segment_size
2023-08-10 01:00:48.896 UTC [892307] 031_recovery_conflict.pl STATEMENT: SHOW wal_segment_size
2023-08-10 01:00:48.901 UTC [892307] 031_recovery_conflict.pl LOG: received replication command: IDENTIFY_SYSTEM
2023-08-10 01:00:48.901 UTC [892307] 031_recovery_conflict.pl STATEMENT: IDENTIFY_SYSTEM
2023-08-10 01:00:48.907 UTC [892307] 031_recovery_conflict.pl LOG: received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, CHECKPOINT 'fast', WAIT 0, MANIFEST 'yes', TARGET 'client')
2023-08-10 01:00:48.907 UTC [892307] 031_recovery_conflict.pl STATEMENT: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, CHECKPOINT 'fast', WAIT 0, MANIFEST 'yes', TARGET 'client')
2023-08-10 01:00:48.913 UTC [892298] LOG: checkpoint starting: immediate force wait
2023-08-10 01:00:48.921 UTC [892298] LOG: checkpoint complete: wrote 7 buffers (5.5%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.009 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11350 kB, estimate=11350 kB; lsn=0/2000060, redo lsn=0/2000028
2023-08-10 01:00:48.939 UTC [892308] 031_recovery_conflict.pl LOG: received replication command: SHOW data_directory_mode
2023-08-10 01:00:48.939 UTC [892308] 031_recovery_conflict.pl STATEMENT: SHOW data_directory_mode
2023-08-10 01:00:48.944 UTC [892308] 031_recovery_conflict.pl LOG: received replication command: CREATE_REPLICATION_SLOT "pg_basebackup_892308" TEMPORARY PHYSICAL ( RESERVE_WAL)
2023-08-10 01:00:48.944 UTC [892308] 031_recovery_conflict.pl STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_892308" TEMPORARY PHYSICAL ( RESERVE_WAL)
2023-08-10 01:00:48.950 UTC [892308] 031_recovery_conflict.pl LOG: received replication command: IDENTIFY_SYSTEM
2023-08-10 01:00:48.950 UTC [892308] 031_recovery_conflict.pl STATEMENT: IDENTIFY_SYSTEM
2023-08-10 01:00:48.952 UTC [892308] 031_recovery_conflict.pl LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_892308" 0/2000000 TIMELINE 1
2023-08-10 01:00:48.952 UTC [892308] 031_recovery_conflict.pl STATEMENT: START_REPLICATION SLOT "pg_basebackup_892308" 0/2000000 TIMELINE 1
2023-08-10 01:00:49.188 UTC [892307] 031_recovery_conflict.pl LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp892307.0", size 137324
2023-08-10 01:00:49.188 UTC [892307] 031_recovery_conflict.pl STATEMENT: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, CHECKPOINT 'fast', WAIT 0, MANIFEST 'yes', TARGET 'client')
2023-08-10 01:00:49.487 UTC [892319] standby LOG: received replication command: IDENTIFY_SYSTEM
2023-08-10 01:00:49.487 UTC [892319] standby STATEMENT: IDENTIFY_SYSTEM
2023-08-10 01:00:49.493 UTC [892319] standby LOG: received replication command: START_REPLICATION 0/3000000 TIMELINE 1
2023-08-10 01:00:49.493 UTC [892319] standby STATEMENT: START_REPLICATION 0/3000000 TIMELINE 1
2023-08-10 01:00:49.546 UTC [892321] 031_recovery_conflict.pl LOG: statement: CREATE DATABASE test_db
2023-08-10 01:00:49.598 UTC [892323] 031_recovery_conflict.pl LOG: statement: CREATE TABLE test_recovery_conflict_table1(a int, b int);
2023-08-10 01:00:49.603 UTC [892323] 031_recovery_conflict.pl LOG: statement: INSERT INTO test_recovery_conflict_table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
2023-08-10 01:00:49.609 UTC [892323] 031_recovery_conflict.pl LOG: statement: CREATE TABLE test_recovery_conflict_table2(a int, b int);
2023-08-10 01:00:49.627 UTC [892325] 031_recovery_conflict.pl LOG: statement: SELECT pg_current_wal_flush_lsn()
2023-08-10 01:00:49.648 UTC [892327] 031_recovery_conflict.pl LOG: statement: SELECT '0/342B100' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver')
2023-08-10 01:00:49.696 UTC [892331] 031_recovery_conflict.pl LOG: statement: BEGIN;
2023-08-10 01:00:49.702 UTC [892331] 031_recovery_conflict.pl LOG: statement: INSERT INTO test_recovery_conflict_table1 VALUES (1,0);
2023-08-10 01:00:49.707 UTC [892331] 031_recovery_conflict.pl LOG: statement: ROLLBACK;
2023-08-10 01:00:49.712 UTC [892331] 031_recovery_conflict.pl LOG: statement: BEGIN;
2023-08-10 01:00:49.712 UTC [892331] 031_recovery_conflict.pl LOG: statement: LOCK test_recovery_conflict_table1;
2023-08-10 01:00:49.718 UTC [892331] 031_recovery_conflict.pl LOG: statement: COMMIT;
2023-08-10 01:00:49.742 UTC [892333] 031_recovery_conflict.pl LOG: statement: SELECT pg_current_wal_flush_lsn()
2023-08-10 01:00:49.766 UTC [892335] 031_recovery_conflict.pl LOG: statement: SELECT '0/342B1C0' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver')
2023-08-10 01:00:49.806 UTC [892337] 031_recovery_conflict.pl LOG: statement: VACUUM test_recovery_conflict_table1;
2023-08-10 01:00:49.843 UTC [892339] 031_recovery_conflict.pl LOG: statement: SELECT pg_current_wal_flush_lsn()
2023-08-10 01:00:49.857 UTC [892341] 031_recovery_conflict.pl LOG: statement: SELECT '0/3430000' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver')
2023-08-10 01:00:50.359 UTC [892343] 031_recovery_conflict.pl LOG: statement: SELECT '0/3430000' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('standby', 'walreceiver')
2023-08-10 01:03:54.140 UTC [892297] LOG: received immediate shutdown request
2023-08-10 01:03:54.148 UTC [892297] LOG: database system is shut down
2023-08-10 01:00:49.463 UTC [892314] LOG: starting PostgreSQL 17devel (Debian 17~~devel-1) on s390x-ibm-linux-gnu, compiled by gcc (Debian 13.2.0-1) 13.2.0, 64-bit
2023-08-10 01:00:49.463 UTC [892314] LOG: listening on Unix socket "/tmp/5M7Y6uSm5n/.s.PGSQL.63014"
2023-08-10 01:00:49.465 UTC [892317] LOG: database system was interrupted; last known up at 2023-08-10 01:00:48 UTC
2023-08-10 01:00:49.465 UTC [892317] LOG: entering standby mode
2023-08-10 01:00:49.466 UTC [892317] LOG: redo starts at 0/2000028
2023-08-10 01:00:49.466 UTC [892317] WARNING: unexpected directory entry "16384" found in pg_tblspc/
2023-08-10 01:00:49.466 UTC [892317] DETAIL: All directory entries in pg_tblspc/ should be symbolic links.
2023-08-10 01:00:49.466 UTC [892317] HINT: Remove those directories, or set allow_in_place_tablespaces to ON transiently to let recovery complete.
2023-08-10 01:00:49.466 UTC [892317] LOG: consistent recovery state reached at 0/2000100
2023-08-10 01:00:49.470 UTC [892314] LOG: database system is ready to accept read-only connections
2023-08-10 01:00:49.494 UTC [892318] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2023-08-10 01:00:49.773 UTC [892329] 031_recovery_conflict.pl LOG: statement: BEGIN;
2023-08-10 01:00:49.774 UTC [892329] 031_recovery_conflict.pl LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
2023-08-10 01:00:49.774 UTC [892329] 031_recovery_conflict.pl LOG: statement: FETCH FORWARD FROM test_recovery_conflict_cursor;
2023-08-10 01:00:49.780 UTC [892329] 031_recovery_conflict.pl LOG: statement: ;
2023-08-10 01:03:54.144 UTC [892318] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
2023-08-10 01:03:54.146 UTC [892317] LOG: invalid record length at 0/347E6E0: expected at least 24, got 0
2023-08-10 01:03:54.153 UTC [892427] FATAL: could not connect to the primary server: connection to server on socket "/tmp/5M7Y6uSm5n/.s.PGSQL.63013" failed: No such file or directory
	Is the server running locally and accepting connections on that socket?
2023-08-10 01:03:54.156 UTC [892317] LOG: waiting for WAL to become available at 0/347E6F8
2023-08-10 01:03:54.248 UTC [892314] LOG: received immediate shutdown request
2023-08-10 01:03:54.252 UTC [892314] LOG: database system is shut down
# Checking port 63013
# Found port 63013
Name: primary
Data directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata
Backup directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/backup
Archive directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/archives
Connection string: port=63013 host=/tmp/5M7Y6uSm5n
Log file: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_primary.log
# Running: initdb -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -A trust -N
The files belonging to this database system will be owned by user "myon".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  provider: libc
  LC_COLLATE: en_US.utf8
  LC_CTYPE: en_US.utf8
  LC_MESSAGES: C
  LC_MONETARY: en_US.utf8
  LC_NUMERIC: en_US.utf8
  LC_TIME: en_US.utf8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok

Sync to disk skipped.
The data directory might become corrupt if the operating system crashes.

Success. You can now start the database server using:

    pg_ctl -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -l logfile start

# Running: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/../../../src/test/regress/pg_regress --config-auth /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata
### Starting node "primary"
# Running: pg_ctl -w -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -l /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_primary.log -o --cluster-name=primary start
waiting for server to start.... done
server started
# Postmaster PID for node "primary" is 892297
# Taking pg_basebackup my_backup from node "primary"
# Running: pg_basebackup -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/backup/my_backup -h /tmp/5M7Y6uSm5n -p 63013 --checkpoint fast --no-sync
# Backup finished
# Checking port 63014
# Found port 63014
Name: standby
Data directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/pgdata
Backup directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/backup
Archive directory: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/archives
Connection string: port=63014 host=/tmp/5M7Y6uSm5n
Log file: /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
# Initializing node "standby" from backup "my_backup" of node "primary"
### Enabling streaming replication for node "standby"
### Starting node "standby"
# Running: pg_ctl -w -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/pgdata -l /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log -o --cluster-name=standby start
waiting for server to start.... done
server started
# Postmaster PID for node "standby" is 892314
Waiting for replication conn standby's replay_lsn to pass 0/342B100 on primary
done
Waiting for replication conn standby's replay_lsn to pass 0/342B1C0 on primary
done
[01:00:49.772](1.875s) # issuing query via background psql:
# BEGIN;
# DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
# FETCH FORWARD FROM test_recovery_conflict_cursor;
[01:00:49.794](0.022s) ok 1 - buffer pin conflict: cursor with conflicting pin established
Waiting for replication conn standby's replay_lsn to pass 0/3430000 on primary
done
timed out waiting for match: (?^:User was holding shared buffer pin for too long) at t/031_recovery_conflict.pl line 318.
# Postmaster PID for node "primary" is 892297
### Stopping node "primary" using mode immediate
# Running: pg_ctl -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_primary_data/pgdata -m immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "primary"
# Postmaster PID for node "standby" is 892314
### Stopping node "standby" using mode immediate
# Running: pg_ctl -D /home/myon/postgresql/pg/postgresql/build/src/test/recovery/tmp_check/t_031_recovery_conflict_standby_data/pgdata -m immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "standby"
[01:03:54.353](184.559s) # Tests were run but no plan was declared and done_testing() was not seen.
[01:03:54.354](0.000s) # Looks like your test exited with 255 just after 1.