Hi,

Currently, there is a risk that pg_createsubscriber may fail to complete successfully when the max_slot_wal_keep_size value is set too low. This can occur if WAL is removed before the standby using the replication slot is able to complete replication, as the required WAL files are no longer available.
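To make the condition concrete, here is a minimal sketch of how a value of max_slot_wal_keep_size could be classified. It is not the code in the attached patch. It assumes the value arrives as the raw "setting" string from pg_settings (an integer number of megabytes, with -1 meaning no limit), and the function name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * "setting" is assumed to be the pg_settings.setting string for
 * max_slot_wal_keep_size: an integer number of megabytes, where -1
 * means replication slots may retain an unlimited amount of WAL.
 *
 * Returns true when a limit is in effect, i.e. when WAL needed by a
 * slot could be removed and a warning would be worth raising.
 * Returns false for -1 and for input that cannot be parsed.
 */
bool
slot_wal_keep_size_is_limited(const char *setting)
{
    char *end;
    long  mb;

    if (setting == NULL || *setting == '\0')
        return false;

    errno = 0;
    mb = strtol(setting, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;           /* not a plain integer: do not guess */

    return mb != -1;
}
```

With this sketch, "-1" is treated as safe, while "0" or "64" would be flagged, since any value other than -1 caps the WAL a slot can hold back.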
I was able to reproduce this issue using the following steps:

1. Set up a streaming replication environment.
2. Run pg_createsubscriber in a debugger.
3. Pause pg_createsubscriber at the setup_recovery stage.
4. Perform several operations on the primary node to generate a large volume of WAL, causing older WAL segments to be removed due to the low max_slot_wal_keep_size setting.
5. Once the necessary WAL segments are deleted, continue the execution of pg_createsubscriber.

At this point, pg_createsubscriber fails with the following error:

2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed
2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become available at 0/3000110
2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed

This issue was previously reported in [1], with a suggestion to raise a warning in [2]. I’ve implemented a patch that logs a warning in dry-run mode. This will give users the opportunity to adjust the max_slot_wal_keep_size value before running the command.

Thoughts?

[1] - https://www.postgresql.org/message-id/TY3PR01MB9889FEDBBF74A9F79CA7CC87F57A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/be92c57b-82e1-4920-ac31-a8a04206db7b%40app.fastmail.com

Thanks and regards,
Shubham Khanna.
v1-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patch
Description: Binary data