On Mon, Dec 30, 2024 at 4:03 PM vignesh C <vignes...@gmail.com> wrote: > > On Mon, 30 Dec 2024 at 12:04, Shubham Khanna > <khannashubham1...@gmail.com> wrote: > > > > On Mon, Dec 30, 2024 at 10:10 AM vignesh C <vignes...@gmail.com> wrote: > > > > > > On Mon, 30 Dec 2024 at 09:34, Shubham Khanna > > > <khannashubham1...@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > Currently, there is a risk that pg_createsubscriber may fail to > > > > complete successfully when the max_slot_wal_keep_size value is set too > > > > low. This can occur if the WAL is removed before the standby using the > > > > replication slot is able to complete replication, as the required WAL > > > > files are no longer available. > > > > > > > > I was able to reproduce this issue using the following steps: > > > > Set up a streaming replication environment. > > > > Run pg_createsubscriber in a debugger. > > > > Pause pg_createsubscriber at the setup_recovery stage. > > > > Perform several operations on the primary node to generate a large > > > > volume of WAL, causing older WAL segments to be removed due to the low > > > > max_slot_wal_keep_size setting. > > > > Once the necessary WAL segments are deleted, continue the execution of > > > > pg_createsubscriber. > > > > At this point, pg_createsubscriber fails with the following error: > > > > 2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data > > > > from WAL stream: ERROR: requested WAL segment > > > > 000000010000000000000003 has already been removed > > > > 2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become > > > > available at 0/3000110 > > > > 2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from > > > > primary at 0/3000000 on timeline 1 > > > > 2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data > > > > from WAL stream: ERROR: requested WAL segment > > > > 000000010000000000000003 has already been removed > > > > > > > > This issue was previously reported in [1], with a suggestion to raise > > > > a warning in [2]. I’ve implemented a patch that logs a warning in > > > > dry-run mode. This will give users the opportunity to adjust the > > > > max_slot_wal_keep_size value before running the command. > > > > > > > > Thoughts? > > > > > > +1 for throwing a warning in dry-run mode > > > > > > Few comments: > > > 1) We can have this check only in dry-run mode, it is not required in > > > non dry-run mode as there is nothing much user can do once the tool is > > > running, we can change this: > > > + if (max_slot_wal_keep_size != -1) > > > + { > > > + pg_log_warning("publisher requires > > > 'max_slot_wal_keep_size = -1', but only %d remain", > > > + max_slot_wal_keep_size); > > > + pg_log_warning_detail("Change the > > > 'max_slot_wal_keep_size' configuration on the publisher to -1."); > > > + } > > > > > > to: > > > + if (dry_run && max_slot_wal_keep_size != -1) > > > + { > > > + pg_log_warning("publisher requires > > > 'max_slot_wal_keep_size = -1', but only %d remain", > > > + max_slot_wal_keep_size); > > > + pg_log_warning_detail("Change the > > > 'max_slot_wal_keep_size' configuration on the publisher to -1."); > > > + } > > > > > > 2) This error message is not quite right, can we change it to > > > "publisher requires max_slot_wal_keep_size to be -1, but is set to %d" > > > + if (max_slot_wal_keep_size != -1) > > > + { > > > + pg_log_warning("publisher requires > > > 'max_slot_wal_keep_size = -1', but only %d remain", > > > + max_slot_wal_keep_size); > > > + pg_log_warning_detail("Change the > > > 'max_slot_wal_keep_size' configuration on the publisher to -1."); > > > + } > > > > > > 3) Also the configuration could be specified in format specifier like > > > it is specified in the earlier case > > > > > > > I have fixed the given comments. The attached patch contains the > > suggested changes. > > Few comments: > 1) Since this configuration will be updated after reload, you can > reload instead of restarting: > +# Configure 'max_slot_wal_keep_size = 1' on the publisher and verify the > +# warning message > +$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1'); > +$node_p->restart; > > 2) You can reset max_slot_wal_keep_size configuration after this test: > + 0, > + [qr/./], > + [ > + qr/pg_createsubscriber: warning: publisher requires > max_slot_wal_keep_size to be -1/, > + qr/Change the configuration parameter > "max_slot_wal_keep_size" on the publisher to -1./, > + ], > + 'Validate warning for misconfigured max_slot_wal_keep_size on > the publisher' > +); > > 3) We could update the comment to also mention the possibility of > required wal files being deleted with non-default > max_slot_wal_keep_size. > + /* Validate max_slot_wal_keep_size */ > + if (dry_run && max_slot_wal_keep_size != -1) > + { > + pg_log_warning("publisher requires > max_slot_wal_keep_size to be -1, but is set to %d", > + max_slot_wal_keep_size); > + pg_log_warning_detail("Change the configuration > parameter \"%s\" on the publisher to %d.", > + > "max_slot_wal_keep_size", -1); > + } > > 4) You can update the commit message saying "the possibility of > required wal files being deleted with non-default > max_slot_wal_keep_size" and also mention that this warning will be > raised in dry_run mode. >
I have fixed the given comments. The attached patch contains the suggested changes. Thanks and regards, Shubham Khanna.
v3-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patch
Description: Binary data