Here are some comments for the patch v50-0002. ====== GENERAL
(I made a short study of all the ereports in this patch -- here are some findings) ~~~ 0.1 Don't need the parentheses. Checking all the ereports I see that half of them have the redundant parentheses and half of them do not; You might as well make them all use the new style where the extra parentheses are not needed. e.g. + ereport(LOG, + (errmsg("skipping slot synchronization"), + errdetail("enable_syncslot is disabled."))); e.g. + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot drop replication slot \"%s\"", name), + errdetail("This slot is being synced from the primary server."))); and many more like this. Search for all the ereports. ~~~ 0.2 + ereport(LOG, + (errmsg("dropped replication slot \"%s\" of dbid %d as it " + "was not sync-ready", NameStr(s->data.name), + s->data.database))); I felt maybe that could be: errmsg("dropped replication slot \"%s\" of dbid %d", ... errdetail("It was not sync-ready.") (now this shares the same errmsg with another ereport) ~~~ 0.3. + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("skipping sync of slot \"%s\" as it is a user created" + " slot", remote_slot->name), + errdetail("This slot has failover enabled on the primary and" + " thus is sync candidate but user created slot with" + " the same name already exists on the standby."))); This seemed too wordy. Can't it be shortened (maybe like below) without losing any of the vital information? errmsg("skipping sync of slot \"%s\"", ...) errdetail("A user-created slot with the same name already exists on the standby.") ~~~ 0.4 + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("exiting from slot synchronization due to bad configuration"), + /* translator: second %s is a GUC variable name */ + errdetail("The primary slot \"%s\" specified by %s is not valid.", + PrimarySlotName, "primary_slot_name"))); /The primary slot/The primary server slot/ ~~~ 0.5 + ereport(ERROR, + (errmsg("could not fetch primary_slot_name \"%s\" info from the " + "primary: %s", PrimarySlotName, res->err))); /primary:/primary server:/ ~~~ 0.6 The continuations for long lines are inconsistent. Sometimes there are trailing spaces and sometimes there are leading spaces. And sometimes there are both at the same time which would cause double-spacing in the message! Please make them all the same. I think using leading spaces is easier but YMMV. e.g. + elog(ERROR, + "not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(slot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); ====== src/backend/replication/logical/slotsync.c 1. check_primary_info + /* No need to check further, return that we are cascading standby */ + if (remote_in_recovery) + { + *am_cascading_standby = true; + ExecClearTuple(tupslot); + walrcv_clear_result(res); + CommitTransactionCommand(); + return; + } + + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); + Assert(!isnull); + + if (!valid) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("exiting from slot synchronization due to bad configuration"), + /* translator: second %s is a GUC variable name */ + errdetail("The primary slot \"%s\" specified by %s is not valid.", + PrimarySlotName, "primary_slot_name"))); + ExecClearTuple(tupslot); + walrcv_clear_result(res); + CommitTransactionCommand(); +} Now that there is a common cleanup/return code this function be reduced further like below: SUGGESTION if (remote_in_recovery) { /* No need to check further, return that we are cascading standby */ *am_cascading_standby = true; } else { /* We are a normal standby. */ valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); Assert(!isnull); if (!valid) ... } ExecClearTuple(tupslot); walrcv_clear_result(res); CommitTransactionCommand(); } ~~~ 2. ReplSlotSyncWorkerMain + /* + * One can promote the standby and we can no longer be a cascading + * standby. So recheck here. + */ + if (am_cascading_standby) + check_primary_info(wrconn, &am_cascading_standby); Minor rewording of that new comment. SUGGESTION If the standby was promoted then what was previously a cascading standby might no longer be one, so recheck each time. ====== src/test/recovery/t/050_verify_slot_order.pl 3. +################################################## +# Test that a synchronized slot can not be decoded, altered and dropped by the user +################################################## /and dropped/or dropped/ ~~~ 4. + +($result, $stdout, $stderr) = $standby1->psql( + 'postgres', + qq[ALTER_REPLICATION_SLOT lsub1_slot (failover);], + replication => 'database'); +ok($stderr =~ /ERROR: cannot alter replication slot "lsub1_slot"/, + "synced slot on standby cannot be altered"); + Add a comment for this test part SUGGESTION Attempting to alter a synced slot should result in an error ~~~ 5. IMO it would be better if the tests were done in the same order mentioned in the comment. So either change the tests or change the comment. ====== Kind Regards, Peter Smith. Fujitsu Australia