On Thu, Nov 27, 2025 at 3:20 PM shveta malik <[email protected]> wrote:
>
> On Thu, Nov 27, 2025 at 2:32 AM Masahiko Sawada <[email protected]> wrote:
> >
> >
> > I've squashed all fixup patches and attached the updated patch.
> >
>
> Thanks for test results and the patch. Please find a few comments:
>
> 1)
> If primary has effective_wal_level as logical and standby has zero
> slots; then during promotion of standby, if I try to create a slot, it
> is restricted as intended, but it gives misleading message:
> postgres=# SELECT pg_create_logical_replication_slot('slot',
> 'pgoutput', false, false, false);
> ERROR:  logical decoding on standby requires "effective_wal_level" >=
> "logical" on the primary
> HINT:  Set "wal_level" >= "logical" or create at least one logical
> slot when "wal_level" = "replica".
>
> One possibility is we extend the message as follows:
> "Logical decoding on standby requires either effective_wal_level >=
> logical on the primary or that recovery has finished." and remove
> errhint?
>
> 2)
> + * completes. Here's how we handle concurrent toggling logical decoding:
>
> toggling --> toggling of
>
> 3)
> + /*
> + * This is the authoritative value used by all processes to determine
> + * whether to write additional information required by logical decoding to
> + * WAL. Since this information could be checked frequently, each process
> + * caches this value in XLogLogicalInfo for better performance.
> + */
> + bool xlog_logical_info;
>
> Shall we mention about cache XLogLogicalInfoXactCache as well here?
>
> 4)
> EnsureLogicalDecodingEnabled():
> + if (wal_level == WAL_LEVEL_MINIMAL)
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot enable logical decoding when \"wal_level\" = \"minimal\""));
>
> Do we need this error here? CheckSlotRequirements checks this already.
> Won't Assert suffice here?
>
> 5)
> EnsureLogicalDecodingEnabled():
> + if (RecoveryInProgress())
> + {
> + Assert(IsLogicalDecodingEnabled());
> + return;
> + }
>
> I understand that it will be no-op on standby. But it took me some
> time to understand why we expect logical-decoding enabled here.  I see
> that CheckLogicalDecodingRequirements() won't allow the flow to reach
> here if logical-decoding was not enabled in the first place. Shall we
> add some comments here?
>
> 6)
> DisableLogicalDecodingIfNecessary() has this:
> + if (wal_level != WAL_LEVEL_REPLICA)
> + return;
>
> Also it has this:
> + /* Write the WAL to disable logical decoding on standbys too */
> + if (XLogStandbyInfoActive())
> + write_logical_decoding_status_update_record(false);
>
> Do we actually need XLogStandbyInfoActive() here? Given that wal_level
> is replica, and we’ve already confirmed that no slot exists while
> pending_disable and xlog_logical_info are still true, isn’t that
> enough to write a WAL record for the disable operation?
>
> 7)
> + /*
> + * With 'minimal' WAL level, there are not logical replication slots
> + * during recovery. Logical decoding is always disabled, so there is no
> + * need to synchronize XLogLogicalInfo..
> + */
>
> not -> no
> 2 dots at the end.
> =====
>
> Reviewing more..

A few more comments:

8)
InvalidateObsoleteReplicationSlots():

+ if (InvalidatePossiblyObsoleteSlot(possible_causes, s, oldestLSN,
+    dboid, snapshotConflictHorizon,
+    &released_lock))
  {
......
  }
+
+ SpinLockAcquire(&s->mutex);
+ found_valid_logicalslot |= (s->data.invalidated == RS_INVAL_NONE);
+ SpinLockRelease(&s->mutex);

Shouldn't we move SpinLockAcquire and setting
'found_valid_logicalslot' to else-block? Only when
InvalidatePossiblyObsoleteSlot() has not invalidated slot, we would
like to check for RS_INVAL_NONE and set 'found_valid_logicalslot'.

9)
test:
+ "effective_wal_level increased to logical upon a logical slot creation");
+ "effective_wal_level remains logical even after a server restart");
+ "effective_wal_level stays 'logical' as one slot remains");
+ "effective_wal_level got decreased to 'replica' after invalidating
the last logical slot"

In the messages above, at some places we single quote the
effective_wal_level value while at other places we do not. We can make
all same.

10)
+# Promote standby3, increasing effective_wal_level to 'logical' as
its wal_level
+# is set to 'logical'.
+$standby3->promote;
+
+# Check if effective_wal_level is increased to 'logical' on the
cascaded standby.
+$standby3->wait_for_replay_catchup($cascade);
+test_wal_level($cascade, "replica|logical",
+ "effective_wal_level got increased to 'logical' on standby as the
new primary has wal_level='logical'"
+);

The message is slightly confusing due to usage of both 'standby' and
'new primary'. Can we make it:
effective_wal_level got increased to 'logical' as the new primary has
wal_level='logical'

11)
my ($result, $stdout, $stderr) = $standby4->psql('postgres',
qq[select pg_logical_slot_get_changes('standby4_slot', null, null)]);
like(
$stderr,
qr/ERROR:  logical decoding on standby requires "effective_wal_level"
>= "logical" on the primary/,
"cannot use logical decoding on standby as it is disabled on primary");

Since the slot is invalidated, shouldn't the better error message be
what we usually get:
ERROR:  can no longer access replication slot "slot"
DETAIL:  This replication slot has been invalidated due to "..".

IMO, the 'invalidated' message is better because even if we enable
logical-decoding on primary, it is of no use for this slot.

But I also see the difficulty in achieving this, as the
primary-related error message is by CheckLogicalDecodingRequirements()
which happens earlier than
ReplicationSlotAcquire().  So if there is no better way, we can leave it as is.

thanks
Shveta


Reply via email to