On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha.moond...@gmail.com> wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

This Assert can fail:
+                                       /*
+                                        * Check if the slot needs to be invalidated due to
+                                        * replication_slot_inactive_timeout GUC.
+                                        */
+                                       if (now &&
+                                           TimestampDifferenceExceeds(s->inactive_since, now,
+                                                                      replication_slot_inactive_timeout_sec * 1000))
+                                       {
+                                               invalidation_cause = cause;
+                                               inactive_since = s->inactive_since;
+
+                                               /*
+                                                * Invalidation due to inactive timeout implies that
+                                                * no one is using the slot.
+                                                */
+                                               Assert(s->active_pid == 0);

It fails in the following scenario:
-- Set replication_slot_inactive_timeout to 10 seconds
-- Create a slot
postgres=# select pg_create_logical_replication_slot ('test', 'pgoutput', true, true);
 pg_create_logical_replication_slot
------------------------------------
 (test,0/1748068)
(1 row)

-- Wait for 10 seconds and execute checkpoint
postgres=# checkpoint;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly

The assert fails:
#5  0x00005b074f0c922f in ExceptionalCondition (conditionName=0x5b074f2f0b4c "s->active_pid == 0", fileName=0x5b074f2f0010 "slot.c", lineNumber=1762) at assert.c:66
#6  0x00005b074ee26ead in InvalidatePossiblyObsoleteSlot (cause=RS_INVAL_INACTIVE_TIMEOUT, s=0x740925361780, oldestLSN=0, dboid=0, snapshotConflictHorizon=0, invalidated=0x7fffaee87e63) at slot.c:1762
#7  0x00005b074ee273b2 in InvalidateObsoleteReplicationSlots (cause=RS_INVAL_INACTIVE_TIMEOUT, oldestSegno=0, dboid=0, snapshotConflictHorizon=0) at slot.c:1952
#8  0x00005b074ee27678 in CheckPointReplicationSlots (is_shutdown=false) at slot.c:2061
#9  0x00005b074e9dfda7 in CheckPointGuts (checkPointRedo=24412528, flags=108) at xlog.c:7513
#10 0x00005b074e9df4ad in CreateCheckPoint (flags=108) at xlog.c:7179
#11 0x00005b074edc6bfc in CheckpointerMain (startup_data=0x0, startup_data_len=0) at checkpointer.c:463
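
To make the failing condition concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: ToySlot and timeout_invalidation_applies are hypothetical names used only for illustration, and timestamps are plain milliseconds. It models just the two things visible in the hunk above, the timeout comparison and the active_pid check. It assumes, as the backtrace suggests, that a slot can have an old inactive_since while active_pid is still nonzero. Skipping such a slot instead of asserting is only one possible way to handle that case, not necessarily what the patch should do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few slot fields the check looks at. */
typedef struct ToySlot
{
    int64_t inactive_since_ms;  /* 0 means "not set" */
    int     active_pid;         /* 0 means no process holds the slot */
} ToySlot;

/*
 * Returns true only when the inactive timeout has elapsed AND no process
 * holds the slot. A slot that is past the timeout but still has a nonzero
 * active_pid is reported as "not applicable" rather than tripping an
 * assertion.
 */
bool
timeout_invalidation_applies(const ToySlot *s, int64_t now_ms,
                             int64_t timeout_ms)
{
    /* Mirrors the "if (now && ...)" guard: no current time, no check. */
    if (now_ms == 0 || s->inactive_since_ms == 0)
        return false;

    /* Timeout not yet reached. */
    if (now_ms - s->inactive_since_ms < timeout_ms)
        return false;

    /* Past the timeout: only applicable if nobody is using the slot. */
    return s->active_pid == 0;
}
```

With a 10 second timeout, a slot whose inactive_since is 19 seconds old and whose active_pid is nonzero makes this function return false, whereas the patched code reaches the Assert in that state.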

Regards,
Vignesh

