> On Mar 26, 2025, at 07:55, Phillip Diffley <phillip6...@gmail.com> wrote:
> Just to confirm, it sounds like the order messages are sent from the output 
> plugin is what matters here. When you update confirmed_flush_lsn to LSN "A", 
> any messages that were sent by the output plugin after the message with LSN 
> "A" will be replayable. Any messages sent by the output plugin before the 
> message with LSN "A" will most likely not be replayed, since their data is 
> freed for deletion. Is that correct?

The terminology is shifting around a bit here, so to be specific: When the 
primary (or publisher) receives a message from the secondary (or replica) that 
a particular LSN has been flushed, the primary at that point feels free to 
recycle any WAL segments that only contain WAL entries whose LSN is less than 
that flush point (whether or not it actually does depends on a lot of other 
factors).  The horizon that the primary actually needs to retain can be 
farther back than that: nothing requires the secondary to send a 
confirmed_flush_lsn that falls on a transaction boundary, so the flush LSN 
might land in the middle of a transaction.  The point before which the primary 
can recycle WAL is restart_lsn, which the primary determines based on the 
flush LSN.
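
To make the difference between the two horizons concrete, here is a toy model 
in Python.  It is only a sketch: Txn, retention_horizon and the sample LSNs 
are made up for illustration, and the way PostgreSQL really computes 
restart_lsn is more involved than this.  The only thing it is meant to show is 
that a flush LSN landing inside a transaction pulls the retention point back to 
where that transaction began.

```python
from dataclasses import dataclass
from typing import List


def parse_lsn(text: str) -> int:
    """Turn the textual 'hi/lo' hexadecimal LSN form into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass(frozen=True)
class Txn:
    # Hypothetical bookkeeping: where a transaction's WAL begins and
    # where its commit ends.
    begin_lsn: int
    commit_lsn: int


def retention_horizon(flush_lsn: int, txns: List[Txn]) -> int:
    """Toy stand-in for restart_lsn: the oldest WAL position still needed."""
    # A transaction straddles the flush point if it began before it and
    # its commit ends after it, so it is not fully confirmed yet.
    straddling = [
        t.begin_lsn for t in txns if t.begin_lsn < flush_lsn < t.commit_lsn
    ]
    # With nothing straddling, the flush point itself is the horizon.
    return min(straddling, default=flush_lsn)


txns = [
    Txn(parse_lsn("0/100"), parse_lsn("0/300")),  # still open at 0/200
    Txn(parse_lsn("0/150"), parse_lsn("0/180")),  # fully before 0/200
]
horizon = retention_horizon(parse_lsn("0/200"), txns)
```

Here the flush point 0/200 falls inside the first transaction, so the horizon 
in this model is that transaction's start, 0/100, not 0/200.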

When the secondary connects, it provides an LSN from which the primary should 
start sending WAL (if a binary, i.e. physical, replica) or decoded WAL via the 
plugin (if a logical replica).  For a logical replica, that can be 
confirmed_flush_lsn or any point after, but it can't be before.  (Even if the 
WAL exists, the primary will not go back to a start point provided in 
START_REPLICATION that is before confirmed_flush_lsn for a logical replication 
slot: per the documentation, it starts at whichever of the two is greater, so 
the client may want to check that the slot's confirmed_flush_lsn matches its 
expectations before connecting.)  Of course, you'll get an error if 
START_REPLICATION supplies an LSN that doesn't actually exist yet.
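
Since a start point before confirmed_flush_lsn is never usable, a client can 
check for that case itself before it connects.  The sketch below assumes the 
client has already read the slot's confirmed_flush_lsn (it is a column in 
pg_replication_slots) and has its own stored position; choose_start_lsn is a 
hypothetical helper, not part of any driver.

```python
def parse_lsn(text: str) -> int:
    """Turn the textual 'hi/lo' hexadecimal LSN form into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def choose_start_lsn(client_lsn: str, slot_confirmed_flush_lsn: str) -> str:
    """Return the LSN to pass to START_REPLICATION, or fail loudly."""
    # Compare as integers: as plain strings, "0/9" would sort after "0/10".
    if parse_lsn(client_lsn) < parse_lsn(slot_confirmed_flush_lsn):
        raise ValueError(
            f"stored position {client_lsn} is behind the slot's "
            f"confirmed_flush_lsn {slot_confirmed_flush_lsn}"
        )
    return client_lsn
```

Raising here is a design choice for the sketch: a stored position behind the 
slot usually means the client acknowledged something it did not record, which 
is worth finding out about before streaming resumes.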

The behavior that the primary is expecting from the secondary is that the 
secondary never sends back a confirmed_flush_lsn until everything up to that 
point is safe against a crash or disconnection.  What "safe" means in this 
case depends on the client behavior.  It might be just spooling the incoming 
stream to disk and processing it later, or it might be processing it 
completely on the fly as it comes in.

The most important point here is that the client consuming the logical 
replication messages must keep track of the flush point (defined however the 
client implements processing the messages), and provide the right one back to 
the primary when it reconnects.  (Another option is that the client is 
written so that each transaction is idempotent, and even if transactions that 
it has already processed are sent again, the result is the same.)
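
As a rough illustration of keeping track of the flush point, here is one way a 
client might persist it.  Everything in it is an assumption made for the 
example: the FlushTracker class, the one-file storage format, and the 
should_apply check are hypothetical, and a real client would more likely store 
the LSN in the same place (and the same transaction) as the data it applies.

```python
import os
from typing import Optional


class FlushTracker:
    """Persist the last safely processed LSN in a small local file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Optional[int]:
        """Return the stored LSN, or None if nothing was ever stored."""
        try:
            with open(self.path) as f:
                return int(f.read().strip())
        except FileNotFoundError:
            return None

    def save(self, lsn: int) -> None:
        """Store lsn durably; only afterwards report it to the primary."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(lsn))
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before renaming
        os.replace(tmp, self.path)  # atomically swap in the new value


def should_apply(commit_lsn: int, flushed: Optional[int]) -> bool:
    """Skip transactions at or before the stored flush point."""
    return flushed is None or commit_lsn > flushed
```

The should_apply check is the cheap form of the idempotency option: if a 
transaction that was already processed is sent again, it is simply skipped.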

One more note is that if the client supplies an LSN (for logical replication) 
that lands in the middle of a transaction, the primary will send over the 
complete transaction, so the actual start point may be earlier than the 
supplied start point.  Generally, this means that the client should respect 
transaction boundaries, and be able to deal with receiving a partial 
transaction by discarding it if the commit record for it never arrives.
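
A minimal sketch of respecting transaction boundaries on the client side 
follows.  The message shape is invented for the example (tuples of a kind and 
a payload); what the client really receives depends on the output plugin in 
use.  The idea is only that changes are held back until the commit arrives, 
and whatever is still pending when the stream ends is dropped.

```python
from typing import Any, Iterable, Iterator, List, Optional, Tuple

Message = Tuple[str, Any]  # hypothetical: ("begin" | "change" | "commit", payload)


def committed_transactions(
    messages: Iterable[Message],
) -> Iterator[Tuple[Any, List[Any]]]:
    """Yield (commit_payload, changes) for complete transactions only."""
    pending: Optional[List[Any]] = None
    for kind, payload in messages:
        if kind == "begin":
            pending = []  # start buffering a new transaction
        elif kind == "change" and pending is not None:
            pending.append(payload)
        elif kind == "commit" and pending is not None:
            yield payload, pending  # the transaction is complete
            pending = None
    # If the stream ends here with changes still pending, they are dropped:
    # no commit record was seen for them.


stream = [
    ("begin", None), ("change", "a"), ("commit", 100),
    ("begin", None), ("change", "b"),  # connection lost before the commit
]
complete = list(committed_transactions(stream))
```

With this sample stream, only the first transaction comes out; the change 
"b" is discarded and would be received again after reconnecting.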
