Dear Amit,

Thank you for the comments!

> > > Sorry for the delay, I didn't had time to come back to it until this
> > > afternoon.
> >
> > No issues, everyone is busy:-).
> >
> > > I don't think that your analysis is correct. Slots are guaranteed to be
> > > stopped after all the normal backends have been stopped, exactly to
> > > avoid such extraneous records.
> > >
> > > What is happening here is that the slot's confirmed_flush_lsn is
> > > properly updated in memory and ends up being the same as the current
> > > LSN before the shutdown. But as it's a logical slot and those records
> > > aren't decoded, the slot isn't marked as dirty and therefore isn't
> > > saved to disk. You don't see that behavior when doing a manual
> > > checkpoint before (per your script comment), as in that case the
> > > checkpoint also tries to save the slot to disk but then finds a slot
> > > that was marked as dirty and therefore saves it.
> >
> > Here, why the behavior is different for manual and non-manual checkpoint?

I have analyzed this further and concluded that there is no difference
between a manual checkpoint and a shutdown checkpoint. The difference is
whether the CHECKPOINT record has been decoded or not. The overall workflow
of this test was:

1. do INSERT
(2. do CHECKPOINT)
(3. decode CHECKPOINT record)
4. receive feedback message from standby
5. do shutdown CHECKPOINT

At step 3, the walsender decoded that WAL record and set candidate_xmin_lsn.
The stack trace was:
standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().

At step 4, the confirmed_flush of the slot was updated, but
ReplicationSlotSave() was executed only when slot->candidate_xmin_lsn held a
valid LSN. If steps 2 and 3 are skipped, the dirty flag is not set and the
change stays only in memory.

Finally, the shutdown CHECKPOINT was executed at step 5. If steps 2 and 3 are
skipped and the patch from Julien is not applied, the updated value is
discarded. This is what I observed.
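
To make the workflow above easier to follow, here is a toy model in Python. It is only a sketch of the behavior as I described it, not PostgreSQL code: the class ToySlot, its methods, and the LSN values are made up for illustration, and the save condition is simplified to "save only when candidate_xmin_lsn is valid".

```python
class ToySlot:
    """Toy stand-in for a logical replication slot (hypothetical, simplified)."""

    def __init__(self, lsn):
        self.confirmed_flush = lsn      # value held in memory
        self.on_disk = lsn              # value last written to disk
        self.candidate_xmin_lsn = None  # set only when the record is decoded
        self.dirty = False

    def save(self):
        # Write the in-memory value to "disk" and clear the dirty flag.
        self.on_disk = self.confirmed_flush
        self.dirty = False

    def decode_checkpoint_record(self, lsn):
        # Step 3: decoding sets candidate_xmin_lsn.
        self.candidate_xmin_lsn = lsn

    def receive_feedback(self, lsn):
        # Step 4: confirmed_flush always advances in memory, but the slot
        # is marked dirty and saved only if a candidate LSN is pending.
        self.confirmed_flush = lsn
        if self.candidate_xmin_lsn is not None and self.candidate_xmin_lsn <= lsn:
            self.candidate_xmin_lsn = None
            self.dirty = True
            self.save()

    def shutdown_checkpoint(self, force_save):
        # Step 5: without the patch only dirty slots are written;
        # with the patch (force_save=True) the slot is always written.
        if self.dirty or force_save:
            self.save()


def run(decode_checkpoint, force_save):
    """Run steps 1-5 and return the confirmed_flush value found on disk."""
    slot = ToySlot(100)
    # Step 1: INSERT advances WAL; nothing changes in the slot itself.
    if decode_checkpoint:
        slot.decode_checkpoint_record(150)  # steps 2 and 3
    slot.receive_feedback(200)              # step 4
    slot.shutdown_checkpoint(force_save)    # step 5
    return slot.on_disk


if __name__ == "__main__":
    print(run(decode_checkpoint=True, force_save=False))   # 200
    print(run(decode_checkpoint=False, force_save=False))  # 100 (update lost)
    print(run(decode_checkpoint=False, force_save=True))   # 200
```

In this model the update is lost only in the second case, where steps 2 and 3 are skipped and the shutdown checkpoint does not force a save.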

The patch forces the logical slot to be saved at the shutdown checkpoint, so
confirmed_flush is saved to disk at step 5.

> Can you please explain what led to updating the confirmed_flush in
> memory but not in the disk?

The code-level workflow is described above. The slot information on disk is
updated only after the CHECKPOINT record has been decoded. I'm not sure about
the original motivation, but I suspect the intent was to reduce the number of
disk writes.

> BTW, have we ensured that discarding the
> additional records are already sent to the subscriber, if so, why for
> those records confirmed_flush LSN is not progressed?

In this case, the apply worker requests an LSN that is greater than
confirmed_flush via START_REPLICATION. Therefore, according to
CreateDecodingContext(), the walsender starts sending from the appropriate
record, doesn't it? I think nothing is discarded on the subscriber side.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED