On Tue, May 27, 2025 at 12:12 PM Alexander Korotkov <aekorot...@gmail.com> wrote:
>
> On Tue, May 27, 2025 at 7:08 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
> > <aekorot...@gmail.com> wrote:
> > >
> > > On Mon, May 26, 2025 at 2:43 PM Amit Kapila
> > > <amit.kapil...@gmail.com> wrote:
> > > >
> > > > On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov
> > > > <v.davy...@postgrespro.ru> wrote:
> > > >
> > > > OTOH, if we don't want to adjust the physical slot machinery, it
> > > > seems saving the logical slots to disk immediately when their
> > > > restart_lsn is updated is a waste of effort after your patch, no?
> > > > If so, why are we okay with that?
> > >
> > > I don't think so. I think the reason why logical slots are synced
> > > to disk immediately after update is that logical changes are not
> > > idempotent (you can't safely apply the same change twice), unlike
> > > physical block-level changes. This is why logical slots need to be
> > > synced: to prevent double replication of the same changes, which
> > > could lead, for example, to double insertion.
> >
> > Hmm, if this has to be true, then even in the else branch of
> > LogicalConfirmReceivedLocation [1], we should have saved the slot.
> > AFAIU, whether the logical changes are sent to the client is decided
> > based on two things: (a) the replication origins, which track
> > replication progress and are maintained by clients (which for
> > built-in replication are subscriber nodes), see [2]; and (b) the
> > confirmed_flush LSN maintained in the slot by the server. Now, for
> > each ack by the client after applying/processing changes, we update
> > the confirmed_flush LSN of the slot but don't immediately flush it.
> > This shouldn't let us send the changes again, because even if the
> > system crashes and restarts, the client will send the server the
> > location to start sending the changes from based on its origin
> > tracking. There is more to it: there are cases when the
> > confirmed_flush LSN in the slot could be ahead of the origin's LSN,
> > and we handle all such cases, but I don't think those are directly
> > related here, so I am skipping those details for now.
> >
> > Note that LogicalConfirmReceivedLocation won't save the slot to disk
> > if it updates only the confirmed_flush LSN, which is used to decide
> > whether to send the changes.
>
> You're right, I didn't study these aspects carefully enough.
>
> > > LogicalConfirmReceivedLocation() implements immediate sync for
> > > different reasons.
> >
> > I may be missing something, but let's discuss some more before we
> > conclude this.
>
> So, yes, LogicalConfirmReceivedLocation() probably tries to keep all
> WAL segments after the synchronized value of restart_lsn. But it just
> doesn't care about a concurrent ReplicationSlotsComputeRequiredLSN().
> In order to fix that logic, we need an effective_restart_lsn field by
> analogy to effective_catalog_xmin (a similar approach was discussed in
> this thread before). But that would require an ABI compatibility
> breakage.
>
> So, I'd like to propose the following: backpatch 0001 and 0002, but
> implement the effective_restart_lsn field for pg19. What do you think?
Possibly we could implement effective_restart_lsn even for pg18. As far
as I know, keeping ABI compatibility is not required during beta.

------
Regards,
Alexander Korotkov
Supabase