On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <to...@vondra.me> wrote: > > > > On 11/12/24 10:37, Ashutosh Bapat wrote: > > On Tue, Nov 12, 2024 at 4:54 AM Tomas Vondra <to...@vondra.me> wrote: > >> > >> > >> > >> On 11/11/24 23:41, Masahiko Sawada wrote: > >>> On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <to...@vondra.me> wrote: > >>> > >>> Which made me re-investigate the issue and thought that it doesn't > >>> necessarily need to clear these candidate values in memory on > >>> releasing a slot as long as we're carefully updating restart_lsn. > >> > >> Not sure, but maybe it'd be useful to ask the opposite question. Why > >> shouldn't it be correct to reset the fields, which essentially puts the > >> slot into the same state as if it was just read from disk? That also > >> discards all these values, and we can't rely on accidentally keeping > >> something important info in memory (because if the instance restarts > >> we'd lose that). > >> > >> But this reminds me that the patch I shared earlier today resets the > >> slot in the ReplicationSlotAcquire() function, but I guess that's not > >> quite correct. It probably should be in the "release" path. > >> > >>> Which seems a bit efficient for example when restarting from a very > >>> old point. Of course, even if we reset them on releasing a slot, it > >>> would perfectly work since it's the same as restarting logical > >>> decoding with a server restart. > >> > >> I find the "efficiency" argument a bit odd. It'd be fine if we had a > >> correct behavior to start with, but we don't have that ... Also, I'm not > >> quite sure why exactly would it be more efficient? > >> > >> And how likely is this in practice? It seems to me that > >> performance-sensitive cases can't do reconnects very often anyway, > >> that's inherently inefficient. No? > >> > >>> I think > >>> LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems > >>> not to be working expectedly, but I could not have proof that we > >>> should either keep or reset them on releasing a slot. > >>> > >> > >> Not sure. Chances are we need both fixes, if only to make > >> LogicalIncreaseRestartDecodingForSlot more like the other function. > >> > > > > Thanks a lot for pointing to Masahiko's analysis. I missed that part > > when I read that thread. Sorry. > > > > No need to apologize. The discussion is complex and spread over a rather > long time period. > > > A candidate_restart_lsn and candidate_restart_valid pair just tells > > that we may set slot's data.restart_lsn to candidate_restart_lsn when > > the downstream confirms an LSN >= candidate_restart_valid. That pair > > can never get inaccurate. It may get stale but never inaccurate. So > > wiping those fields from ReplicationSlot is unnecessary. > > > > Isn't this issue a proof that those fields *can* get inaccurate? Or what > do you mean by "stale but not inaccurate"? > > > What should ideally happen is we should ignore candidates produced by > > older running transactions WAL records after WAL sender restart. This > > is inline with what logical replication does with transactions > > committed before slot's confirmed_flush_lsn - those are ignored. But > > the criteria for ignoring running transactions records is slightly > > different from that for transactions. If we ignore > > candidate_restart_lsn which has candidate_restart_valid <= > > confirmed_flush_lsn, we might lose some opportunity to advance > > data.restart_lsn. Instead we should ignore any candidate_restart_lsn > > <= data.restart_lsn especially before WAL sender finds first change to > > send downstream. We can do that in SnapBuildProcessRunningXacts() by > > accessing MyReplicationSlot, taking lock on it and then comparing > > data.restart_lsn with txn->restart_decoding_lsn before calling > > LogicalIncreaseRestartDecodingForSlot(). But then > > LogicalIncreaseRestartDecodingForSlot() would be doing the same anyway > > after applying your patch 0004. The only downside of 0004 is that the > > logic to ignore candidates produced by a running transactions record > > is not clearly visible in SnapBuildProcessRunningXacts(). For a > > transaction which is ignored the logic to ignore the transaction is > > visible in DecodeCommit() or DecodeAbort() - where people are likely > > to look for that logic. We may add a comment to that effect in > > SnapBuildProcessRunningXacts(). > > > > I have thought about just doing something like: > > slot->data.restart_lsn = Max(slot->data.restart_lsn, new_lsn); > > and similar for the other LSN fields. And it did resolve the issue at > hand, of course. But it seems sloppy, and I'm worried it might easily > mask actual issues in other cases. > > I'm still of the opinion that (with the exception of a reconnect), these > places should not need to deal with values that go backwards. It should > work just fine without the Max(), and we should add Asserts() to check > that it's always a higher LSN. > > For the reconnect, I think it's a bit as if the primary restarted right > before the reconnect. That could happen anyway, and we need to handle > that correctly - if not, we have yet another issue, IMHO. And with the > restart it's the same as writing the slot to disk and reading it back, > which also doesn't retain most of the fields. So it seems cleaner to do > the same thing and just reset the various fields. > > > I haven't thought about making SnapBuildProcessRunningXacts() more > complex to consider this stuff. But I doubt we'd like to be accessing > slots from that code - it has nothing to do with slots.
Here's way we can fix SnapBuildProcessRunningXacts() similar to DecodeCommit(). DecodeCommit() uses SnapBuildXactNeedsSkip() to decide whether a given transaction should be decoded or not. /* * Should the contents of transaction ending at 'ptr' be decoded? */ bool SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr) { return ptr < builder->start_decoding_at; } Similar to SnapBuild::start_decoding_at we could maintain a field SnapBuild::start_reading_at to the LSN from which the WAL sender would start reading WAL. If candidate_restart_lsn produced by a running transactions WAL record is less than SnapBuild::start_reading_at, SnapBuildProcessRunningXacts() won't call LogicalIncreaseRestartDecodingForSlot() with that candiate LSN. We won't access the slot here and the solution will be inline with DecodeCommit() which skips the transactions. -- Best Wishes, Ashutosh Bapat