On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > On Thursday, April 4, 2024 5:37 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > > > BTW, while thinking on this one, I > > noticed that in the function LogicalConfirmReceivedLocation(), we first > > update > > the disk copy, see comment [1] and then in-memory whereas the same is not > > true in > > update_local_synced_slot() for the case when snapshot exists. Now, do we > > have > > the same risk here in case of standby? Because I think we will use these > > xmins > > while sending the feedback message (in XLogWalRcvSendHSFeedback()). > > > > * We have to write the changed xmin to disk *before* we change > > * the in-memory value, otherwise after a crash we wouldn't know > > * that some catalog tuples might have been removed already. > > Yes, I think we have the risk on the standby, I can reproduce the case that if > the server crashes after updating the in-memory value and before saving them > to > disk, the synced slot could be invalidated after restarting from crash, > because > the necessary rows have been removed on the primary. The steps can be found in > [1]. > > I think we'd better fix the order in update_local_synced_slot() as well. I > tried to make the fix in 0002, 0001 is Shveta's patch to fix another issue in > this thread. Since > they are touching the same function, so attach them together for review. >
Few comments: =============== 1. + + /* Sanity check */ + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) + ereport(LOG, + errmsg("synchronized confirmed_flush for slot \"%s\" differs from remote slot", + remote_slot->name), Is there a reason to use elevel as LOG instead of ERROR? I think it should be elog(ERROR, .. as this is an unexpected case. 2. - if (remote_slot->restart_lsn < slot->data.restart_lsn) + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) elog(ERROR, "cannot synchronize local slot \"%s\" LSN(%X/%X)" Can we be more specific in this message? How about splitting it into error_message as "cannot synchronize local slot \"%s\"" and then errdetail as "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's LSN(%X/%X)"? -- With Regards, Amit Kapila.