On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <to...@vondra.me> wrote:
> >
> >
> > But neither of those fixes prevents backwards move for confirmed_flush
> > LSN, as enforced by asserts in the 0005 patch. I don't know if this
> > assert is incorrect or now. It seems natural that once we get a
> > confirmation for some LSN, we can't move before that position, but I'm
> > not sure about that. Maybe it's too strict.
>
> Hmm, I'm concerned that it might be another problem. I think there are
> some cases where a subscriber sends a flush position older than slot's
> confirmed_flush as a feedback message. But it seems to be dangerous if
> we always accept it as a new confirmed_flush value. It could happen
> that confirm_flush could be set to a LSN older than restart_lsn.
>

If the confirmed_flush LSN moves backwards, transactions that were
previously considered replicated are no longer considered replicated.
The slot's restart_lsn then needs to be at least as far back as the
oldest starting point of those transactions, so it has to be pushed
further back, and that WAL may no longer be available. There is a
similar issue with catalog_xmin: the older catalog rows may already
have been removed. Another problem is that we may send some
transactions twice, which might cause trouble downstream. So I agree
that the confirmed_flush LSN should not move backwards.

IIRC, if the downstream sends an older confirmed_flush in the
START_REPLICATION message, the WAL sender does not use it and instead
uses the one stored in the replication slot.

--
Best Wishes,
Ashutosh Bapat
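
To make the invariant discussed above concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not PostgreSQL code: the `Slot` class, its method names, and the use of plain integers for LSNs are all assumptions made for illustration. It shows feedback older than the stored confirmed_flush being ignored, and a requested start position older than confirmed_flush being replaced by the slot's value.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy model of a logical replication slot (hypothetical, not PostgreSQL code).

    LSNs are modelled as plain integers for simplicity.
    """
    restart_lsn: int
    confirmed_flush: int

    def confirm_flush(self, reported_lsn: int) -> bool:
        """Advance confirmed_flush; ignore feedback that would move it backwards.

        Returns True if the stored value was advanced.
        """
        if reported_lsn <= self.confirmed_flush:
            # Accepting an older position would mean transactions already
            # treated as replicated must be decoded again, which may need
            # WAL older than restart_lsn.
            return False
        self.confirmed_flush = reported_lsn
        return True

    def start_position(self, requested_lsn: int) -> int:
        """Pick the streaming start point for a START_REPLICATION-style request.

        A request older than confirmed_flush is not honoured; the slot's own
        value is used instead.
        """
        return max(requested_lsn, self.confirmed_flush)


slot = Slot(restart_lsn=100, confirmed_flush=200)
slot.confirm_flush(150)            # stale feedback, ignored
slot.confirm_flush(250)            # newer feedback, accepted
start = slot.start_position(180)   # older request, slot's value is used
```

With this rule in place, confirmed_flush in the model can never fall below restart_lsn as long as it started out at or above it, which is the property the assert in the 0005 patch is checking for.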