On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote: > Hi Wang, > Thanks for working on this. One of our customer faced a similar > situation when running BDR with PostgreSQL. > > I tested your patch and it solves the problem. > > Please find some review comments below
Thanks for your testing and comments. > +/* > + * Helper function for ReorderBufferProcessTXN for updating progress. > + */ > +static inline void > +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn, > + ReorderBufferChange *change) > +{ > + LogicalDecodingContext *ctx = rb->private_data; > + static int changes_count = 0; > > It's not easy to know that a variable is static when reading the code which > uses it. So it's easy to interpret code wrong. I would probably track it > through logical decoding context itself OR through a global variable like > other > places where we track the last timestamps. But there's more below on this. I'm not sure if we need to add global variables or member variables for a cumulative count that is only used here. How would you feel if I add some comments when declaring this static variable? > + > + if (!ctx->update_progress) > + return; > + > + Assert(!ctx->fast_forward); > + > + /* set output state */ > + ctx->accept_writes = false; > + ctx->write_xid = txn->xid; > + ctx->write_location = change->lsn; > + ctx->end_xact = false; > > This patch reverts many of the changes of the previous commit which tried to > fix this issue i.e. 55558df2374. end_xact was introduced by the same commit > but > without much explanation of that in the commit message. Its only user, > WalSndUpdateProgress(), is probably making a wrong assumption as well. > > * We don't have a mechanism to get the ack for any LSN other than end > * xact LSN from the downstream. So, we track lag only for end of > * transaction LSN. > > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which > is > sent downstream through a keep alive message. Downstream may > acknowledge this > LSN. So we do get ack for any LSN, not just commit LSN. > > So I propose removing end_xact as well. We didn't track the lag during a transaction because it could make the calculations of lag functionality inaccurate. If we track every lsn, it could fail to record important lsn information because of WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress). Please see details in [1] and [2]. Regards, Wang Wei [1] - https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com [2] - https://www.postgresql.org/message-id/OS3PR01MB627514AE0B3040D8F55A68B99EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com