On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> > >
> > >
> > > IMHO, the threshold should be based on the commit LSN.  Our main
> > > reason we want to send empty transactions after a certain
> > > transaction/duration is that we want the restart_lsn to be moving
> > > forward so that if we need to restart the replication slot we don't
> > > need to process a lot of extra WAL.  So assume we set the threshold
> > > based on transaction count then there is still a possibility that we
> > > might process a few very big transactions then we will have to process
> > > them again after the restart.
> > >
> >
> > Won't the subscriber eventually send the flush location for the large
> > transactions which will move the restart_lsn?
>
> I meant large empty transactions (basically we can not send anything
> to the subscriber).  So my point was if there are only large
> transactions in the system which we can not stream because those
> tables are not published.  Then keeping threshold based on transaction
> count will not help much because even if we don't reach the
> transaction count threshold, we still might need to process a lot of
> data if we don't stream the commit for the empty transactions.  So
> instead of tracking transaction count can we track LSN, and LSN
> different since we last stream some change cross the threshold then we
> will stream the next empty transaction.
>
You have a point, and it may be better to base the threshold on LSN if
we want to keep any threshold at all, but basing it on transaction
count seems a bit more straightforward.  Let us see if anyone else has
an opinion on this matter.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com