On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
> <craig.rin...@enterprisedb.com> wrote:
> >
> >> I am not sure why but it seems acceptable to original authors that the
> >> data of transactions are visibly partially during the initial
> >> synchronization phase for a subscription.
> >
> > I don't think there's much alternative there.
> >
>
> I am not sure about this. I think it is primarily to allow some more
> parallelism among apply and sync workers. One primitive way to achieve
> parallelism and don't have this problem is to allow apply worker to
> wait till all the tablesync workers are in DONE state.
>
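To make the idea quoted above concrete, here is a rough sketch in Python (not PostgreSQL code). The names SyncState, TableSync and simulate are made up for illustration, and the three states are a simplification of the real tablesync states. It only models the ordering: nothing is applied until every table copy has reached DONE.

```python
from dataclasses import dataclass
from enum import Enum


class SyncState(Enum):
    # Simplified, hypothetical states; the real catalog has more.
    INIT = "init"
    DATASYNC = "datasync"
    DONE = "done"


@dataclass
class TableSync:
    """Hypothetical stand-in for one tablesync worker (initial copy only)."""
    table: str
    state: SyncState = SyncState.INIT

    def step(self) -> None:
        # Advance one phase: INIT -> DATASYNC -> DONE; DONE stays DONE.
        if self.state is SyncState.INIT:
            self.state = SyncState.DATASYNC
        elif self.state is SyncState.DATASYNC:
            self.state = SyncState.DONE


def simulate(tables, decoded_txns):
    """Return the event order when apply waits for all table copies."""
    syncs = [TableSync(t) for t in tables]
    log = []
    # The apply worker holds back while any table copy is unfinished.
    while not all(s.state is SyncState.DONE for s in syncs):
        for s in syncs:
            s.step()
            log.append(f"{s.table}:{s.state.value}")
    # Only now are whole transactions applied, in commit order, by the
    # apply worker alone, so no sync worker has to stitch transactions.
    for txn in decoded_txns:
        log.append(f"apply:{txn}")
    return log


if __name__ == "__main__":
    for event in simulate(["t1", "t2"], ["txn1", "txn2"]):
        print(event)
```

In this toy model the changes are consumed only once, by the apply worker, which is the property the rest of this mail relies on.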
As the apply worker's slot is created before the slots of all the tablesync workers, it should never miss any LSN that the tablesync workers would have processed. Also, the tablesync workers should not process any xact if the apply worker has not processed anything. I think tablesync currently always processes one transaction (because we call process_sync_tables at the commit of a txn) even if that is not required to be in sync with the apply worker.

This should solve both problems: (a) visibility of partial transactions, and (b) allowing prepared transactions, because the tablesync worker no longer needs to combine the data of multiple transactions.

I think the other advantage of this would be that it reduces the load (both CPU and I/O) on the publisher side, because the data is decoded only once instead of once for each tablesync worker and separately for the apply worker. I think it will use fewer resources to finish the work.

Is there any flaw in this idea that I am missing?

--
With Regards,
Amit Kapila.