On Tuesday, July 26, 2022 5:34 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Tue, Jul 26, 2022 at 2:30 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Fri, Jul 22, 2022 at 8:27 AM wangw.f...@fujitsu.com
> > <wangw.f...@fujitsu.com> wrote:
> > >
> > > On Tue, Jul 19, 2022 at 10:29 AM I wrote:
> > > > Attach the new patches.
> > >
> > > I was not able to apply the patches cleanly because of the change in
> > > HEAD (366283961a). Therefore, I rebased the patch on the changes in
> > > HEAD.
> > >
> > > Attach the new patches.
> >
> > +	/* Check the foreign keys. */
> > +	fkeys = RelationGetFKeyList(entry->localrel);
> > +	if (fkeys)
> > +		entry->parallel_apply = PARALLEL_APPLY_UNSAFE;
> >
> > So if there is a foreign key on any of the tables that are part of a
> > subscription, then we do not allow changes for that subscription to be
> > applied in parallel? I think this is a big limitation, because having a
> > foreign key on a table is very common, right? I agree that if we allow
> > them then there could be failures due to out-of-order apply, but IMHO
> > we should not impose the restriction; instead, let it fail if such a
> > conflict ever occurs, because if there is a conflict the transaction
> > will be sent again. Do we see that there could be wrong or inconsistent
> > results if we allow such things to be executed in parallel? If not,
> > then IMHO we are restricting very common cases just to avoid some
> > corner-case failures.
>
> Some more comments..
>
> 1.
> +	/*
> +	 * If we have found a free worker or if we are already applying this
> +	 * transaction in an apply background worker, then we pass the data to
> +	 * that worker.
> +	 */
> +	if (first_segment)
> +		apply_bgworker_send_data(stream_apply_worker, s->len,
> +								 s->data);
>
> The comment says that if we have found a free worker, or if we are
> already applying this transaction in a worker, then we pass the changes
> to that worker, but as per the code here we only pass them in the
> first_segment case? I think what you are trying to say is that if it is
> the first segment then send the data to the worker.
>
> 2.
> +	/*
> +	 * This is the main apply worker. Check if there is any free apply
> +	 * background worker we can use to process this transaction.
> +	 */
> +	if (first_segment)
> +		stream_apply_worker = apply_bgworker_start(stream_xid);
> +	else
> +		stream_apply_worker = apply_bgworker_find(stream_xid);
>
> So currently, whenever we get a new streamed transaction we try to
> start a new background worker for it. Why do we need to start/stop the
> background apply worker every time we get a new streamed transaction?
> I mean, we could keep the worker in a pool for the time being, and if a
> new transaction is looking for a worker it could find one there.
> Starting a worker is a costly operation, and since we are using
> parallelism here we expect streamed transactions needing a parallel
> apply worker to be frequent, so why not let the worker wait for a
> certain amount of time: if the load is low it will stop anyway, and if
> the load is high it will be reused for the next streamed transaction.

It seems the function name was a bit misleading. Currently, a started
apply bgworker does not exit after applying its transaction. And
apply_bgworker_start will first try to choose a free worker from the
pool; it starts a new worker only if no free worker is available.
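To make the reuse concrete, below is a minimal standalone sketch of the
pool behaviour (the types and function names here are hypothetical
stand-ins, not the patch code): a worker that finishes a streamed
transaction stays alive in its slot and is preferred over launching a
new process.

/*
 * Minimal sketch of the worker-pool reuse (hypothetical names, not the
 * patch code).  A slot is empty (no process started yet), free (process
 * alive and idle), or busy (applying a streamed transaction).
 */
#include <stdio.h>

typedef enum SlotState
{
	SLOT_EMPTY,					/* no worker process started yet */
	SLOT_FREE,					/* worker alive, waiting for work */
	SLOT_BUSY					/* worker applying a transaction */
} SlotState;

typedef struct WorkerSlot
{
	SlotState	state;
	unsigned int xid;			/* transaction assigned to the worker */
} WorkerSlot;

#define MAX_APPLY_WORKERS 4
static WorkerSlot pool[MAX_APPLY_WORKERS];	/* zeroed => all SLOT_EMPTY */

/* Prefer reusing an idle worker; "start" a new one only if none is free. */
static WorkerSlot *
sketch_bgworker_start(unsigned int xid)
{
	WorkerSlot *empty = NULL;

	for (int i = 0; i < MAX_APPLY_WORKERS; i++)
	{
		if (pool[i].state == SLOT_FREE)
		{
			pool[i].state = SLOT_BUSY;	/* cheap path: reuse the process */
			pool[i].xid = xid;
			return &pool[i];
		}
		if (pool[i].state == SLOT_EMPTY && empty == NULL)
			empty = &pool[i];
	}

	if (empty != NULL)
	{
		empty->state = SLOT_BUSY;	/* costly path: launch a new process */
		empty->xid = xid;
		return empty;
	}

	return NULL;				/* pool exhausted; caller applies serially */
}

/* At stream end the worker stays alive and simply returns to the pool. */
static void
sketch_bgworker_release(WorkerSlot *slot)
{
	slot->state = SLOT_FREE;
}

int
main(void)
{
	WorkerSlot *w1 = sketch_bgworker_start(100);

	sketch_bgworker_release(w1);

	/* The next transaction reuses the same, still-running worker. */
	WorkerSlot *w2 = sketch_bgworker_start(101);

	printf("reused the same slot: %s\n", (w1 == w2) ? "yes" : "no");
	return 0;
}

In the same spirit, apply_bgworker_find then only has to look up the
busy worker already assigned to stream_xid.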
> 3.
> Why are we restricting parallel apply workers to only the streamed
> transactions? Streaming depends upon the size of
> logical_decoding_work_mem, so making streaming and parallel apply
> tightly coupled seems too restrictive to me. Do we see some obvious
> problems in applying other transactions in parallel?

We thought there could be conflict failures and deadlocks if we applied
normal transactions in parallel, since that would need a transaction
dependency check[1]. But I will do some more research on this and share
the results soon.

[1] https://www.postgresql.org/message-id/CAA4eK1%2BwyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw%40mail.gmail.com

Best regards,
Hou zj