On Fri, Jan 6, 2023 at 9:37 AM houzj.f...@fujitsu.com
<houzj.f...@fujitsu.com> wrote:
>
> On Thursday, January 5, 2023 7:54 PM Dilip Kumar <dilipbal...@gmail.com>
> wrote:
> Thanks, I have started another thread[1]
>
> Attach the parallel apply patch set here again. I didn't change the patch set,
> attach it here just to let the CFbot keep testing it.
I have completed the review and some basic testing, and it mostly looks
fine to me. Here is my last set of comments/suggestions.

1.
 /*
  * Don't start a new parallel worker if user has set skiplsn as it's
  * possible that they want to skip the streaming transaction. For
  * streaming transactions, we need to serialize the transaction to a file
  * so that we can get the last LSN of the transaction to judge whether to
  * skip before starting to apply the change.
  */
 if (!XLogRecPtrIsInvalid(MySubscription->skiplsn))
     return false;

I think it is fine to block parallelism in this case, but we could also
make it less restrictive: the final_lsn can match the skiplsn only if
the first lsn of the transaction is <= skiplsn, so when it is not, we
could still allow parallelism. I understand that currently we do not
have the first_lsn of the transaction in the stream start message, but
I think that should be easy to add (see the sketch at the end of this
mail). I am not sure whether it is worth doing, but it would be good to
at least make a note of it.

2.
+ * XXX Additionally, we also stop the worker if the leader apply worker
+ * serialize part of the transaction data due to a send timeout. This is
+ * because the message could be partially written to the queue and there
+ * is no way to clean the queue other than resending the message until it
+ * succeeds. Instead of trying to send the data which anyway would have
+ * been serialized and then letting the parallel apply worker deal with
+ * the spurious message, we stop the worker.
+ */
+ if (winfo->serialize_changes ||
+     list_length(ParallelApplyWorkerPool) >
+     (max_parallel_apply_workers_per_subscription / 2))

IMHO this reason for stopping the worker (the leader apply worker
having serialized part of the transaction data due to a send timeout)
looks a bit hackish. This may be a rare case, so my concern is not
performance, but the reasoning behind stopping is not good. Ideally we
should be able to clean up the message queue and reuse the worker (see
the sketch at the end of this mail).

3.
+ else if (shmq_res == SHM_MQ_WOULD_BLOCK)
+ {
+     /* Replay the changes from the file, if any. */
+     if (pa_has_spooled_message_pending())
+     {
+         pa_spooled_messages();
+     }

I think we do not need the pa_has_spooled_message_pending() function.
It just calls pa_get_fileset_state(), which acquires the mutex to get
the file state; then, if the state is not FS_EMPTY, we call
pa_spooled_messages(), which calls pa_get_fileset_state() again and so
acquires the mutex again. So while the state is
FS_SERIALIZE_IN_PROGRESS we will frequently call pa_get_fileset_state()
twice in a row, and we can easily achieve the same behavior with a
single call (see the sketch at the end of this mail).

4.
+ * leader, or when there there is an error. None of these cases will allow
+ * the code to reach here.

/when there there is an error/when there is an error

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
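
PS: To make some of the above concrete, here are a few rough, untested
sketches.

For comment #1, this assumes the stream start message is extended to
carry the transaction's first LSN, shown here as a hypothetical
"first_lsn" variable; nothing else differs from the existing check:

 /*
  * Sketch only: block parallelism only when the skiplsn could still
  * match this transaction's final_lsn, which is possible only when the
  * transaction's first LSN is <= skiplsn. "first_lsn" is hypothetical
  * and would come from an extended stream start message.
  */
 if (!XLogRecPtrIsInvalid(MySubscription->skiplsn) &&
     first_lsn <= MySubscription->skiplsn)
     return false;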
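
For comment #2, the shape I have in mind is roughly the following,
where pa_reset_message_queue() is a purely hypothetical helper that
would discard any partially written message so the queue can be reused
(no such function exists today, and implementing it is the hard part):

 if (list_length(ParallelApplyWorkerPool) >
     (max_parallel_apply_workers_per_subscription / 2))
 {
     /* The pool is oversized, so stopping the worker is reasonable. */
     ... existing stop-the-worker logic ...
 }
 else if (winfo->serialize_changes)
 {
     /*
      * Hypothetical: instead of stopping the worker, discard the
      * partially written message and keep the worker for reuse.
      */
     pa_reset_message_queue(winfo);
 }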
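
For comment #3, the caller could invoke pa_spooled_messages()
unconditionally and let it fetch the file state just once, something
like this (again a sketch only; I am assuming the state enum type is
PartialFileSetState, as in the patch):

 else if (shmq_res == SHM_MQ_WOULD_BLOCK)
 {
     /* Replay the changes from the file, if any. */
     pa_spooled_messages();
 ...

 static void
 pa_spooled_messages(void)
 {
     PartialFileSetState fileset_state;

     /* Fetch the state once, taking the mutex only once. */
     fileset_state = pa_get_fileset_state();
     if (fileset_state == FS_EMPTY)
         return;

     /* ... existing replay logic, reusing fileset_state ... */
 }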