On Mon, May 30, 2022 at 5:08 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, May 30, 2022 at 2:22 PM wangw.f...@fujitsu.com > <wangw.f...@fujitsu.com> wrote: > > > > Attach the new patches(only changed 0001 and 0002) > > >
This patch allows the same replication origin to be used by the main apply worker and the bgworker that uses it to apply streaming transactions. See the changes [1] in the patch. I am not completely sure whether that is a good idea even though I could not spot or think of problems that can't be fixed in your patch. I see that currently both the main apply worker and bgworker will assign MyProcPid to the assigned origin slot, this can create the problem because ReplicationOriginExitCleanup() can clean it up even though the main apply worker or another bgworker is still using that origin slot. Now, one way to fix is that we assign only the main apply worker's MyProcPid to session_replication_state->acquired_by. I have tried to think about the concurrency issues as multiple workers could now point to the same replication origin state. I think it is safe because the patch maintains the commit order by allowing only one process to commit at a time, so no two workers will be operating on the same origin at the same time. Even, though there is no case where the patch will try to advance the session's origin concurrently, it appears safe to do so as we change/advance the session_origin LSNs under replicate_state LWLock. Another idea could be that we allow multiple replication origins (one for each bgworker and one for the main apply worker) for the apply workers corresponding to a subscription. Then on restart, we can find the highest LSN among all the origins for a subscription. This should work primarily because we will maintain the commit order. Now, for this to work we need to somehow map all the origins for a subscription and one possibility is that we have a subscription id in each of the origin names. Currently we use ("pg_%u", MySubscription->oid) as origin_name. We can probably append some unique identifier number for each worker to allow each origin to have a subscription id. We need to drop all origins for a particular subscription on DROP SUBSCRIPTION. I think having multiple origins for the same subscription will have some additional work when we try to filter changes based on origin. The advantage of the first idea is that it won't increase the need to have more origins per subscription but it is quite possible that I am missing something and there are problems due to which we can't use that approach. Thoughts? [1]: -replorigin_session_setup(RepOriginId node) +replorigin_session_setup(RepOriginId node, bool acquire) { static bool registered_cleanup; int i; @@ -1110,7 +1110,7 @@ replorigin_session_setup(RepOriginId node) if (curstate->roident != node) continue; - else if (curstate->acquired_by != 0) + else if (curstate->acquired_by != 0 && acquire) { ... ... + /* + * The apply bgworker don't need to monopolize this replication origin + * which was already acquired by its leader process. + */ + replorigin_session_setup(originid, false); + replorigin_session_origin = originid; -- With Regards, Amit Kapila.