On Wed, Jun 28, 2017 at 2:13 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Wed, Jun 28, 2017 at 1:47 AM, Petr Jelinek > <petr.jeli...@2ndquadrant.com> wrote: >> >> On 27/06/17 10:51, Masahiko Sawada wrote: >>> On Mon, Jun 26, 2017 at 12:12 PM, Masahiko Sawada <sawada.m...@gmail.com> >>> wrote: >>> >>> I've reviewed this patch briefly. >> >> Thanks! >> >>> >>> @@ -515,6 +533,31 @@ logicalrep_worker_stop(Oid subid, Oid relid) >>> } >>> >>> /* >>> + * Request worker to be stopped on commit. >>> + */ >>> +void >>> +logicalrep_worker_stop_at_commit(Oid subid, Oid relid) >>> +{ >>> + LogicalRepWorkerId *wid; >>> + MemoryContext old; >>> + >>> + old = MemoryContextSwitchTo(TopTransactionContext); >>> + >>> + wid = (LogicalRepWorkerId *) palloc(sizeof(LogicalRepWorkerId)); >>> + >>> + /* >>> + wid = MemoryContextAlloc(TopTransactionContext, >>> + >>> sizeof(LogicalRepWorkerId)); >>> + */ >>> + wid->subid = subid; >>> + wid->relid = relid; >>> + >>> + on_commit_stop_workers = lappend(on_commit_stop_workers, wid); >>> + >>> + MemoryContextSwitchTo(old); >>> +} >>> >>> logicalrep_worker_stop_at_commit() has a problem that new_list() >>> called by lappend() allocates the memory from current memory context. >>> It should switch the memory context and then allocate the memory for >>> wid and append it to the list. >>> > > Thank you for updating the patch! > >> >> Right, fixed (I see you did that locally as well based on the above >> excerpt ;) ). > > Oops, yeah that's my test code. > >>> -------- >>> @@ -754,9 +773,25 @@ ApplyLauncherShmemInit(void) >>> void >>> AtEOXact_ApplyLauncher(bool isCommit) >>> { >>> - if (isCommit && on_commit_launcher_wakeup) >>> - ApplyLauncherWakeup(); >>> + ListCell *lc; >>> + >>> + if (isCommit) >>> + { >>> + foreach (lc, on_commit_stop_workers) >>> + { >>> + LogicalRepWorkerId *wid = lfirst(lc); >>> + logicalrep_worker_stop(wid->subid, wid->relid); >>> + } >>> + >>> + if (on_commit_launcher_wakeup) >>> + ApplyLauncherWakeup(); >>> >>> Stopping the logical rep worker in AtEOXact_ApplyLauncher doesn't >>> support the prepared transaction. Since we allocate the list >>> on_commit_stop_workers in TopTransactionContext the postgres crashes >>> if we execute any query after prepared transaction that removes >>> subscription relation state. Also after fixed this issue, we still >>> need to something: the list of on_commit_stop_workers is not >>> associated the prepared transaction. A query next to "preapre >>> transaction" tries to stop workers at the commit. There was similar >>> discussion before. >>> >> >> Hmm, good point. I think for now it makes sense to simply don't allow >> PREPARE for transactions that manipulate workers similar to what we do >> when there are exported snapshots. Done it that way in attached. > > Agreed. Should we note that in docs? > >> >>> -------- >>> + >>> + ensure_transaction(); >>> + >>> UpdateSubscriptionRelState(MyLogicalRepWorker->subid, >>> + >>> rstate->relid, rstate->state, >>> + >>> rstate->lsn); >>> >>> >>> Should we commit the transaction if we started a new transaction >>> before update the subscription relation state, or it could be deadlock >>> risk. >> >> We only lock the whole subscription (and only conflicting things are >> DROP and ALTER SUBSCRIPTION), not individual subscription-relation >> mapping so it doesn't seem to me like there is any higher risk of >> deadlocks than anything else which works with multiple tables (or >> compared to previous behavior). >> > > I'm concerned that a lock for whole subscription could be conflicted > between ALTER SUBSCRIPTION, table sync worker and apply worker: > > Please imagine the following case. > > 1. The apply worker updates a subscription relation state R1 of > subscription S1. > -> It acquires AccessShareLock on S1, and keep holding. > 2. ALTER SUBSCRIPTION SET PUBLICATION tries to acquire > AccessExclusiveLock on S1. > -> But it waits for the apply worker to release the lock. > 3. The apply worker calls wait_for_relation_state_change(relid, > SUBREL_STATE_SYNCDONE) and waits for the table sync worker > for R2 to change its status. > -> Note that the apply worker is still holding AccessShareLock on S1 > 4. The table sync worker tries to update its status to SYNCDONE > -> In UpdateSubscriptionRelState(), it tires to acquire AccessShareLock > on S1 but waits for it. *deadlock* > > To summary, because the apply worker keeps holding AccessShareLock on > S1, the acquiring AccessExclusiveLock by ALTER SUBSCRIPTION waits for > the apply worker, and then the table sync worker also waits for the > ALTER SUBSCRIPTION in order to change its status. And the apply worker > waits for the table sync worker to change its status. > > I encountered the similar case once on my environment. But since it > completely depends on timing I don't have the concrete reproducing > steps. >
Also, now the patch conflicts with current HEAD, so need to be updated. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers