On Tue, Dec 22, 2020 at 11:58 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
>
>
> On 2020/12/22 20:42, Fujii Masao wrote:
> >
> >
> > On 2020/12/22 10:25, Masahiko Sawada wrote:
> >> On Fri, Dec 18, 2020 at 6:36 PM Fujii Masao <masao.fu...@oss.nttdata.com>
> >> wrote:
> >>>
> >>>
> >>> On 2020/12/17 2:15, Fujii Masao wrote:
> >>>>
> >>>>
> >>>> On 2020/12/16 23:28, Drouvot, Bertrand wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 12/16/20 2:36 PM, Victor Yegorov wrote:
> >>>>>>
> >>>>>> On Wed, Dec 16, 2020 at 13:49, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> >>>>>>
> >>>>>>     After doing this procedure, you can see the startup process and the
> >>>>>>     backend wait for each other's table lock, i.e., a deadlock. But this
> >>>>>>     deadlock remains even after deadlock_timeout passes.
> >>>>>>
> >>>>>>     This seems like a bug to me.
> >>>>>>
> >>>>> +1
> >>>>>
> >>>>>>     > * Deadlocks involving the Startup process and an ordinary backend process
> >>>>>>     > * will be detected by the deadlock detector within the ordinary backend.
> >>>>>>
> >>>>>>     The cause of this issue seems to be that ResolveRecoveryConflictWithLock(),
> >>>>>>     which the startup process calls when a recovery conflict on a lock happens,
> >>>>>>     doesn't take care of the deadlock case at all. You can see this fact by
> >>>>>>     reading the above source code comment for ResolveRecoveryConflictWithLock().
> >>>>>>
> >>>>>>     To fix this issue, I think that we should enable the STANDBY_DEADLOCK_TIMEOUT
> >>>>>>     timer in ResolveRecoveryConflictWithLock() so that the startup process can
> >>>>>>     send the PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
> >>>>>>     Then, if the PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
> >>>>>>     the backend should check whether a deadlock has actually happened or not.
> >>>>>>     Attached is the POC patch implementing this.
> >>>>>>
> >>>>> good catch!
> >>>>>
> >>>>> I don't see any obvious reason why the STANDBY_DEADLOCK_TIMEOUT
> >>>>> shouldn't be set in ResolveRecoveryConflictWithLock() too (it is
> >>>>> already set in ResolveRecoveryConflictWithBufferPin()).
> >>>>>
> >>>>> So +1 to consider this as a bug and for the way the patch proposes to
> >>>>> fix it.
> >>>>
> >>>> Thanks Victor and Bertrand for agreeing!
> >>>> Attached is the updated version of the patch.
> >>>
> >>> Attached is v3 of the patch. Could you review this version?
> >>>
> >>> While the startup process is waiting for recovery conflict on buffer pin,
> >>> it repeats sending the request for deadlock check to all the backends
> >>> every deadlock_timeout. This may increase the workload in the startup
> >>> process and backends, but since this is the original behavior, the patch
> >>> doesn't change that. Also, in practice this may not be so harmful because
> >>> the period that the buffer is kept pinned is basically not so long.
> >>>
> >>
> >> @@ -529,6 +603,26 @@ ResolveRecoveryConflictWithBufferPin(void)
> >>           */
> >>          ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
> >>
> >> +        if (got_standby_deadlock_timeout)
> >> +        {
> >> +                /*
> >> +                 * Send out a request for hot-standby backends to check themselves for
> >> +                 * deadlocks.
> >> +                 *
> >> +                 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
> >> +                 * to be signaled by UnpinBuffer() again and send a request for
> >> +                 * deadlock check if deadlock_timeout happens. This causes the
> >> +                 * request to continue to be sent every deadlock_timeout until the
> >> +                 * buffer is unpinned or ltime is reached. This would increase the
> >> +                 * workload in the startup process and backends. In practice it may
> >> +                 * not be so harmful because the period that the buffer is kept pinned
> >> +                 * is basically not so long. But we should fix this?
> >> +                 */
> >> +                SendRecoveryConflictWithBufferPin(
> >> +                        PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
> >> +                got_standby_deadlock_timeout = false;
> >> +        }
> >> +
> >>
> >> Since SendRecoveryConflictWithBufferPin() sends the signal to all backends,
> >> every backend that is waiting on a lock at ProcSleep() and not holding a
> >> buffer pin blocking the startup process will end up doing a deadlock check,
> >> which seems expensive. What is worse is that the deadlock will not be
> >> detected, because a deadlock involving a buffer pin isn't detected by
> >> CheckDeadLock(). I thought we could replace
> >> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK with
> >> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, but that's not good because a backend
> >> that has a buffer pin blocking the startup process and is not waiting on a
> >> lock would also be canceled after deadlock_timeout. We could have the
> >> backend return from RecoveryConflictInterrupt() when it receives
> >> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and is not waiting on any lock, but
> >> that's also not good because then we cannot cancel, after
> >> max_standby_streaming_delay, a backend that has a buffer pin blocking the
> >> startup process. So I wonder if we could have a new signal. When the
> >> backend receives this signal, it returns from RecoveryConflictInterrupt()
> >> without a deadlock check either if it's not waiting on any lock or if it
> >> doesn't have a buffer pin blocking the startup process. Otherwise it's
> >> canceled.
> >
> > Thanks for pointing out that issue! Using a new signal is one idea. Another
> > idea is to make a backend skip the deadlock check if
> > GetStartupBufferPinWaitBufId() returns -1, i.e., the startup process is not
> > waiting for a buffer pin. So, what I'm thinking is:
> >
> > In RecoveryConflictInterrupt(), when a backend receives
> > PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
> >
> > 1. If the backend isn't waiting for a lock, it does nothing.
> > 2. If the backend is waiting for a lock and also holding a buffer pin that
> >    delays recovery, it may be canceled.
> > 3. If the backend is waiting for a lock and the startup process is not
> >    waiting for a buffer pin (i.e., the startup process is also waiting for
> >    a lock), it checks for deadlocks.
> > 4. If the backend is waiting for a lock and isn't holding a buffer pin that
> >    delays recovery, though the startup process is waiting for a buffer pin,
> >    it does nothing.
>
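Good idea! Just to check that I understand it correctly, I think the
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK case in RecoveryConflictInterrupt()
would then look roughly like the sketch below. This is only my untested
understanding: I'm assuming we would reuse the existing helpers
IsWaitingForLock(), HoldingBufferPinThatDelaysRecovery() and
GetStartupBufferPinWaitBufId(), and kick off the deadlock check the same way
the DEADLOCK_TIMEOUT handler does (CheckDeadLockAlert(), which would probably
need to be made non-static), so the actual wiring may end up different.

    case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:

        /* Case 1: not waiting for any lock, so we can never deadlock */
        if (!IsWaitingForLock())
            return;

        if (HoldingBufferPinThatDelaysRecovery())
        {
            /*
             * Case 2: waiting for a lock and holding a buffer pin that
             * delays recovery, so this backend may be canceled; hand off
             * to the existing cancellation code (a fall-through in the
             * real switch).
             */
            MyProc->recoveryConflictPending = true;
            break;
        }

        if (GetStartupBufferPinWaitBufId() < 0)
        {
            /*
             * Case 3: the startup process is not waiting for a buffer pin,
             * i.e., it is waiting for a lock too, so request the deadlock
             * check as the DEADLOCK_TIMEOUT handler would, and let
             * ProcSleep() perform it.
             */
            CheckDeadLockAlert();
            return;
        }

        /*
         * Case 4: the startup process is waiting for a buffer pin that this
         * backend doesn't hold, so there is nothing to do.
         */
        return;

With cases 1 and 4, a backend that cannot possibly be involved in the deadlock
returns without running the deadlock check, which addresses the overhead I was
worried about above.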
It could still happen that, if the startup process sets
startupBufferPinWaitBufId to -1 after sending the signal and before the
backend checks it, the backend ends up doing an unnecessary deadlock check.
But the likelihood of that would be low in practice.

I have two small comments on ResolveRecoveryConflictWithBufferPin() in the
v4 patch:

The code currently has three branches, as follows:

    if (ltime == 0)
    {
        enable a timeout for deadlock;
    }
    else if (GetCurrentTimestamp() >= ltime)
    {
        send recovery conflict signal;
    }
    else
    {
        enable two timeouts: ltime and deadlock;
    }

I think we can rearrange the code similarly to the changes you made in
ResolveRecoveryConflictWithLock():

    if (GetCurrentTimestamp() >= ltime && ltime != 0)
    {
        resolve recovery conflict;
    }
    else
    {
        enable one or two timeouts: ltime and deadlock;
    }

That's more consistent with ResolveRecoveryConflictWithLock(). Also, the
patch currently doesn't reset got_standby_deadlock_timeout in the
(ltime == 0) case, but that would be resolved by this rearrangement as well.

---

This isn't necessary if we always reset got_standby_deadlock_timeout before
waiting, but we might want to also clear got_standby_deadlock_timeout after
disabling all timeouts, to ensure that it's cleared at the end of the
function. In ResolveRecoveryConflictWithLock() we clear both
got_standby_lock_timeout and got_standby_deadlock_timeout after disabling all
timeouts, but we don't do that in ResolveRecoveryConflictWithBufferPin().

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/