On Thu, Nov 26, 2020 at 12:49 AM Drouvot, Bertrand <bdrou...@amazon.com> wrote: > > Hi, > > On 11/25/20 2:20 PM, Masahiko Sawada wrote: > > CAUTION: This email originated from outside of the organization. Do not > > click links or open attachments unless you can confirm the sender and know > > the content is safe. > > > > > > > > On Fri, Nov 20, 2020 at 6:18 PM Drouvot, Bertrand <bdrou...@amazon.com> > > wrote: > >> Hi, > >> > >> On 11/17/20 4:44 PM, Fujii Masao wrote: > >>> Thanks for updating the patch! Here are review comments. > >>> > >>> + Controls whether a log message is produced when the startup > >>> process > >>> + is waiting longer than <varname>deadlock_timeout</varname> > >>> + for recovery conflicts. > >>> > >>> But a log message can be produced also when the backend is waiting > >>> for recovery conflict. Right? If yes, this description needs to be > >>> corrected. > >> Thanks for looking at it! > >> > >> I don't think so, only the startup process should write those new log > >> messages. > >> > >> What makes you think that would not be the case? > >> > >>> > >>> + for recovery conflicts. This is useful in determining if > >>> recovery > >>> + conflicts prevents the recovery from applying WAL. > >>> > >>> "prevents" should be "prevent"? > >> Indeed: fixed in the new attached patch. > >> > >>> > >>> + TimestampDifference(waitStart, GetCurrentTimestamp(), &secs, > >>> &usecs); > >>> + msecs = secs * 1000 + usecs / 1000; > >>> > >>> GetCurrentTimestamp() is basically called before LogRecoveryConflict() > >>> is called. So isn't it better to avoid calling GetCurrentTimestamp() > >>> newly in > >>> LogRecoveryConflict() and to reuse the timestamp that we got? > >>> It's helpful to avoid the waste of cycles. > >>> > >> good catch! fixed in the new attached patch. > >> > >>> + while (VirtualTransactionIdIsValid(*vxids)) > >>> + { > >>> + PGPROC *proc = > >>> BackendIdGetProc(vxids->backendId); > >>> > >>> BackendIdGetProc() can return NULL if the backend is not active > >>> at that moment. This case should be handled. > >>> > >> handled in the new attached patch. > >>> + case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN: > >>> + reasonDesc = gettext_noop("recovery is still > >>> waiting recovery conflict on buffer pin"); > >>> > >>> It's natural to use "waiting for recovery" rather than "waiting > >>> recovery"? > >>> > >> I would be tempted to say so, the new patch makes use of "waiting for". > >>> + /* Also, set deadlock timeout for logging purpose if > >>> necessary */ > >>> + if (log_recovery_conflict_waits) > >>> + { > >>> + timeouts[cnt].id = STANDBY_TIMEOUT; > >>> + timeouts[cnt].type = TMPARAM_AFTER; > >>> + timeouts[cnt].delay_ms = DeadlockTimeout; > >>> + cnt++; > >>> + } > >>> > >>> This needs to be executed only when the message has not been logged yet. > >>> Right? > >>> > >> good catch: fixed in the new attached patch. > >> > > Thank you for updating the patch! Here are review comments on the > > latest version patch. > > > > + while (VirtualTransactionIdIsValid(*vxids)) > > + { > > + PGPROC *proc = BackendIdGetProc(vxids->backendId); > > + > > + if (proc) > > + { > > + if (nprocs == 0) > > + appendStringInfo(&buf, "%d", proc->pid); > > + else > > + appendStringInfo(&buf, ", %d", proc->pid); > > + > > + nprocs++; > > + vxids++; > > + } > > + } > > > > We need to increment vxids even if *proc is null. Otherwise, the loop won't > > end. > > My bad, that's fixed. > > > > > --- > > + TimestampTz cur_ts = GetCurrentTimestamp();; > Fixed > > > > There is an extra semi-colon. > > > > --- > > int max_standby_streaming_delay = 30 * 1000; > > +bool log_recovery_conflict_waits = false; > > +bool logged_lock_conflict = false; > > > > > > + if (log_recovery_conflict_waits && !logged_lock_conflict) > > + { > > + timeouts[cnt].id = STANDBY_TIMEOUT; > > + timeouts[cnt].type = TMPARAM_AFTER; > > + timeouts[cnt].delay_ms = DeadlockTimeout; > > + cnt++; > > + } > > > > Can we pass a bool indicating if a timeout may be needed for recovery > > conflict logging from ProcSleep() to ResolveRecoveryConflictWithLock() > > instead of using a static variable? > > Yeah that makes more sense, specially as we already have > logged_recovery_conflict at our disposal. > > New patch version attached. >
Thank you for updating the patch! The patch works fine and looks good to me except for the following small comments: +/* + * Log the recovery conflict. + * + * waitStart is the timestamp when the caller started to wait. This function also + * reports the details about the conflicting process ids if *waitlist is not NULL. + */ +void +LogRecoveryConflict(ProcSignalReason reason, TimestampTz waitStart, + TimestampTz cur_ts, VirtualTransactionId *waitlist) I think it's better to explain cur_ts as well in the function comment. Regarding the function arguments, 'waitStart' is camel case whereas 'cur_ts' is snake case and 'waitlist' is using only lower cases. I think we should unify them. --- -ResolveRecoveryConflictWithLock(LOCKTAG locktag) +ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logged_recovery_conflict) The function argument name 'logged_recovery_conflict' sounds a bit redundant to me as this function is used only for recovery conflict resolution. How about 'need_log' or something? Also it’s better to explain it in the function comment. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/