Hi, I looked through the patch and have some comments.
====== L68: + <title>Recovery Procedures</title> It looks somewhat confusing and appears as if the section is intended to explain how to perform recovery. Since this is the first built-in procedure, I'm not sure how should this section be written. However, the section immediately above is named "Recovery Control Functions", so "Reocvery Synchronization Functions" would align better with the naming of the upper section. (I don't believe we need to be so strcit about the distinction between functions and procedures here.) It looks strange that the procedure signature includes the return type. ====== L93: + If <parameter>timeout</parameter> is not specified or zero, this + procedure returns once WAL is replayed upto + <literal>target_lsn</literal>. + If the <parameter>timeout</parameter> is specified (in + milliseconds) and greater than zero, the procedure waits until the + server actually replays the WAL upto <literal>target_lsn</literal> or + until the given time has passed. On timeout, an error is emitted. The first sentence should mention the main functionality. Following precedents, it might be better to use something like "Waits until recovery surpasses the specified LSN. If no timeout is specified or it is set to zero, this procedure waits indefinitely for the LSN. If the timeout is specified (in milliseconds) and is greater than zero, the procedure waits until the LSN is reached or the specified time has elapsed. On timeout, or if the server is promoted before the LSN is reached, an error is emitted." The detailed explanation that follows the above seems somewhat too verbose to me, as other functions don't have such detailed examples. ====== L484 /* + * Set latches for processes, whose waited LSNs are already replayed. This + * involves spinlocks. So, we shouldn't do this under a spinlock. + */ Here, I'm not quite sure what specifically spinlock (or mutex?) is referring to. However, more importantly, shouldn't we explain that it is okay not to use any locks at all, rather than mentioning that spinlocks should not be used here? I found a similar case around freelist.c:238, which is written as follows. > * Not acquiring ProcArrayLock here which is slightly icky. It's > * actually fine because procLatch isn't ever freed, so we just > can > * potentially set the wrong process' (or no process') latch. > */ > SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch); ===== L518 +void +WaitForLSN(XLogRecPtr targetLSN, int64 timeout) This function is only called within the same module. I'm not sure if we need to expose it. I we do, the name should probably be more specific. I'm not quite sure if the division of functionality between this function and its only caller function is appropriate. As a possible refactoring, we could have WaitForLSN() just return the result as [reached, timedout, promoted] and delegate prerequisition checks and error reporting to the SQL function. ===== L524 + /* Shouldn't be called when shmem isn't initialized */ + Assert(waitLSN); Seeing this assertion, I feel that the name "waitLSN" is a bit obscure. How about renaming it to "waitLSNStates"? ===== L527 + /* Should be only called by a backend */ + Assert(MyBackendType == B_BACKEND && MyProcNumber <= MaxBackends); This is somewhat excessive, causing a server crash when ending with an error would suffice. By the way, if I execute "CALL pg_wal_replay_wait('0/0')" on a logical wansender, the server crashes. The condition doesn't seem appropriate. ===== L561 + errdetail("Recovery ended before replaying the target LSN %X/%X; last replay LSN %X/%X.", I don't think we need "the" before "target" in the above message. ===== L565 + if (timeout > 0) + { + delay_ms = (endtime - GetCurrentTimestamp()) / 1000; + latch_events |= WL_TIMEOUT; + if (delay_ms <= 0) + break; "timeout" is immutable in the function. Therefore, we can calculate "latch_events" before entering the loop. By the way, the name 'latch_events' seems a bit off. Latch is a kind of event the function can wait for. Therefore, something like wait_events might be more appropriate. ===== L567 + delay_ms = (endtime - GetCurrentTimestamp()) / 1000; We can use TimestampDifferenceMilliseconds() here. ==== L578 + if (rc & WL_LATCH_SET) + ResetLatch(MyLatch); I think we usually reset latches unconditionally after returning from WaitLatch(), even when waiting for timeouts. ===== L756 +{ oid => '16387', descr => 'wait for LSN with timeout', The description seems a bit off. While timeout is mentioned, the more important characteristic that the LSN is the replay LSN is not included. ===== L791 + * WaitLSNProcInfo [e2 80 93] the shared memory structure representing information This line contains a non-ascii character EN Dash(U+2013). ===== L798 + * A process number, same as the index of this item in waitLSN->procInfos. + * Stored for convenience. + */ + int procnum; It is described as "(just) for convenience". However, it is referenced by Startup to fetch the PGPROC entry for the waiter, which necessary for Startup. That aside, why don't we hold (the pointer to) procLatch instead of procnum? It makes things simpler and I believe it is our standard practice. ===== L809 + /* A flag indicating that this item is added to waitLSN->waitersHeap */ + bool inHeap; The name "inHeap" seems too leteral and might be hard to understand in most occurances. How about using "waiting" instead? ===== L920 +# I +# Make sure that pg_wal_replay_wait() works: add new content to Hmm. I feel that Arabic numerals look nicer than Greek ones here. ===== L940 +# Check that new data is visible after calling pg_wal_replay_wait() On the other hand, the comment for the check for this test states that > +# Make sure the current LSN on standby and is the same as primary's > LSN +ok($output eq 30, "standby reached the same LSN as primary"); I think the first comment and the second should be consistent. > Oh, I forgot some notes about 044_wal_replay_wait_injection_test.pl. > > 1. It's not clear why this test needs node_standby2 at all. It seems useless. I agree with you. What we would need is a second *waiter client* connecting to the same stanby rather than a second standby. I feel like having a test where the first waiter is removed while multiple waiters are waiting, as well as a test where promotion occurs under the same circumstances. > 2. The target LSN is set to pg_current_wal_insert_lsn() + 10000. This > location seems to be unachievable in this test. So, it's not clear > what race condition this test could potentially detect. > 3. I think it would make sense to check for the race condition > reported by Heikki. That is to insert the injection point at the > beginning of WaitLSNSetLatches(). I think the race condition you mentioned refers to the inconsistency between the inHeap flag and the pairing heap caused by a race condition between timeout and wakeup (or perhaps other combinations? I'm not sure which version of the patch the mentioned race condition refers to). However, I imagine it is difficult to reliably reproduce this condition. In that regard, in the latest patch, the coherence between the inHeap flag and the pairing heap is protected by LWLock, so I believe we no longer need that test. regards. > Links. > 1. > https://www.postgresql.org/message-id/flat/CAPpHfdvGRssjqwX1%2Bidm5Tu-eWsTcx6DthB2LhGqA1tZ29jJaw%40mail.gmail.com#557900e860457a9e24256c93a2ad4920 -- Kyotaro Horiguchi NTT Open Source Software Center