Re: [HACKERS] make async slave to wait for lsn to be replayed

Kyotaro Horiguchi Thu, 20 Jun 2024 01:31:09 -0700

Hi, I looked through the patch and have some comments.


====== L68:
+    <title>Recovery Procedures</title>

It looks somewhat confusing and appears as if the section is intended
to explain how to perform recovery. Since this is the first built-in
procedure, I'm not sure how should this section be written. However,
the section immediately above is named "Recovery Control Functions",
so "Reocvery Synchronization Functions" would align better with the
naming of the upper section. (I don't believe we need to be so strcit
about the distinction between functions and procedures here.)

It looks strange that the procedure signature includes the return type.


====== L93:
+        If <parameter>timeout</parameter> is not specified or zero, this
+        procedure returns once WAL is replayed upto
+        <literal>target_lsn</literal>.
+        If the <parameter>timeout</parameter> is specified (in
+        milliseconds) and greater than zero, the procedure waits until the
+        server actually replays the WAL upto <literal>target_lsn</literal> or
+        until the given time has passed. On timeout, an error is emitted.

The first sentence should mention the main functionality. Following
precedents, it might be better to use something like "Waits until
recovery surpasses the specified LSN. If no timeout is specified or it
is set to zero, this procedure waits indefinitely for the LSN. If the
timeout is specified (in milliseconds) and is greater than zero, the
procedure waits until the LSN is reached or the specified time has
elapsed. On timeout, or if the server is promoted before the LSN is
reached, an error is emitted."

The detailed explanation that follows the above seems somewhat too
verbose to me, as other functions don't have such detailed examples.

====== L484
/*
+        * Set latches for processes, whose waited LSNs are already replayed. 
This
+        * involves spinlocks.  So, we shouldn't do this under a spinlock.
+        */

Here, I'm not quite sure what specifically spinlock (or mutex?) is
referring to.  However, more importantly, shouldn't we explain that it
is okay not to use any locks at all, rather than mentioning that
spinlocks should not be used here? I found a similar case around
freelist.c:238, which is written as follows.

>                * Not acquiring ProcArrayLock here which is slightly icky. It's
>                * actually fine because procLatch isn't ever freed, so we just 
> can
>                * potentially set the wrong process' (or no process') latch.
>                */
>               SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch);

===== L518
+void
+WaitForLSN(XLogRecPtr targetLSN, int64 timeout)

This function is only called within the same module. I'm not sure if
we need to expose it. I we do, the name should probably be more
specific. I'm not quite sure if the division of functionality between
this function and its only caller function is appropriate.  As a
possible refactoring, we could have WaitForLSN() just return the
result as [reached, timedout, promoted] and delegate prerequisition
checks and error reporting to the SQL function.


===== L524
+       /* Shouldn't be called when shmem isn't initialized */
+       Assert(waitLSN);

Seeing this assertion, I feel that the name "waitLSN" is a bit
obscure. How about renaming it to "waitLSNStates"?

===== L527
+       /* Should be only called by a backend */
+       Assert(MyBackendType == B_BACKEND && MyProcNumber <= MaxBackends);

This is somewhat excessive, causing a server crash when ending with an
error would suffice. By the way, if I execute "CALL
pg_wal_replay_wait('0/0')" on a logical wansender, the server crashes.
The condition doesn't seem appropriate.


===== L561
+                                        errdetail("Recovery ended before 
replaying the target LSN %X/%X; last replay LSN %X/%X.",

I don't think we need "the" before "target" in the above message.


===== L565
+               if (timeout > 0)
+               {
+                       delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
+                       latch_events |= WL_TIMEOUT;
+                       if (delay_ms <= 0)
+                               break;

"timeout" is immutable in the function. Therefore, we can calculate
"latch_events" before entering the loop. By the way, the name
'latch_events' seems a bit off. Latch is a kind of event the function
can wait for. Therefore, something like wait_events might be more
appropriate.

===== L567
+                       delay_ms = (endtime - GetCurrentTimestamp()) / 1000;

We can use TimestampDifferenceMilliseconds() here.


==== L578
+               if (rc & WL_LATCH_SET)
+                       ResetLatch(MyLatch);

I think we usually reset latches unconditionally after returning from
WaitLatch(), even when waiting for timeouts.


===== L756
+{ oid => '16387', descr => 'wait for LSN with timeout',

The description seems a bit off. While timeout is mentioned, the more
important characteristic that the LSN is the replay LSN is not
included.

===== L791
+ * WaitLSNProcInfo [e2 80 93] the shared memory structure representing 
information

This line contains a non-ascii character EN Dash(U+2013).


===== L798
+        * A process number, same as the index of this item in 
waitLSN->procInfos.
+        * Stored for convenience.
+        */
+       int                     procnum;

It is described as "(just) for convenience". However, it is referenced
by Startup to fetch the PGPROC entry for the waiter, which necessary
for Startup. That aside, why don't we hold (the pointer to) procLatch
instead of procnum? It makes things simpler and I believe it is our
standard practice.


===== L809
+       /* A flag indicating that this item is added to waitLSN->waitersHeap */
+       bool            inHeap;

The name "inHeap" seems too leteral and might be hard to understand in
most occurances. How about using "waiting" instead?

===== L920
+# I
+# Make sure that pg_wal_replay_wait() works: add new content to

Hmm. I feel that Arabic numerals look nicer than Greek ones here.


===== L940
+# Check that new data is visible after calling pg_wal_replay_wait()

On the other hand, the comment for the check for this test states that

> +# Make sure the current LSN on standby and is the same as primary's
> LSN +ok($output eq 30, "standby reached the same LSN as primary");

I think the first comment and the second should be consistent.




> Oh, I forgot some notes about 044_wal_replay_wait_injection_test.pl.
> 
> 1. It's not clear why this test needs node_standby2 at all.  It seems useless.

I agree with you. What we would need is a second *waiter client*
connecting to the same stanby rather than a second standby. I feel
like having a test where the first waiter is removed while multiple
waiters are waiting, as well as a test where promotion occurs under
the same circumstances.

> 2. The target LSN is set to pg_current_wal_insert_lsn() + 10000.  This
> location seems to be unachievable in this test.  So, it's not clear
> what race condition this test could potentially detect.
> 3. I think it would make sense to check for the race condition
> reported by Heikki.  That is to insert the injection point at the
> beginning of WaitLSNSetLatches().

I think the race condition you mentioned refers to the inconsistency
between the inHeap flag and the pairing heap caused by a race
condition between timeout and wakeup (or perhaps other combinations?
I'm not sure which version of the patch the mentioned race condition
refers to). However, I imagine it is difficult to reliably reproduce
this condition. In that regard, in the latest patch, the coherence
between the inHeap flag and the pairing heap is protected by LWLock,
so I believe we no longer need that test.

regards.


> Links.
> 1. 
> https://www.postgresql.org/message-id/flat/CAPpHfdvGRssjqwX1%2Bidm5Tu-eWsTcx6DthB2LhGqA1tZ29jJaw%40mail.gmail.com#557900e860457a9e24256c93a2ad4920

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] make async slave to wait for lsn to be replayed

Reply via email to