On 2020/03/18 17:56, Atsushi Torikoshi wrote:
On Tue, Mar 17, 2020 at 11:55 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> > Waiting when WAL data is not available from any kind of sources
> > (local, archive or stream) before trying again to retrieve WAL data,
>
> I think 'local' means pg_wal here, but the comment on
> WaitForWALToBecomeAvailable() says checking pg_wal in
> standby mode is 'not documented', so I'm a little bit worried
> that users may be confused.
This logic seems to be documented in high-availability.sgml.
Thanks! I didn't notice it.
I think you mean the below sentence.
> The standby server will also attempt to restore any WAL found in the
> standby cluster's pg_wal directory.
I meant the following part in the doc.
---------------------
At startup, the standby begins by restoring all WAL available in the archive
location, calling restore_command. Once it reaches the end of WAL available
there and restore_command fails, it tries to restore any WAL available in the
pg_wal directory. If that fails, and streaming replication has been configured,
the standby tries to connect to the primary server and start streaming WAL from
the last valid record found in archive or pg_wal. If that fails or streaming
replication is not configured, or if the connection is later disconnected,
the standby goes back to step 1 and tries to restore the file from the archive
again. This loop of retries from the archive, pg_wal, and via streaming
replication goes on until the server is stopped or failover is triggered by a
trigger file.
---------------------
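The loop described in that passage can be sketched roughly as follows. This is only a simplified illustration, not the code in WaitForWALToBecomeAvailable(): the names WalSource, TryFetch, fetch_wal and stream_after_failures are hypothetical, and the sleep between rounds is replaced by a counter so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical source list, in the order the documentation describes. */
typedef enum { SRC_ARCHIVE, SRC_PG_WAL, SRC_STREAM, SRC_COUNT } WalSource;

/* Hypothetical callback: returns true if WAL could be read from src. */
typedef bool (*TryFetch)(WalSource src, void *ctx);

/*
 * Try archive, then pg_wal, then streaming. If every source fails, count
 * one retry wait and start over from the archive. Returns the source that
 * supplied WAL, or SRC_COUNT if nothing succeeded within max_rounds.
 */
WalSource fetch_wal(TryFetch try_fetch, void *ctx, int max_rounds,
                    int *retry_waits)
{
    assert(try_fetch != NULL && retry_waits != NULL);
    *retry_waits = 0;

    for (int round = 0; round < max_rounds; round++)
    {
        for (int s = SRC_ARCHIVE; s < SRC_COUNT; s++)
        {
            if (try_fetch((WalSource) s, ctx))
                return (WalSource) s;
        }

        /*
         * No source had WAL. A real standby would sleep here before
         * retrying; this sketch only counts the wait.
         */
        (*retry_waits)++;
    }
    return SRC_COUNT;
}

/*
 * Demo callback (assumption for illustration): archive and pg_wal never
 * have WAL, and streaming succeeds only after *ctx failed attempts.
 */
bool stream_after_failures(WalSource src, void *ctx)
{
    int *remaining = ctx;

    if (src != SRC_STREAM)
        return false;
    if (*remaining <= 0)
        return true;
    (*remaining)--;
    return false;
}
```

Calling fetch_wal(stream_after_failures, &remaining, 10, &waits) with remaining set to 2 goes through two full failed rounds, each ending in a retry wait, before streaming supplies WAL on the third round. The wait between rounds is the one the description in the patch is about.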
It seems the comment on WaitForWALToBecomeAvailable() does not
agree with high-availability.sgml. Do we need to modify the
comment on the function?
No, I don't think so for now. But would you like to improve the docs?
But, anyway, you think that "pg_wal" should be used instead
of "local" here?
I don't have a strong opinion here.
It might be better because high-availability.sgml does not use
"local" but "pg_wal" for the explanation, but I also feel it's
obvious in this context.
Ok, I changed that from "local" to "pg_wal" in the patch for
master. Attached is the updated version of the patch.
If you're OK with this, I'd like to commit two patches that I posted
in this thread.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7626987808..89853a16d8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1244,7 +1244,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting to acquire a pin on a buffer.</entry>
</row>
<row>
- <entry morerows="13"><literal>Activity</literal></entry>
+ <entry morerows="12"><literal>Activity</literal></entry>
<entry><literal>ArchiverMain</literal></entry>
<entry>Waiting in main loop of the archiver process.</entry>
</row>
@@ -1276,17 +1276,9 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>PgStatMain</literal></entry>
<entry>Waiting in main loop of the statistics collector process.</entry>
</row>
- <row>
- <entry><literal>RecoveryWalAll</literal></entry>
- <entry>Waiting for WAL from a stream at recovery.</entry>
- </row>
<row>
<entry><literal>RecoveryWalStream</literal></entry>
- <entry>
- Waiting when WAL data is not available from any kind of sources
- (local, archive or stream) before trying again to retrieve WAL data,
- at recovery.
- </entry>
+ <entry>Waiting for WAL from a stream at recovery.</entry>
</row>
<row>
<entry><literal>SysLoggerMain</literal></entry>
@@ -1496,7 +1488,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from remote server during synchronous replication.</entry>
</row>
<row>
- <entry morerows="2"><literal>Timeout</literal></entry>
+ <entry morerows="3"><literal>Timeout</literal></entry>
<entry><literal>BaseBackupThrottle</literal></entry>
<entry>Waiting during base backup when throttling activity.</entry>
</row>
@@ -1508,6 +1500,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>RecoveryApplyDelay</literal></entry>
<entry>Waiting to apply WAL at recovery because it is delayed.</entry>
</row>
+ <row>
+ <entry><literal>RecoveryRetrieveRetryInterval</literal></entry>
+ <entry>
+ Waiting when WAL data is not available from any kind of sources
+ (<filename>pg_wal</filename>, archive or stream) before trying
+ again to retrieve WAL data, at recovery.
+ </entry>
+ </row>
<row>
<entry morerows="68"><literal>IO</literal></entry>
<entry><literal>BufFileRead</literal></entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de2d4ee582..793c076da6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12031,7 +12031,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
WL_LATCH_SET | WL_TIMEOUT |
WL_EXIT_ON_PM_DEATH,
wait_time,
- WAIT_EVENT_RECOVERY_WAL_STREAM);
+ WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
ResetLatch(&XLogCtl->recoveryWakeupLatch);
now = GetCurrentTimestamp();
}
@@ -12221,7 +12221,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
(void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
WL_LATCH_SET | WL_TIMEOUT |
WL_EXIT_ON_PM_DEATH,
- 5000L, WAIT_EVENT_RECOVERY_WAL_ALL);
+ 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
ResetLatch(&XLogCtl->recoveryWakeupLatch);
break;
}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f9287b7942..d29c211a76 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3602,9 +3602,6 @@ pgstat_get_wait_activity(WaitEventActivity w)
case WAIT_EVENT_PGSTAT_MAIN:
event_name = "PgStatMain";
break;
- case WAIT_EVENT_RECOVERY_WAL_ALL:
- event_name = "RecoveryWalAll";
- break;
case WAIT_EVENT_RECOVERY_WAL_STREAM:
event_name = "RecoveryWalStream";
break;
@@ -3824,6 +3821,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
case WAIT_EVENT_RECOVERY_APPLY_DELAY:
event_name = "RecoveryApplyDelay";
break;
+ case WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL:
+ event_name = "RecoveryRetrieveRetryInterval";
+ break;
/* no default case, so that compiler will warn */
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1a19921f80..851d0a7246 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -761,7 +761,6 @@ typedef enum
WAIT_EVENT_LOGICAL_APPLY_MAIN,
WAIT_EVENT_LOGICAL_LAUNCHER_MAIN,
WAIT_EVENT_PGSTAT_MAIN,
- WAIT_EVENT_RECOVERY_WAL_ALL,
WAIT_EVENT_RECOVERY_WAL_STREAM,
WAIT_EVENT_SYSLOGGER_MAIN,
WAIT_EVENT_WAL_RECEIVER_MAIN,
@@ -848,7 +847,8 @@ typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
+ WAIT_EVENT_RECOVERY_APPLY_DELAY,
+ WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL
} WaitEventTimeout;
/* ----------