On 2021-03-12 12:39, Fujii Masao wrote:
On 2021/03/11 21:29, Masahiro Ikeda wrote:
In addition, I rebased the patch for WAL receiver.
(v17-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)

Thanks! Will review this later.

Thanks a lot!

I read through the 0003 patch. Here are some comments for that.

With the patch, walreceiver's stats are counted as wal_write,
wal_sync, wal_write_time and wal_sync_time in pg_stat_wal. But they
should be counted as different columns because WAL IO is different
between walreceiver and other processes like a backend? For example,
open_sync or open_datasync is chosen as wal_sync_method, those other
processes use O_DIRECT flag to open WAL files, but walreceiver does
not. For example, those other procesess write WAL data in block units,
but walreceiver does not. So I'm concerned that mixing different WAL
IO stats in the same columns would confuse the users. Thought? I'd
like to hear more opinions about how to expose walreceiver's stats to
users.

Thanks, I understood get_sync_bit() checks the sync flags and
the write unit of generated wal data and replicated wal data is different.
(It's interesting optimization whether to use kernel cache or not.)

OK. Although I agree to separate the stats for the walrecever,
I want to hear opinions from other people too. I didn't change the patch.

Please feel to your comments.




+int
+XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset)

This common function writes WAL data and measures IO timing. IMO we
can refactor the code furthermore by making this function handle the
case where pg_write() reports an error. In other words, I think that
the function should do what do-while loop block in XLogWrite() does.
Thought?

OK. I agree.

I wonder to change the error check ways depending on who calls this function? Now, only the walreceiver checks (1)errno==0 and doesn't check (2)errno==ENITR.
Other processes are the opposite.

IIUC, it's appropriate that every process checks (1)(2).
Please let me know my understanding is wrong.



BTW, currently XLogWrite() increments IO timing even when pg_pwrite()
reports an error. But this is useless. Probably IO timing should be
incremented after the return code of pg_pwrite() is checked, instead?

Yes, I agree. I fixed it.
(v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)



BTW, thanks for your comments in person that the bgwriter process will generate wal data. I checked that it generates the WAL to take a snapshot via LogStandySnapshot().
I attached the patch for bgwriter to send the wal stats.
(v18-0005-send-stats-for-bgwriter-when-shutdown.patch)

This patch includes the following changes.

(1) introduce pgstat_send_bgwriter() the mechanism to send the stats
    if PGSTAT_STAT_INTERVAL msec has passed like pgstat_send_wal()
    to avoid overloading to stats collector because "bgwriter_delay"
    can be set for 10msec or more.

(2) remove pgstat_report_wal() and integrate with pgstat_send_wal()
because bgwriter sends WalStats.m_wal_records and to avoid overloading (see (1)). IIUC, although the pros to separate them is to reduce the calculation cost of
    WalUsageAccumDiff(), the impact is limited.

(3) make a new signal handler for bgwriter to force sending remaining stats during shutdown because of (1) and remove HandleMainLoopInterrupts() because there are no processes to use it.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9b2eb0d10b..c7bda9127f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2536,7 +2536,6 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
-			instr_time	start;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2544,49 +2543,9 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			nleft = nbytes;
 			do
 			{
-				errno = 0;
+				written = XLogWriteFile(openLogFile, from, nleft, (off_t) startoffset,
+										ThisTimeLineID, openLogSegNo, wal_segment_size);
 
-				/* Measure I/O timing to write WAL data */
-				if (track_wal_io_timing)
-					INSTR_TIME_SET_CURRENT(start);
-
-				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
-				written = pg_pwrite(openLogFile, from, nleft, startoffset);
-				pgstat_report_wait_end();
-
-				/*
-				 * Increment the I/O timing and the number of times WAL data
-				 * were written out to disk.
-				 */
-				if (track_wal_io_timing)
-				{
-					instr_time	duration;
-
-					INSTR_TIME_SET_CURRENT(duration);
-					INSTR_TIME_SUBTRACT(duration, start);
-					WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
-				}
-
-				WalStats.m_wal_write++;
-
-				if (written <= 0)
-				{
-					char		xlogfname[MAXFNAMELEN];
-					int			save_errno;
-
-					if (errno == EINTR)
-						continue;
-
-					save_errno = errno;
-					XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
-								 wal_segment_size);
-					errno = save_errno;
-					ereport(PANIC,
-							(errcode_for_file_access(),
-							 errmsg("could not write to log file %s "
-									"at offset %u, length %zu: %m",
-									xlogfname, startoffset, nleft)));
-				}
 				nleft -= written;
 				from += written;
 				startoffset += written;
@@ -2707,6 +2666,82 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 	}
 }
 
+/*
+ * Issue pg_pwrite to write an WAL segment file.
+ *
+ * 'fd' is a file descriptor for the XLOG file to write
+ * 'buf' is a buffer starting address to write.
+ * 'nbyte' is a number of max bytes to write up.
+ * 'offset' is a offset of XLOG file to be set.
+ * 'timelineid' is a timeline ID of WAL segment file.
+ * 'segno' is a segment number of WAL segment file.
+ * 'segsize' is a segment size of WAL segment file.
+ *
+ * Return the number of bytes written.
+ */
+int
+XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset,
+			  TimeLineID timelineid, XLogSegNo segno, int segsize)
+{
+	/*
+	 * Loop until to write the buffer data or an error occurred.
+	 */
+	for (;;)
+	{
+		int			written;
+		instr_time	start;
+
+		errno = 0;
+
+		/* Measure I/O timing to write WAL data */
+		if (track_wal_io_timing)
+			INSTR_TIME_SET_CURRENT(start);
+
+		pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
+		written = pg_pwrite(fd, buf, nbyte, offset);
+		pgstat_report_wait_end();
+
+		if (written <= 0)
+		{
+			char		xlogfname[MAXFNAMELEN];
+			int			save_errno;
+
+			if (errno == EINTR)
+				continue;
+
+			/* if write didn't set errno, assume no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+
+			save_errno = errno;
+			XLogFileName(xlogfname, timelineid, segno, segsize);
+			errno = save_errno;
+			ereport(PANIC,
+					(errcode_for_file_access(),
+					 errmsg("could not write to log file %s "
+							"at offset %u, length %zu: %m",
+							xlogfname, (unsigned int) offset, (unsigned long) nbyte)));
+		}
+
+		/*
+		 * Increment the I/O timing and the number of times WAL data were
+		 * written out to disk.
+		 */
+		if (track_wal_io_timing)
+		{
+			instr_time	duration;
+
+			INSTR_TIME_SET_CURRENT(duration);
+			INSTR_TIME_SUBTRACT(duration, start);
+			WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
+		}
+
+		WalStats.m_wal_write++;
+
+		return written;
+	}
+}
+
 /*
  * Record the LSN for an asynchronous transaction commit/abort
  * and nudge the WALWriter if there is work for it to do.
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a7a94d2a83..e9de78ffaa 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg)
 	/* Ensure that all WAL records received are flushed to disk */
 	XLogWalRcvFlush(true);
 
+	/* Send WAL statistics to the stats collector before terminating */
+	pgstat_send_wal(true);
+
 	/* Mark ourselves inactive in shared memory */
 	SpinLockAcquire(&walrcv->mutex);
 	Assert(walrcv->walRcvState == WALRCV_STREAMING ||
@@ -868,7 +871,7 @@ XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len)
 static void
 XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 {
-	int			startoff;
+	uint32		startoff;
 	int			byteswritten;
 
 	while (nbytes > 0)
@@ -910,6 +913,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 					XLogArchiveForceDone(xlogfname);
 				else
 					XLogArchiveNotify(xlogfname);
+
+				/*
+				 * Send WAL statistics to the stats collector when finishing
+				 * the current WAL segment file to avoid overloading it.
+				 */
+				pgstat_send_wal(false);
 			}
 			recvFile = -1;
 
@@ -929,27 +938,8 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 			segbytes = nbytes;
 
 		/* OK to write the logs */
-		errno = 0;
-
-		byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff);
-		if (byteswritten <= 0)
-		{
-			char		xlogfname[MAXFNAMELEN];
-			int			save_errno;
-
-			/* if write didn't set errno, assume no disk space */
-			if (errno == 0)
-				errno = ENOSPC;
-
-			save_errno = errno;
-			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-			errno = save_errno;
-			ereport(PANIC,
-					(errcode_for_file_access(),
-					 errmsg("could not write to log segment %s "
-							"at offset %u, length %lu: %m",
-							xlogfname, startoff, (unsigned long) segbytes)));
-		}
+		byteswritten = XLogWriteFile(recvFile, buf, segbytes, (off_t) startoff,
+									 recvFileTLI, recvSegNo, wal_segment_size);
 
 		/* Update state for write */
 		recptr += byteswritten;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 6d384d3ce6..fd478b3478 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -298,6 +298,10 @@ extern bool XLogBackgroundFlush(void);
 extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
 extern int	XLogFileInit(XLogSegNo segno, bool *use_existent, bool use_lock);
 extern int	XLogFileOpen(XLogSegNo segno);
+extern int	XLogWriteFile(int fd, const void *buf,
+						  size_t nbyte, off_t offset,
+						  TimeLineID timelineid, XLogSegNo segno,
+						  int segsize);
 
 extern void CheckXLogRemoved(XLogSegNo segno, TimeLineID tli);
 extern XLogSegNo XLogGetLastRemovedSegno(void);
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 715d5195bb..7cd01b38e9 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -83,6 +83,8 @@ int			BgWriterDelay = 200;
 static TimestampTz last_snapshot_ts;
 static XLogRecPtr last_snapshot_lsn = InvalidXLogRecPtr;
 
+/* Prototypes for private functions */
+static void HandleBackgroundWriterInterrupts(void);
 
 /*
  * Main entry point for bgwriter process
@@ -236,7 +238,7 @@ BackgroundWriterMain(void)
 		/* Clear any already-pending wakeups */
 		ResetLatch(MyLatch);
 
-		HandleMainLoopInterrupts();
+		HandleBackgroundWriterInterrupts();
 
 		/*
 		 * Do one cycle of dirty-buffer writing.
@@ -244,9 +246,11 @@ BackgroundWriterMain(void)
 		can_hibernate = BgBufferSync(&wb_context);
 
 		/*
-		 * Send off activity statistics to the stats collector
+		 * Send off activity statistics to the stats collector. Since
+		 * LogStandbySnapshot() will generate the WAL, send the WAL stats too.
 		 */
-		pgstat_send_bgwriter();
+		pgstat_send_bgwriter(false);
+		pgstat_send_wal(false);
 
 		if (FirstCallSinceLastCheckpoint())
 		{
@@ -349,3 +353,36 @@ BackgroundWriterMain(void)
 		prev_hibernate = can_hibernate;
 	}
 }
+
+/*
+ * Interrupt handler for main loops of Background Writer process.
+ */
+static void
+HandleBackgroundWriterInterrupts(void)
+{
+	if (ProcSignalBarrierPending)
+		ProcessProcSignalBarrier();
+
+	if (ConfigReloadPending)
+	{
+		ConfigReloadPending = false;
+		ProcessConfigFile(PGC_SIGHUP);
+	}
+
+	if (ShutdownRequestPending)
+	{
+		/*
+		 * Force to send remaining statistics to the stats collector at
+		 * process exit.
+		 *
+		 * Since pgstat_send_bgwriter and pgstat_send_wal are invoked with
+		 * 'force' is false in main loop to avoid overloading to the stats
+		 * collector, there may exist unsent stats counters for the background
+		 * writer.
+		 */
+		pgstat_send_bgwriter(true);
+		pgstat_send_wal(true);
+
+		proc_exit(0);
+	}
+}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 94b55a17b3..c5f197434c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -502,10 +502,10 @@ CheckpointerMain(void)
 		 * worth the trouble to split the stats support into two independent
 		 * stats message types.)
 		 */
-		pgstat_send_bgwriter();
+		pgstat_send_bgwriter(true);
 
 		/* Send WAL statistics to the stats collector. */
-		pgstat_report_wal();
+		pgstat_send_wal(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
@@ -582,8 +582,8 @@ HandleCheckpointerInterrupts(void)
 		 */
 		BgWriterStats.m_requested_checkpoints++;
 		ShutdownXLOG(0, 0);
-		pgstat_send_bgwriter();
-		pgstat_report_wal();
+		pgstat_send_bgwriter(true);
+		pgstat_send_wal(true);
 
 		/* Normal exit from the checkpointer is here */
 		proc_exit(0);			/* done */
@@ -724,7 +724,7 @@ CheckpointWriteDelay(int flags, double progress)
 		/*
 		 * Report interim activity statistics to the stats collector.
 		 */
-		pgstat_send_bgwriter();
+		pgstat_send_bgwriter(true);
 
 		/*
 		 * This sleep used to be connected to bgwriter_delay, typically 200ms.
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index dd9136a942..6522cb311f 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -26,31 +26,12 @@
 volatile sig_atomic_t ConfigReloadPending = false;
 volatile sig_atomic_t ShutdownRequestPending = false;
 
-/*
- * Simple interrupt handler for main loops of background processes.
- */
-void
-HandleMainLoopInterrupts(void)
-{
-	if (ProcSignalBarrierPending)
-		ProcessProcSignalBarrier();
-
-	if (ConfigReloadPending)
-	{
-		ConfigReloadPending = false;
-		ProcessConfigFile(PGC_SIGHUP);
-	}
-
-	if (ShutdownRequestPending)
-		proc_exit(0);
-}
-
 /*
  * Simple signal handler for triggering a configuration reload.
  *
  * Normally, this handler would be used for SIGHUP. The idea is that code
  * which uses it would arrange to check the ConfigReloadPending flag at
- * convenient places inside main loops, or else call HandleMainLoopInterrupts.
+ * convenient places inside main loops.
  */
 void
 SignalHandlerForConfigReload(SIGNAL_ARGS)
@@ -98,7 +79,7 @@ SignalHandlerForCrashExit(SIGNAL_ARGS)
  * or SIGTERM.
  *
  * ShutdownRequestPending should be checked at a convenient place within the
- * main loop, or else the main loop should call HandleMainLoopInterrupts.
+ * main loop.
  */
 void
 SignalHandlerForShutdownRequest(SIGNAL_ARGS)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..de16696dc6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -146,8 +146,8 @@ PgStat_MsgWal WalStats;
 
 /*
  * WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_report_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_report_wal() calls, by substracting
+ * pgstat_send_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_send_wal() calls, by substracting
  * the previous counters from the current ones.
  */
 static WalUsage prevWalUsage;
@@ -975,7 +975,7 @@ pgstat_report_stat(bool disconnect)
 	pgstat_send_funcstats();
 
 	/* Send WAL statistics */
-	pgstat_report_wal();
+	pgstat_send_wal(true);
 
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
@@ -3118,7 +3118,7 @@ pgstat_initialize(void)
 	}
 
 	/*
-	 * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
+	 * Initialize prevWalUsage with pgWalUsage so that pgstat_send_wal() can
 	 * calculate how much pgWalUsage counters are increased by substracting
 	 * prevWalUsage from pgWalUsage.
 	 */
@@ -4643,13 +4643,19 @@ pgstat_send_archiver(const char *xlog, bool failed)
  * pgstat_send_bgwriter() -
  *
  *		Send bgwriter statistics to the collector
+ *
+ * If 'force' is not set, bgwriter stats message is only sent if enough time has
+ * passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
+ *
+ * Return true if the message is sent, and false otherwise.
  * ----------
  */
 void
-pgstat_send_bgwriter(void)
+pgstat_send_bgwriter(bool force)
 {
 	/* We assume this initializes to zeroes */
 	static const PgStat_MsgBgWriter all_zeroes;
+	static TimestampTz sendTime = 0;
 
 	/*
 	 * This function can be called even if nothing at all has happened. In
@@ -4659,6 +4665,19 @@ pgstat_send_bgwriter(void)
 	if (memcmp(&BgWriterStats, &all_zeroes, sizeof(PgStat_MsgBgWriter)) == 0)
 		return;
 
+	if (!force)
+	{
+		TimestampTz now = GetCurrentTimestamp();
+
+		/*
+		 * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+		 * msec since we last sent one.
+		 */
+		if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
+			return;
+		sendTime = now;
+	}
+
 	/*
 	 * Prepare and send the message
 	 */
@@ -4672,19 +4691,25 @@ pgstat_send_bgwriter(void)
 }
 
 /* ----------
- * pgstat_report_wal() -
+ * pgstat_send_wal() -
  *
- * Calculate how much WAL usage counters are increased and send
- * WAL statistics to the collector.
+ *	Send WAL statistics to the collector.
  *
- * Must be called by processes that generate WAL.
+ * If 'force' is not set, WAL stats message is only sent if enough time has
+ * passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
+ *
+ * Return true if the message is sent, and false otherwise.
  * ----------
  */
 void
-pgstat_report_wal(void)
+pgstat_send_wal(bool force)
 {
 	WalUsage	walusage;
 
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+	static TimestampTz sendTime = 0;
+
 	/*
 	 * Calculate how much WAL usage counters are increased by substracting the
 	 * previous counters from the current ones. Fill the results in WAL stats
@@ -4697,43 +4722,13 @@ pgstat_report_wal(void)
 	WalStats.m_wal_fpi = walusage.wal_fpi;
 	WalStats.m_wal_bytes = walusage.wal_bytes;
 
-	/*
-	 * Send WAL stats message to the collector.
-	 */
-	if (!pgstat_send_wal(true))
-		return;
-
-	/*
-	 * Save the current counters for the subsequent calculation of WAL usage.
-	 */
-	prevWalUsage = pgWalUsage;
-}
-
-/* ----------
- * pgstat_send_wal() -
- *
- *	Send WAL statistics to the collector.
- *
- * If 'force' is not set, WAL stats message is only sent if enough time has
- * passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
- *
- * Return true if the message is sent, and false otherwise.
- * ----------
- */
-bool
-pgstat_send_wal(bool force)
-{
-	/* We assume this initializes to zeroes */
-	static const PgStat_MsgWal all_zeroes;
-	static TimestampTz sendTime = 0;
-
 	/*
 	 * This function can be called even if nothing at all has happened. In
 	 * this case, avoid sending a completely empty message to the stats
 	 * collector.
 	 */
 	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
-		return false;
+		return;
 
 	if (!force)
 	{
@@ -4744,7 +4739,7 @@ pgstat_send_wal(bool force)
 		 * msec since we last sent one.
 		 */
 		if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
-			return false;
+			return;
 		sendTime = now;
 	}
 
@@ -4759,7 +4754,10 @@ pgstat_send_wal(bool force)
 	 */
 	MemSet(&WalStats, 0, sizeof(WalStats));
 
-	return true;
+	/*
+	 * Save the current counters for the subsequent calculation of WAL usage.
+	 */
+	prevWalUsage = pgWalUsage;
 }
 
 /* ----------
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be43c04802..993b774bb9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1600,9 +1600,8 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
-extern void pgstat_send_bgwriter(void);
-extern void pgstat_report_wal(void);
-extern bool pgstat_send_wal(bool force);
+extern void pgstat_send_bgwriter(bool force);
+extern void pgstat_send_wal(bool force);
 
 /* ----------
  * Support functions for the SQL-callable functions to
diff --git a/src/include/postmaster/interrupt.h b/src/include/postmaster/interrupt.h
index 85a1293ef1..0d333b819a 100644
--- a/src/include/postmaster/interrupt.h
+++ b/src/include/postmaster/interrupt.h
@@ -24,7 +24,6 @@
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 extern PGDLLIMPORT volatile sig_atomic_t ShutdownRequestPending;
 
-extern void HandleMainLoopInterrupts(void);
 extern void SignalHandlerForConfigReload(SIGNAL_ARGS);
 extern void SignalHandlerForCrashExit(SIGNAL_ARGS);
 extern void SignalHandlerForShutdownRequest(SIGNAL_ARGS);

Reply via email to