On 2021/04/13 9:33, Fujii Masao wrote:
> 
> 
> On 2021/03/30 20:37, Masahiro Ikeda wrote:
>> OK, I added the condition to the fast-return check. I noticed that I
>> misunderstood that the purpose is to avoid expanding a clock check using WAL
>> stats counters. But, the purpose is to make the conditions stricter, right?
> 
> Yes. Currently if the following condition is false even when the WAL counters
> are updated, nothing is sent to the stats collector. But with your patch,
> in this case the WAL stats are sent.
> 
>     if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>         pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>         !have_function_stats && !disconnect)
> 
> Thanks for the patch! It now fails to be applied to the master cleanly.
> So could you rebase the patch?

Thanks for your comments!
I rebased it.


> pgstat_initialize() should initialize pgWalUsage with zero?

Thanks. But, I didn't handle it because I undo the logic to calculate the diff
as you mentioned below.


> +    /*
> +     * set the counters related to generated WAL data.
> +     */
> +    WalStats.m_wal_records = pgWalUsage.wal_records;
> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes;
> 
> This should be skipped if pgWalUsage.wal_records is zero?

Yes, fixed it.


> + * Be careful that the counters are cleared after reporting them to
> + * the stats collector although you can use WalUsageAccumDiff()
> + * to computing differences to previous values. For backends,
> + * the counters may be reset after a transaction is finished and
> + * pgstat_send_wal() is invoked, so you can compute the difference
> + * in the same transaction only.
> 
> On the second thought, I'm afraid that this can be likely to be a foot-gun
> in the future. So I'm now wondering if the current approach (i.e., calculate
> the diff between the current and previous pgWalUsage and don't reset it
> to zero) is better. Thought? Since the similar data structure pgBufferUsage
> doesn't have this kind of restriction, I'm afraid that the developer can
> easily miss that only pgWalUsage has the restriction.
> 
> But currently the diff is calculated (i.e., WalUsageAccumDiff() is called)
> even when the WAL counters are not updated. Then if that calculated diff is
> zero, we skip sending the WAL stats. This logic should be improved. That is,
> probably we should be able to check whether the WAL counters are updated
> or not, without calculating the diff, because the calculation is not free.
> We can implement this by introducing new flag variable that we discussed
> upthread. This flag is set to true whenever the WAL counters are incremented
> and used to determine whether the WAL stats need to be sent.

Sound reasonable. I agreed that the restriction has a risk to lead mistakes.
I made the patch introducing a new flag.

- v4-0001-performance-improvements-of-reporting-wal-stats.patch


I think introducing a new flag is not necessary because we can know if the WAL
stats are updated or not using the counters of the generated wal record, wal
writes and wal syncs. It's fast compared to get timestamp. The attached patch
is to check if the counters are updated or not using them.

-
v4-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch


> If we do this, another issue is that the data types for wal_records and 
> wal_fpi
> in pgWalUsage are long. Which may lead to overflow? If yes, it's should be
> replaced with uint64.

Yes, I fixed. BufferUsage's counters have same issue, so I fixed them too.

BTW, is it better to change PgStat_Counter from int64 to uint64 because there
aren't any counters with negative number?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 14be850cb3..265bb16cf6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -173,21 +173,21 @@ typedef struct Counters
 	double		sum_var_time[PGSS_NUMKIND]; /* sum of variances in
 											 * planning/execution time in msec */
 	int64		rows;			/* total # of retrieved or affected rows */
-	int64		shared_blks_hit;	/* # of shared buffer hits */
-	int64		shared_blks_read;	/* # of shared disk blocks read */
-	int64		shared_blks_dirtied;	/* # of shared disk blocks dirtied */
-	int64		shared_blks_written;	/* # of shared disk blocks written */
-	int64		local_blks_hit; /* # of local buffer hits */
-	int64		local_blks_read;	/* # of local disk blocks read */
-	int64		local_blks_dirtied; /* # of local disk blocks dirtied */
-	int64		local_blks_written; /* # of local disk blocks written */
-	int64		temp_blks_read; /* # of temp blocks read */
-	int64		temp_blks_written;	/* # of temp blocks written */
+	uint64		shared_blks_hit;	/* # of shared buffer hits */
+	uint64		shared_blks_read;	/* # of shared disk blocks read */
+	uint64		shared_blks_dirtied;	/* # of shared disk blocks dirtied */
+	uint64		shared_blks_written;	/* # of shared disk blocks written */
+	uint64		local_blks_hit; /* # of local buffer hits */
+	uint64		local_blks_read;	/* # of local disk blocks read */
+	uint64		local_blks_dirtied; /* # of local disk blocks dirtied */
+	uint64		local_blks_written; /* # of local disk blocks written */
+	uint64		temp_blks_read; /* # of temp blocks read */
+	uint64		temp_blks_written;	/* # of temp blocks written */
 	double		blk_read_time;	/* time spent reading, in msec */
 	double		blk_write_time; /* time spent writing, in msec */
 	double		usage;			/* usage factor */
-	int64		wal_records;	/* # of WAL records generated */
-	int64		wal_fpi;		/* # of WAL full page images generated */
+	uint64		wal_records;	/* # of WAL records generated */
+	uint64		wal_fpi;		/* # of WAL full page images generated */
 	uint64		wal_bytes;		/* total amount of WAL generated in bytes */
 } Counters;
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 45fdecbbd9..848eff2212 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1279,6 +1279,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		pgWalUsage.wal_bytes += rechdr->xl_tot_len;
 		pgWalUsage.wal_records++;
 		pgWalUsage.wal_fpi += num_fpi;
+		walUsageUpdated = true;
 	}
 
 	return EndPos;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 237e13361b..75ecd00c23 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -17,6 +17,12 @@
 
 #include "executor/instrument.h"
 
+/*
+ * Buffer and generated WAL usage counters.
+ *
+ * The counters are accumulated values. There are infrastructures
+ * to add the values and calculate the difference within a specific period.
+ */
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
 WalUsage	pgWalUsage;
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index cdd07770a0..75a95f3de7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -505,7 +505,7 @@ CheckpointerMain(void)
 		pgstat_send_bgwriter();
 
 		/* Send WAL statistics to the stats collector. */
-		pgstat_report_wal();
+		pgstat_send_wal(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
@@ -583,7 +583,7 @@ HandleCheckpointerInterrupts(void)
 		BgWriterStats.m_requested_checkpoints++;
 		ShutdownXLOG(0, 0);
 		pgstat_send_bgwriter();
-		pgstat_report_wal();
+		pgstat_send_wal(true);
 
 		/* Normal exit from the checkpointer is here */
 		proc_exit(0);			/* done */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 232aeeac29..2f1f6d8cc5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -133,12 +133,18 @@ PgStat_MsgWal WalStats;
 
 /*
  * WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_report_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_report_wal() calls, by substracting
+ * pgstat_send_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_send_wal() calls, by substracting
  * the previous counters from the current ones.
  */
 static WalUsage prevWalUsage;
 
+/*
+ * WAL usage flag to save if the WAL counters are updated or not.
+ * This is used to decide sending the WAL statistics to the collector.
+ */
+bool walUsageUpdated;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -855,10 +861,20 @@ pgstat_report_stat(bool disconnect)
 	TabStatusArray *tsa;
 	int			i;
 
-	/* Don't expend a clock check if nothing to do */
+	/*
+	 * Don't expend a clock check if nothing to do.
+	 *
+	 * Note: regarding to WAL statistics counters, it's not enough to check
+	 * only whether the wal record is generated or not. If a read transaction
+	 * is executed, wal records aren't nomally generated (although HOT makes
+	 * wal records). But, just writes and syncs the wal data may happen when
+	 * to flush buffers.
+	 */
 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
 		pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
-		!have_function_stats && !disconnect)
+		!walUsageUpdated && walStats.wal_write == 0 &&
+		walStats.wal_sync == 0 && !have_function_stats &&
+		!disconnect)
 		return;
 
 	/*
@@ -951,7 +967,7 @@ pgstat_report_stat(bool disconnect)
 	pgstat_send_funcstats();
 
 	/* Send WAL statistics */
-	pgstat_report_wal();
+	pgstat_send_wal(true);
 
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
@@ -2933,12 +2949,15 @@ void
 pgstat_initialize(void)
 {
 	/*
-	 * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
+	 * Initialize prevWalUsage with pgWalUsage so that pgstat_send_wal() can
 	 * calculate how much pgWalUsage counters are increased by substracting
 	 * prevWalUsage from pgWalUsage.
 	 */
 	prevWalUsage = pgWalUsage;
 
+	/* WAL usage counters are not updated yet. */
+	walUsageUpdated = false;
+
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_shutdown_hook, 0);
 }
@@ -3045,44 +3064,6 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
-/* ----------
- * pgstat_report_wal() -
- *
- * Calculate how much WAL usage counters are increased and send
- * WAL statistics to the collector.
- *
- * Must be called by processes that generate WAL.
- * ----------
- */
-void
-pgstat_report_wal(void)
-{
-	WalUsage	walusage;
-
-	/*
-	 * Calculate how much WAL usage counters are increased by substracting the
-	 * previous counters from the current ones. Fill the results in WAL stats
-	 * message.
-	 */
-	MemSet(&walusage, 0, sizeof(WalUsage));
-	WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
-
-	WalStats.m_wal_records = walusage.wal_records;
-	WalStats.m_wal_fpi = walusage.wal_fpi;
-	WalStats.m_wal_bytes = walusage.wal_bytes;
-
-	/*
-	 * Send WAL stats message to the collector.
-	 */
-	if (!pgstat_send_wal(true))
-		return;
-
-	/*
-	 * Save the current counters for the subsequent calculation of WAL usage.
-	 */
-	prevWalUsage = pgWalUsage;
-}
-
 /* ----------
  * pgstat_send_wal() -
  *
@@ -3094,20 +3075,29 @@ pgstat_report_wal(void)
  * Return true if the message is sent, and false otherwise.
  * ----------
  */
-bool
+void
 pgstat_send_wal(bool force)
 {
-	/* We assume this initializes to zeroes */
-	static const PgStat_MsgWal all_zeroes;
 	static TimestampTz sendTime = 0;
 
 	/*
-	 * This function can be called even if nothing at all has happened. In
-	 * this case, avoid sending a completely empty message to the stats
-	 * collector.
+	 * First, to check the WAL stats counters were updated.
+	 *
+	 * Even if the 'force' is true, we don't need to send the stats if the
+	 * counters were not updated.
+	 *
+	 * We can know whether the counters were updated or not to check only
+	 * three counters. So, for performance, we don't allocate another memory
+	 * spaces and check the all stats like pgstat_send_slru().
+	 *
+	 * It's not enough to check the WAL usage counters are updated or not.
+	 * For example the walwriter may write/sync the WAL although it doesn't
+	 * generate wal records. 'wal_writes' and 'wal_sync' are zero means the
+	 * counters of time spent are zero too.
 	 */
-	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
-		return false;
+	if (!walUsageUpdated && walStats.wal_write == 0 &&
+		walStats.wal_sync == 0)
+		return;
 
 	if (!force)
 	{
@@ -3115,25 +3105,52 @@ pgstat_send_wal(bool force)
 
 		/*
 		 * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
-		 * msec since we last sent one.
+		 * msec since we last sent one to avoid overloading the stats
+		 * collector.
 		 */
 		if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
-			return false;
+			return;
 		sendTime = now;
 	}
 
+	/*
+	 * set the counters related to generated WAL data if the counters are
+	 * updated.
+	 */
+	if (walUsageUpdated)
+	{
+		WalUsage	walusage;
+
+		/*
+		 * Calculate how much WAL usage counters are increased by substracting
+		 * the previous counters from the current ones. Fill the results in
+		 * WAL stats message.
+		 */
+		MemSet(&walusage, 0, sizeof(WalUsage));
+		WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
+
+		WalStats.m_wal_records = walusage.wal_records;
+		WalStats.m_wal_fpi = walusage.wal_fpi;
+		WalStats.m_wal_bytes = walusage.wal_bytes;
+	}
+
 	/*
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
 	pgstat_send(&WalStats, sizeof(WalStats));
 
+	/*
+	 * Save the current counters for the subsequent calculation of WAL usage,
+	 * and reset the updated flag.
+	 */
+	prevWalUsage = pgWalUsage;
+	walUsageUpdated = false;
+
 	/*
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&WalStats, 0, sizeof(WalStats));
-
-	return true;
 }
 
 /* ----------
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index aa8eceda5f..9de29e51d6 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -16,26 +16,41 @@
 #include "portability/instr_time.h"
 
 
+/*
+ * The accumulated counters for buffer usage.
+ *
+ * The reason the counters are accumulated values is that there may be many
+ * callers to use them, so to avoid unexpected reset.
+ */
 typedef struct BufferUsage
 {
-	long		shared_blks_hit;	/* # of shared buffer hits */
-	long		shared_blks_read;	/* # of shared disk blocks read */
-	long		shared_blks_dirtied;	/* # of shared blocks dirtied */
-	long		shared_blks_written;	/* # of shared disk blocks written */
-	long		local_blks_hit; /* # of local buffer hits */
-	long		local_blks_read;	/* # of local disk blocks read */
-	long		local_blks_dirtied; /* # of local blocks dirtied */
-	long		local_blks_written; /* # of local disk blocks written */
-	long		temp_blks_read; /* # of temp blocks read */
-	long		temp_blks_written;	/* # of temp blocks written */
+	uint64		shared_blks_hit;	/* # of shared buffer hits */
+	uint64		shared_blks_read;	/* # of shared disk blocks read */
+	uint64		shared_blks_dirtied;	/* # of shared blocks dirtied */
+	uint64		shared_blks_written;	/* # of shared disk blocks written */
+	uint64		local_blks_hit; /* # of local buffer hits */
+	uint64		local_blks_read;	/* # of local disk blocks read */
+	uint64		local_blks_dirtied; /* # of local blocks dirtied */
+	uint64		local_blks_written; /* # of local disk blocks written */
+	uint64		temp_blks_read; /* # of temp blocks read */
+	uint64		temp_blks_written;	/* # of temp blocks written */
 	instr_time	blk_read_time;	/* time spent reading */
 	instr_time	blk_write_time; /* time spent writing */
 } BufferUsage;
 
+/*
+ * The accumulated counters for generated WAL usage.
+ *
+ * The reason the counters are accumulated values is the same as BufferUsage's one.
+ * And the reason to store only generated WAL usage and doesn't store WAL I/O
+ * activity, is that this is assumed for knowing the WAL usage in per query or
+ * transaction. So, common resources for the cluster like WAL I/O activity is
+ * not stored.
+ */
 typedef struct WalUsage
 {
-	long		wal_records;	/* # of WAL records produced */
-	long		wal_fpi;		/* # of WAL full page images produced */
+	uint64		wal_records;	/* # of WAL records produced */
+	uint64		wal_fpi;		/* # of WAL full page images produced */
 	uint64		wal_bytes;		/* size of WAL records produced */
 } WalUsage;
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c2f70effbd..63414e7bc8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -503,8 +503,8 @@ typedef struct PgStat_MsgBgWriter
 typedef struct PgStat_MsgWal
 {
 	PgStat_MsgHdr m_hdr;
-	PgStat_Counter m_wal_records;
-	PgStat_Counter m_wal_fpi;
+	uint64		m_wal_records;
+	uint64		m_wal_fpi;
 	uint64		m_wal_bytes;
 	PgStat_Counter m_wal_buffers_full;
 	PgStat_Counter m_wal_write;
@@ -984,6 +984,13 @@ extern PgStat_Counter pgStatTransactionIdleTime;
  */
 extern SessionEndType pgStatSessionEndCause;
 
+
+/*
+ * WAL usage flag to save if the WAL counters are updated or not.
+ * This is used to decide sending the WAL statistics to the collector.
+ */
+extern bool walUsageUpdated;
+
 /* ----------
  * Functions called from postmaster
  * ----------
@@ -1110,8 +1117,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
 extern void pgstat_send_recoveryprefetch(PgStat_RecoveryPrefetchStats *stats);
-extern void pgstat_report_wal(void);
-extern bool pgstat_send_wal(bool force);
+extern void pgstat_send_wal(bool force);
 
 /* ----------
  * Support functions for the SQL-callable functions to
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 14be850cb3..265bb16cf6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -173,21 +173,21 @@ typedef struct Counters
 	double		sum_var_time[PGSS_NUMKIND]; /* sum of variances in
 											 * planning/execution time in msec */
 	int64		rows;			/* total # of retrieved or affected rows */
-	int64		shared_blks_hit;	/* # of shared buffer hits */
-	int64		shared_blks_read;	/* # of shared disk blocks read */
-	int64		shared_blks_dirtied;	/* # of shared disk blocks dirtied */
-	int64		shared_blks_written;	/* # of shared disk blocks written */
-	int64		local_blks_hit; /* # of local buffer hits */
-	int64		local_blks_read;	/* # of local disk blocks read */
-	int64		local_blks_dirtied; /* # of local disk blocks dirtied */
-	int64		local_blks_written; /* # of local disk blocks written */
-	int64		temp_blks_read; /* # of temp blocks read */
-	int64		temp_blks_written;	/* # of temp blocks written */
+	uint64		shared_blks_hit;	/* # of shared buffer hits */
+	uint64		shared_blks_read;	/* # of shared disk blocks read */
+	uint64		shared_blks_dirtied;	/* # of shared disk blocks dirtied */
+	uint64		shared_blks_written;	/* # of shared disk blocks written */
+	uint64		local_blks_hit; /* # of local buffer hits */
+	uint64		local_blks_read;	/* # of local disk blocks read */
+	uint64		local_blks_dirtied; /* # of local disk blocks dirtied */
+	uint64		local_blks_written; /* # of local disk blocks written */
+	uint64		temp_blks_read; /* # of temp blocks read */
+	uint64		temp_blks_written;	/* # of temp blocks written */
 	double		blk_read_time;	/* time spent reading, in msec */
 	double		blk_write_time; /* time spent writing, in msec */
 	double		usage;			/* usage factor */
-	int64		wal_records;	/* # of WAL records generated */
-	int64		wal_fpi;		/* # of WAL full page images generated */
+	uint64		wal_records;	/* # of WAL records generated */
+	uint64		wal_fpi;		/* # of WAL full page images generated */
 	uint64		wal_bytes;		/* total amount of WAL generated in bytes */
 } Counters;
 
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 237e13361b..75ecd00c23 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -17,6 +17,12 @@
 
 #include "executor/instrument.h"
 
+/*
+ * Buffer and generated WAL usage counters.
+ *
+ * The counters are accumulated values. There are infrastructures
+ * to add the values and calculate the difference within a specific period.
+ */
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
 WalUsage	pgWalUsage;
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index cdd07770a0..75a95f3de7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -505,7 +505,7 @@ CheckpointerMain(void)
 		pgstat_send_bgwriter();
 
 		/* Send WAL statistics to the stats collector. */
-		pgstat_report_wal();
+		pgstat_send_wal(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
@@ -583,7 +583,7 @@ HandleCheckpointerInterrupts(void)
 		BgWriterStats.m_requested_checkpoints++;
 		ShutdownXLOG(0, 0);
 		pgstat_send_bgwriter();
-		pgstat_report_wal();
+		pgstat_send_wal(true);
 
 		/* Normal exit from the checkpointer is here */
 		proc_exit(0);			/* done */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 232aeeac29..87b5088d68 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -133,8 +133,8 @@ PgStat_MsgWal WalStats;
 
 /*
  * WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_report_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_report_wal() calls, by substracting
+ * pgstat_send_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_send_wal() calls, by substracting
  * the previous counters from the current ones.
  */
 static WalUsage prevWalUsage;
@@ -855,9 +855,19 @@ pgstat_report_stat(bool disconnect)
 	TabStatusArray *tsa;
 	int			i;
 
-	/* Don't expend a clock check if nothing to do */
+	/*
+	 * Don't expend a clock check if nothing to do.
+	 *
+	 * Note: regarding to WAL statistics counters, it's not enough to check
+	 * only whether the wal record is generated or not. If a read transaction
+	 * is executed, wal records aren't nomally generated (although HOT makes
+	 * wal records). But, just writes and syncs the wal data may happen when
+	 * to flush buffers.
+	 */
 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
 		pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
+		pgWalUsage.wal_records == prevWalUsage.wal_records &&
+		walStats.wal_write == 0 && walStats.wal_sync == 0 &&
 		!have_function_stats && !disconnect)
 		return;
 
@@ -951,7 +961,7 @@ pgstat_report_stat(bool disconnect)
 	pgstat_send_funcstats();
 
 	/* Send WAL statistics */
-	pgstat_report_wal();
+	pgstat_send_wal(true);
 
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
@@ -2933,7 +2943,7 @@ void
 pgstat_initialize(void)
 {
 	/*
-	 * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
+	 * Initialize prevWalUsage with pgWalUsage so that pgstat_send_wal() can
 	 * calculate how much pgWalUsage counters are increased by substracting
 	 * prevWalUsage from pgWalUsage.
 	 */
@@ -3045,44 +3055,6 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
-/* ----------
- * pgstat_report_wal() -
- *
- * Calculate how much WAL usage counters are increased and send
- * WAL statistics to the collector.
- *
- * Must be called by processes that generate WAL.
- * ----------
- */
-void
-pgstat_report_wal(void)
-{
-	WalUsage	walusage;
-
-	/*
-	 * Calculate how much WAL usage counters are increased by substracting the
-	 * previous counters from the current ones. Fill the results in WAL stats
-	 * message.
-	 */
-	MemSet(&walusage, 0, sizeof(WalUsage));
-	WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
-
-	WalStats.m_wal_records = walusage.wal_records;
-	WalStats.m_wal_fpi = walusage.wal_fpi;
-	WalStats.m_wal_bytes = walusage.wal_bytes;
-
-	/*
-	 * Send WAL stats message to the collector.
-	 */
-	if (!pgstat_send_wal(true))
-		return;
-
-	/*
-	 * Save the current counters for the subsequent calculation of WAL usage.
-	 */
-	prevWalUsage = pgWalUsage;
-}
-
 /* ----------
  * pgstat_send_wal() -
  *
@@ -3094,20 +3066,33 @@ pgstat_report_wal(void)
  * Return true if the message is sent, and false otherwise.
  * ----------
  */
-bool
+void
 pgstat_send_wal(bool force)
 {
-	/* We assume this initializes to zeroes */
-	static const PgStat_MsgWal all_zeroes;
 	static TimestampTz sendTime = 0;
 
 	/*
-	 * This function can be called even if nothing at all has happened. In
-	 * this case, avoid sending a completely empty message to the stats
-	 * collector.
+	 * First, to check the WAL stats counters were updated.
+	 *
+	 * Even if the 'force' is true, we don't need to send the stats if the
+	 * counters were not updated.
+	 *
+	 * We can know whether the counters were updated or not to check only
+	 * three counters. So, for performance, we don't allocate another memory
+	 * spaces and check the all stats like pgstat_send_slru().
+	 *
+	 * The current 'wal_records' is the same as the previous one means that
+	 * wal records weren't generated. So, the counters of 'wal_fpi',
+	 * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.
+	 *
+	 * It's not enough to check the number of generated wal records, for
+	 * example the walwriter may write/sync the WAL although it doesn't
+	 * generate wal records. 'wal_writes' and 'wal_sync' are zero means the
+	 * counters of time spent are zero too.
 	 */
-	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
-		return false;
+	if (pgWalUsage.wal_records == prevWalUsage.wal_records &&
+		walStats.wal_write == 0 && walStats.wal_sync == 0)
+		return;
 
 	if (!force)
 	{
@@ -3115,25 +3100,50 @@ pgstat_send_wal(bool force)
 
 		/*
 		 * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
-		 * msec since we last sent one.
+		 * msec since we last sent one to avoid overloading the stats
+		 * collector.
 		 */
 		if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
-			return false;
+			return;
 		sendTime = now;
 	}
 
+	/*
+	 * set the counters related to generated WAL data if the counters are
+	 * updated.
+	 */
+	if (pgWalUsage.wal_records != prevWalUsage.wal_records)
+	{
+		WalUsage	walusage;
+
+		/*
+		 * Calculate how much WAL usage counters are increased by substracting
+		 * the previous counters from the current ones. Fill the results in
+		 * WAL stats message.
+		 */
+		MemSet(&walusage, 0, sizeof(WalUsage));
+		WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
+
+		WalStats.m_wal_records = walusage.wal_records;
+		WalStats.m_wal_fpi = walusage.wal_fpi;
+		WalStats.m_wal_bytes = walusage.wal_bytes;
+	}
+
 	/*
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
 	pgstat_send(&WalStats, sizeof(WalStats));
 
+	/*
+	 * Save the current counters for the subsequent calculation of WAL usage.
+	 */
+	prevWalUsage = pgWalUsage;
+
 	/*
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&WalStats, 0, sizeof(WalStats));
-
-	return true;
 }
 
 /* ----------
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index aa8eceda5f..9de29e51d6 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -16,26 +16,41 @@
 #include "portability/instr_time.h"
 
 
+/*
+ * The accumulated counters for buffer usage.
+ *
+ * The reason the counters are accumulated values is that there may be many
+ * callers to use them, so to avoid unexpected reset.
+ */
 typedef struct BufferUsage
 {
-	long		shared_blks_hit;	/* # of shared buffer hits */
-	long		shared_blks_read;	/* # of shared disk blocks read */
-	long		shared_blks_dirtied;	/* # of shared blocks dirtied */
-	long		shared_blks_written;	/* # of shared disk blocks written */
-	long		local_blks_hit; /* # of local buffer hits */
-	long		local_blks_read;	/* # of local disk blocks read */
-	long		local_blks_dirtied; /* # of local blocks dirtied */
-	long		local_blks_written; /* # of local disk blocks written */
-	long		temp_blks_read; /* # of temp blocks read */
-	long		temp_blks_written;	/* # of temp blocks written */
+	uint64		shared_blks_hit;	/* # of shared buffer hits */
+	uint64		shared_blks_read;	/* # of shared disk blocks read */
+	uint64		shared_blks_dirtied;	/* # of shared blocks dirtied */
+	uint64		shared_blks_written;	/* # of shared disk blocks written */
+	uint64		local_blks_hit; /* # of local buffer hits */
+	uint64		local_blks_read;	/* # of local disk blocks read */
+	uint64		local_blks_dirtied; /* # of local blocks dirtied */
+	uint64		local_blks_written; /* # of local disk blocks written */
+	uint64		temp_blks_read; /* # of temp blocks read */
+	uint64		temp_blks_written;	/* # of temp blocks written */
 	instr_time	blk_read_time;	/* time spent reading */
 	instr_time	blk_write_time; /* time spent writing */
 } BufferUsage;
 
+/*
+ * The accumulated counters for generated WAL usage.
+ *
+ * The reason the counters are accumulated values is the same as BufferUsage's one.
+ * And the reason to store only generated WAL usage and doesn't store WAL I/O
+ * activity, is that this is assumed for knowing the WAL usage in per query or
+ * transaction. So, common resources for the cluster like WAL I/O activity is
+ * not stored.
+ */
 typedef struct WalUsage
 {
-	long		wal_records;	/* # of WAL records produced */
-	long		wal_fpi;		/* # of WAL full page images produced */
+	uint64		wal_records;	/* # of WAL records produced */
+	uint64		wal_fpi;		/* # of WAL full page images produced */
 	uint64		wal_bytes;		/* size of WAL records produced */
 } WalUsage;
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c2f70effbd..d9c9ece9c6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -503,8 +503,8 @@ typedef struct PgStat_MsgBgWriter
 typedef struct PgStat_MsgWal
 {
 	PgStat_MsgHdr m_hdr;
-	PgStat_Counter m_wal_records;
-	PgStat_Counter m_wal_fpi;
+	uint64		m_wal_records;
+	uint64		m_wal_fpi;
 	uint64		m_wal_bytes;
 	PgStat_Counter m_wal_buffers_full;
 	PgStat_Counter m_wal_write;
@@ -1110,8 +1110,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
 extern void pgstat_send_recoveryprefetch(PgStat_RecoveryPrefetchStats *stats);
-extern void pgstat_report_wal(void);
-extern bool pgstat_send_wal(bool force);
+extern void pgstat_send_wal(bool force);
 
 /* ----------
  * Support functions for the SQL-callable functions to

Reply via email to