I update the patch since there were my misunderstanding points.

On 2021/03/26 16:20, Masahiro Ikeda wrote:
> Thanks for many your suggestions!
> I made the patch to handle the issues.
> 
>> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>>    instead of just accumulating the stats since the last submission?
>>    There doesn't seem to be any comment explaining it? Computing
>>    differences to previous values, and copying to prev*, isn't free. I
>>    assume this is out of a fear that the stats could get reset before
>>    they're used for instrument.c purposes - but who knows?
> 
> I removed the unnecessary code copying pgWalUsage and just reset the
> pgWalUsage after reporting the stats in pgstat_report_wal().

I didn't change this.

>> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>>    former being used by wal writer, the latter by most other processes?
>>    There again don't seem to be comments explaining this.
> 
> I added the comments why two functions are separated.
> (But is it better to merge them?)

I updated the comments for following reasons.


>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>>    just to figure out if there's been any changes isn't all that
>>    cheap. This is regularly exercised in read-only workloads too. Seems
>>    adding a boolean WalStats.have_pending = true or such would be
>>    better.
>> 4) For plain backends pgstat_report_wal() is called by
>>    pgstat_report_stat() - but it is not checked as part of the "Don't
>>    expend a clock check if nothing to do" check at the top.  It's
>>    probably rare to have pending wal stats without also passing one of
>>    the other conditions, but ...
> 
> I added the logic to check if the stats counters are updated or not in
> pgstat_report_stat() using not only generated wal record but also write/sync
> counters, and it can skip to call reporting function.

I removed the checking code whether the wal stats counters were updated or not
in pgstat_report_stat() since I couldn't understand the valid case yet.
pgstat_report_stat() is called by backends when the transaction is finished.
This means that the condition seems always pass.

I thought to implement if the WAL stats counters were not updated, skip to
send all statistics including the counters for databases and so on. But I
think it's not good because it may take more time to be reflected the
generated stats by read-only transaction.


> Although I added the condition which the write/sync counters are updated or
> not, I haven't understood the following case yet...Sorry. I checked related
> code and tested to insert large object, but I couldn't. I'll investigate more
> deeply, but if you already know the function which leads the following case,
> please let me know.

I understood the above case (Fujii-san, thanks for your advice in person).
When to flush buffers, for example, to select buffer replacement victim,
it requires a WAL flush if the buffer is dirty. So, to check the WAL stats
counters are updated or not, I check the number of generated wal record and
the counter of syncing in pgstat_report_wal().

The reason why not to check the counter of writing is that if to write is
happened, to sync is happened too in the above case. I added the comments in
the patch.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 237e13361b..095d8d0970 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,6 +19,17 @@
 
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
+
+/*
+ * generated WAL usage counters.
+ *
+ * Be careful that the counters are cleared after reporting them to
+ * the stats collector although you can use WalUsageAccumDiff()
+ * to computing differences to previous values. For backends,
+ * the counters may be reset after a transaction is finished and
+ * pgstat_report_wal() is invoked, so you can compute the difference
+ * in the same transaction only.
+ */
 WalUsage	pgWalUsage;
 static WalUsage save_pgWalUsage;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 60f45ccc4e..1ffedb2a24 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -144,14 +144,6 @@ char	   *pgstat_stat_tmpname = NULL;
 PgStat_MsgBgWriter BgWriterStats;
 PgStat_MsgWal WalStats;
 
-/*
- * WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_report_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_report_wal() calls, by substracting
- * the previous counters from the current ones.
- */
-static WalUsage prevWalUsage;
-
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -3117,13 +3109,6 @@ pgstat_initialize(void)
 		MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
 	}
 
-	/*
-	 * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
-	 * calculate how much pgWalUsage counters are increased by substracting
-	 * prevWalUsage from pgWalUsage.
-	 */
-	prevWalUsage = pgWalUsage;
-
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_beshutdown_hook, 0);
 }
@@ -4674,8 +4659,7 @@ pgstat_send_bgwriter(void)
 /* ----------
  * pgstat_report_wal() -
  *
- * Calculate how much WAL usage counters are increased and send
- * WAL statistics to the collector.
+ * Send WAL statistics to the collector and clear the counters.
  *
  * Must be called by processes that generate WAL.
  * ----------
@@ -4683,19 +4667,36 @@ pgstat_send_bgwriter(void)
 void
 pgstat_report_wal(void)
 {
-	WalUsage	walusage;
-
 	/*
-	 * Calculate how much WAL usage counters are increased by substracting the
-	 * previous counters from the current ones. Fill the results in WAL stats
-	 * message.
+	 * Skip if the WAL statistics counters were not updated.
+	 *
+	 * We can know whether the counters were updated to check only two
+	 * records. So, for performance, we don't allocate another memory spaces
+	 * and check the all stats like pgstat_send_slru().
+	 *
+	 * It's not enough to check only the generated wal records, because there
+	 * is a case that a backend generates no wal records, but just writes and
+	 * syncs the wal data. When to flush buffers, for example, to select
+	 * buffer replacement victim, it requires a WAL flush if the buffer is
+	 * dirty. This happens even if executing read-only transaction.
+	 *
+	 * In this case, if to write is happend, to sync is happend too. although
+	 * there is a case to write is not happend but to sync is happened. For
+	 * example, when to flush buffers just after another process only write
+	 * and don't sync because the wal buffer is no space.
+	 *
+	 * So, it's ok to check the generated wal records and the counter of
+	 * syncing only.
 	 */
-	MemSet(&walusage, 0, sizeof(WalUsage));
-	WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
+	if (pgWalUsage.wal_records == 0 && WalStats.m_wal_sync == 0)
+		return;
 
-	WalStats.m_wal_records = walusage.wal_records;
-	WalStats.m_wal_fpi = walusage.wal_fpi;
-	WalStats.m_wal_bytes = walusage.wal_bytes;
+	/*
+	 * set the counters related to generated WAL data.
+	 */
+	WalStats.m_wal_records = pgWalUsage.wal_records;
+	WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
+	WalStats.m_wal_bytes = pgWalUsage.wal_bytes;
 
 	/*
 	 * Send WAL stats message to the collector.
@@ -4704,18 +4705,26 @@ pgstat_report_wal(void)
 		return;
 
 	/*
-	 * Save the current counters for the subsequent calculation of WAL usage.
+	 * Clear out the statistics buffer for generated WAL data, so it can be
+	 * re-used.
+	 *
+	 * It's ok to clear out because no one takes difference crossing
+	 * pgstat_report_wal() calls although these counters are used in another
+	 * places, for example in pg_stat_statements.c.
 	 */
-	prevWalUsage = pgWalUsage;
+	MemSet(&pgWalUsage, 0, sizeof(WalUsage));
 }
 
 /* ----------
  * pgstat_send_wal() -
  *
- *	Send WAL statistics to the collector.
+ * Send WAL statistics to the collector.
  *
- * If 'force' is not set, WAL stats message is only sent if enough time has
- * passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
+ * If the processes that generate WAL data must call pgstat_report_wal() instead.
+ *
+ * If 'force' is not set, the WAL stats message is only sent if the statistics
+ * counters except for the counters related to generated WAL data were updated
+ * and enough time has passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
  *
  * Return true if the message is sent, and false otherwise.
  * ----------
@@ -4723,26 +4732,32 @@ pgstat_report_wal(void)
 bool
 pgstat_send_wal(bool force)
 {
-	/* We assume this initializes to zeroes */
-	static const PgStat_MsgWal all_zeroes;
 	static TimestampTz sendTime = 0;
 
-	/*
-	 * This function can be called even if nothing at all has happened. In
-	 * this case, avoid sending a completely empty message to the stats
-	 * collector.
-	 */
-	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
-		return false;
-
 	if (!force)
 	{
-		TimestampTz now = GetCurrentTimestamp();
+		TimestampTz now;
+
+		/*
+		 * Don't expend a clock check if the WAL statistics counters were not
+		 * updated.
+		 *
+		 * We don't consider the WAL stats related to the generated WAL data
+		 * because pgstat_report_wal() is called instead in the case.
+		 *
+		 * We can know whether the counters were updated to check only two
+		 * records. So, for performance, we don't allocate another memory
+		 * spaces and check the all stats like pgstat_send_slru().
+		 */
+		if (walStats.wal_write == 0 && walStats.wal_sync == 0)
+			return false;
 
 		/*
 		 * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
-		 * msec since we last sent one.
+		 * msec since we last sent one to avoid overloading the stats
+		 * collector.
 		 */
+		now = GetCurrentTimestamp();
 		if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
 			return false;
 		sendTime = now;

Reply via email to