On 2020-10-06 15:57, Masahiro Ikeda wrote:
Hi,

I think it's better to add other WAL statistics to the pg_stat_wal view. I'm thinking to add the following statistics. Please let me know your thoughts.

1.  Basic wal statistics

* wal_records: Total number of WAL records generated
* wal_fpi: Total number of WAL full page images generated
* wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune
"wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN,
auto_explain, autovacuum
and pg_stat_statements now, if users want to see the performance
trends for the entire database,
they must preprocess the statistics.

Is it useful to add the sum of the above statistics to the pg_stat_wal view?


2.  Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

Just idea; it may be worth exposing the number of when new WAL file is created and zero-filled. This initialization may have impact on the performance of write-heavy workload generating lots of WAL. If this number is reported high, to reduce the number of this initialization, we can tune WAL-related parameters so that more "recycled" WAL files can be hold.


3.  Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is
recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high relative to the startup time, "wal_segment_size"
must be bigger.


4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and "commit_delay"
for query executions.
If the number of WAL is flushed is high, users can know
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.

I think it's better to separate this for backends and wal writer.


5.  Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher, users can detect the possibility of
disk failure.

Since users can see how much flash time occupies of the query execution time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this
for backends and wal writer.

I made a patch for collecting the above statistics.
If you have any comments, please let me know.

I think it's better to separate some statistics for backend and backgrounds because tuning target parameters like "synchronous_commit", "wal_writer_delay" and so on are different.
But first, I want to get a consensus to collect them.

Best regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6656676..31a37de 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3406,12 +3406,95 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     </thead>
 
     <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_records</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL records generated
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_fpi</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL full page images generated
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_bytes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of WAL bytes generated
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>wal_buffers_full</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of times WAL data was written to the disk because WAL buffers got full
+       Total number of WAL data written to the disk because WAL buffers got full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_init</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL file segment created or opened a pre-existing one
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_init_zero</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL file segment created to be filled with zeroes
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data written to the disk
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write_time</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of time that has been spent in the portion of
+       WAL data was written to disk, in milliseconds
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_sync</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data synced to the disk
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_sync_time</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of time that has been spent in the portion of
+       WAL data was synced to disk, in milliseconds
       </para></entry>
      </row>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 52a67b1..840e684 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1262,6 +1262,10 @@ XLogInsertRecord(XLogRecData *rdata,
 		pgWalUsage.wal_bytes += rechdr->xl_tot_len;
 		pgWalUsage.wal_records++;
 		pgWalUsage.wal_fpi += num_fpi;
+
+		WalStats.m_wal_bytes += rechdr->xl_tot_len;
+		WalStats.m_wal_records++;
+		WalStats.m_wal_fpi += num_fpi;
 	}
 
 	return EndPos;
@@ -2422,6 +2426,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 	int			npages;
 	int			startidx;
 	uint32		startoffset;
+	TimestampTz start_time;
+	TimestampTz end_time;
+	long		secs;
+	int			usecs;
 
 	/* We should always be inside a critical section here */
 	Assert(CritSectionCount > 0);
@@ -2535,9 +2543,11 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			do
 			{
 				errno = 0;
+				start_time = GetCurrentTimestamp();
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
 				pgstat_report_wait_end();
+				end_time = GetCurrentTimestamp();
 				if (written <= 0)
 				{
 					char		xlogfname[MAXFNAMELEN];
@@ -2559,6 +2569,13 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 				nleft -= written;
 				from += written;
 				startoffset += written;
+
+				/* Accumulate write time in milliseconds. */
+				TimestampDifference(start_time,
+									end_time,
+									&secs, &usecs);
+				WalStats.m_wal_write_time += ((int) secs * 1000) + (usecs / 1000);
+				WalStats.m_wal_write++;
 			} while (nleft > 0);
 
 			npages = 0;
@@ -2578,7 +2595,16 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			 */
 			if (finishing_seg)
 			{
+				start_time = GetCurrentTimestamp();
 				issue_xlog_fsync(openLogFile, openLogSegNo);
+				end_time = GetCurrentTimestamp();
+
+				/* Accumulate write time in milliseconds. */
+				TimestampDifference(start_time,
+									end_time,
+									&secs, &usecs);
+				WalStats.m_wal_sync_time += ((int) secs * 1000) + (usecs / 1000);
+				WalStats.m_wal_sync++;
 
 				/* signal that we need to wakeup walsenders later */
 				WalSndWakeupRequest();
@@ -2649,7 +2675,16 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 				ReserveExternalFD();
 			}
 
+			start_time = GetCurrentTimestamp();
 			issue_xlog_fsync(openLogFile, openLogSegNo);
+			end_time = GetCurrentTimestamp();
+
+			/* Accumulate write time in milliseconds. */
+			TimestampDifference(start_time,
+								end_time,
+								&secs, &usecs);
+			WalStats.m_wal_sync_time += ((int) secs * 1000) + (usecs / 1000);
+			WalStats.m_wal_sync++;
 		}
 
 		/* signal that we need to wakeup walsenders later */
@@ -3278,7 +3313,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 		else
+		{
+			WalStats.m_wal_init++;
 			return fd;
+		}
 	}
 
 	/*
@@ -3325,6 +3363,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 				break;
 			}
 		}
+		WalStats.m_wal_init_zero++;
 	}
 	else
 	{
@@ -3418,6 +3457,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 				 errmsg("could not open file \"%s\": %m", path)));
 
 	elog(DEBUG2, "done creating and filling new WAL file");
+	WalStats.m_wal_init++;
 
 	return fd;
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c293907..b880533 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -990,7 +990,16 @@ CREATE VIEW pg_stat_bgwriter AS
 
 CREATE VIEW pg_stat_wal AS
     SELECT
+        w.wal_records,
+        w.wal_fpi,
+        w.wal_bytes,
         w.wal_buffers_full,
+        w.wal_init,
+        w.wal_init_zero,
+        w.wal_write,
+        w.wal_write_time,
+        w.wal_sync,
+        w.wal_sync_time,
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0eb..2cf85b6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -6756,7 +6756,16 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 static void
 pgstat_recv_wal(PgStat_MsgWal *msg, int len)
 {
+	walStats.wal_records += msg->m_wal_records;
+	walStats.wal_fpi += msg->m_wal_fpi;
+	walStats.wal_bytes += msg->m_wal_bytes;
 	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+	walStats.wal_init += msg->m_wal_init;
+	walStats.wal_init_zero += msg->m_wal_init_zero;
+	walStats.wal_write += msg->m_wal_write;
+	walStats.wal_write_time += msg->m_wal_write_time;
+	walStats.wal_sync += msg->m_wal_sync;
+	walStats.wal_sync_time += msg->m_wal_sync_time;
 }
 
 /* ----------
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a52832f..ce9f4b7 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -253,6 +253,9 @@ WalWriterMain(void)
 		else if (left_till_hibernate > 0)
 			left_till_hibernate--;
 
+		/* Send WAL statistics */
+		pgstat_send_wal();
+
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
 		 * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0d0d2e6..8ed25d7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1703,7 +1703,7 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_wal(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_WAL_COLS	2
+#define PG_STAT_GET_WAL_COLS	11
 	TupleDesc	tupdesc;
 	Datum		values[PG_STAT_GET_WAL_COLS];
 	bool		nulls[PG_STAT_GET_WAL_COLS];
@@ -1715,9 +1715,27 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 
 	/* Initialise attributes information in the tuple descriptor */
 	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_buffers_full",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "wal_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_init",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_init_zero",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_write_time",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "wal_sync",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "wal_sync_time",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 
 	BlessTupleDesc(tupdesc);
@@ -1726,8 +1744,17 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 	wal_stats = pgstat_fetch_stat_wal();
 
 	/* Fill values and NULLs */
-	values[0] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[1] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+	values[0] = Int64GetDatum(wal_stats->wal_records);
+	values[1] = Int64GetDatum(wal_stats->wal_fpi);
+	values[2] = Int64GetDatum(wal_stats->wal_bytes);
+	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
+	values[4] = Int64GetDatum(wal_stats->wal_init);
+	values[5] = Int64GetDatum(wal_stats->wal_init_zero);
+	values[6] = Int64GetDatum(wal_stats->wal_write);
+	values[7] = Int64GetDatum(wal_stats->wal_write_time);
+	values[8] = Int64GetDatum(wal_stats->wal_sync);
+	values[9] = Int64GetDatum(wal_stats->wal_sync_time);
+	values[10] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340ba..ee55eb1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5492,9 +5492,9 @@
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int8,timestamptz}',
-  proargmodes => '{o,o}',
-  proargnames => '{wal_buffers_full,stats_reset}',
+  proallargtypes => '{int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_init,wal_init_zero,wal_write,wal_write_time,wal_sync,wal_sync_time,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
 
 { oid => '2306', descr => 'statistics: information about SLRU caches',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4..b35ae99 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -459,7 +459,16 @@ typedef struct PgStat_MsgBgWriter
 typedef struct PgStat_MsgWal
 {
 	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_records;
+	PgStat_Counter m_wal_fpi;
+	PgStat_Counter m_wal_bytes;
 	PgStat_Counter m_wal_buffers_full;
+	PgStat_Counter m_wal_init;
+	PgStat_Counter m_wal_init_zero;
+	PgStat_Counter m_wal_write;
+	PgStat_Counter m_wal_write_time;	/* accumulate times in milliseconds */
+	PgStat_Counter m_wal_sync;
+	PgStat_Counter m_wal_sync_time; /* accumulate times in milliseconds */
 } PgStat_MsgWal;
 
 /* ----------
@@ -795,7 +804,22 @@ typedef struct PgStat_GlobalStats
  */
 typedef struct PgStat_WalStats
 {
+	PgStat_Counter wal_records;
+	PgStat_Counter wal_fpi;
+	PgStat_Counter wal_bytes;
 	PgStat_Counter wal_buffers_full;
+	PgStat_Counter wal_init;
+	PgStat_Counter wal_init_zero;
+
+	/*
+	 * TODO: Is it better to separate following metrics
+	 * for backends and background processes?
+	 */
+	PgStat_Counter wal_write;
+	PgStat_Counter wal_write_time;
+	PgStat_Counter wal_sync;
+	PgStat_Counter wal_sync_time;
+
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cf2a9b4..b15f04c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2135,9 +2135,18 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
-pg_stat_wal| SELECT w.wal_buffers_full,
+pg_stat_wal| SELECT w.wal_records,
+    w.wal_fpi,
+    w.wal_bytes,
+    w.wal_buffers_full,
+    w.wal_init,
+    w.wal_init_zero,
+    w.wal_write,
+    w.wal_write_time,
+    w.wal_sync,
+    w.wal_sync_time,
     w.stats_reset
-   FROM pg_stat_get_wal() w(wal_buffers_full, stats_reset);
+   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_init, wal_init_zero, wal_write, wal_write_time, wal_sync, wal_sync_time, stats_reset);
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,

Reply via email to