On 2020-10-15 19:49, Fujii Masao wrote:
On 2020/10/13 11:57, Masahiro Ikeda wrote:
On 2020-10-06 15:57, Masahiro Ikeda wrote:
2. Number of times a new WAL file is created and zero-filled.
As Fujii-san already commented, I think it's good for tuning.
Just an idea: it may be worth exposing the number of times a new WAL
file is created and zero-filled. This initialization may have an impact
on the performance of write-heavy workloads generating lots of WAL. If
this number is reported as high, we can tune WAL-related parameters so
that more "recycled" WAL files can be held, reducing the number of
these initializations.
+1
But it might be better to track the number of times a new WAL file is
created whether it's zero-filled or not, if file creation and sync
itself takes time.
OK, I changed it to track the number of times a new WAL file is created.
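For example, a rough tuning sketch using this counter (the value for
min_wal_size is only illustrative; the right setting depends on the
workload):

```
postgres=# SELECT wal_init_file FROM pg_stat_wal;
postgres=# ALTER SYSTEM SET min_wal_size = '2GB';  -- illustrative value
postgres=# SELECT pg_reload_conf();
```

If wal_init_file keeps growing between checks, raising min_wal_size
lets more recycled segments be kept around instead of being removed.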
3. Number of times the WAL logfile segment is switched.
This is similar to 2, but this also counts the number of times a WAL
file is recycled.
I think it's useful for tuning "wal_segment_size":
if the number is high relative to the time since startup,
"wal_segment_size" should be increased.
You're thinking to count all the WAL file switches? That number is
equal to the number of WAL files generated since the last reset of
pg_stat_wal?
Yes. I think it might be better to count it because the ratio at which
new WAL files have to be created is important.
To calculate that ratio, we need to count all the WAL file switches.
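For example (a sketch using the column names from the attached patch):

```
-- Fraction of opened WAL segments that had to be newly created
-- (zero-filled); near zero means most segments are recycled.
SELECT wal_init_file::float8 / NULLIF(wal_file, 0) AS init_ratio
FROM pg_stat_wal;
```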
4. Number of times WAL is flushed.
I think it's useful for tuning "synchronous_commit" and "commit_delay"
for query executions.
If the number of WAL flushes is high, users can know that tuning
"synchronous_commit" is useful for the workload.
Also, it's useful for tuning "wal_writer_delay" and
"wal_writer_flush_after" for the walwriter.
If the number is high, users can change these parameters for better
performance.
I think it's better to separate this for backends and the walwriter.
+1
Thanks, I separated the statistics for backends and the walwriter.
When the checkpointer process flushes WAL, it is currently counted in
the statistics for backends.
Although I think its impact is not big, would it be better to keep
separate statistics for the checkpointer?
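As an illustration of how the split could be used (a sketch against the
patched view; what counts as "high" is workload-dependent):

```
-- Share of WAL writes issued by backends rather than the walwriter.
-- A high share may suggest lowering wal_writer_delay or
-- wal_writer_flush_after.
SELECT wal_write_backend, wal_write_walwriter,
       round(100.0 * wal_write_backend /
             NULLIF(wal_write_backend + wal_write_walwriter, 0), 1)
         AS backend_write_pct
FROM pg_stat_wal;
```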
5. Wait time when WAL is flushed.
This is the accumulated time spent flushing WAL.
If this time becomes much higher, users can detect the possibility of a
disk failure.
This should be tracked, e.g., only when track_io_timing is enabled?
Otherwise, tracking that may cause performance overhead.
OK, I changed the implementation.
Since users can see how much of the query execution time the flush time
occupies, it may lead to query tuning and so on.
For the above reason, I think it's better to separate this for backends
and the walwriter.
I'm afraid that this counter for a backend may be a bit confusing,
because when the counter indicates a small time, we may think that the
walwriter writes almost all the WAL data and a backend doesn't take
time to write WAL. But the backend may be just waiting for the
walwriter to write WAL.
Thanks for your comments. I agreed.
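For reference, the timing columns stay zero unless track_io_timing is
enabled, e.g.:

```
postgres=# ALTER SYSTEM SET track_io_timing = on;
postgres=# SELECT pg_reload_conf();
postgres=# SELECT wal_write_time, wal_sync_time FROM pg_stat_wal;
```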
Now, the following is the view implemented in the attached patch.
If you have any other comments, please let me know.
```
postgres=# SELECT * FROM pg_stat_wal;
-[ RECORD 1 ]-------+------------------------------
wal_records         | 1000128    # Total number of WAL records generated
wal_fpi             | 1          # Total number of WAL full page images generated
wal_bytes           | 124013682  # Total amount of WAL bytes generated
wal_buffers_full    | 7952       # Total number of times WAL data was written to disk because WAL buffers got full
wal_file            | 14         # Total number of times a WAL file segment was created or a pre-existing one was opened
wal_init_file       | 7          # Total number of times a WAL file segment was created
wal_write_backend   | 7956       # Total number of times WAL data was written to disk by backends
wal_write_walwriter | 27         # Total number of times WAL data was written to disk by the walwriter
wal_write_time      | 40         # Total time spent writing WAL data to disk by backends and the walwriter, in milliseconds
wal_sync_backend    | 1          # Total number of times WAL data was synced to disk by backends
wal_sync_walwriter  | 6          # Total number of times WAL data was synced to disk by the walwriter
wal_sync_time       | 0          # Total time spent syncing WAL data to disk by backends and the walwriter, in milliseconds
stats_reset         | 2020-10-16 19:41:01.892272+09
```
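As a rough usage sketch (assuming track_io_timing is enabled), the
average time per write and per sync can be derived from these counters:

```
SELECT wal_write_time::float8 /
         NULLIF(wal_write_backend + wal_write_walwriter, 0) AS avg_write_ms,
       wal_sync_time::float8 /
         NULLIF(wal_sync_backend + wal_sync_walwriter, 0) AS avg_sync_ms
FROM pg_stat_wal;
```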
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6656676..ee70b7b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3406,12 +3406,115 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</thead>
<tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_records</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of WAL records generated
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_fpi</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of WAL full page images generated
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total amount of WAL bytes generated
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>wal_buffers_full</structfield> <type>bigint</type>
</para>
<para>
- Number of times WAL data was written to the disk because WAL buffers got full
+ Total number of times WAL data was written to the disk because WAL buffers got full
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_file</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of times a WAL file segment was created or a pre-existing one was opened
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_init_file</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of times a WAL file segment was created
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write_backend</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of times WAL data was written to the disk by backends
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write_walwriter</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of times WAL data was written to the disk by the walwriter
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write_time</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total amount of time spent writing WAL data to the disk by backends
+ and the walwriter, in milliseconds
+ (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_sync_backend</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of times WAL data was synced to the disk by backends
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_sync_walwriter</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of times WAL data was synced to the disk by the walwriter
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_sync_time</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total amount of time spent syncing WAL data to the disk by backends
+ and the walwriter, in milliseconds
+ (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
</para></entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 52a67b1..f7316d2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1262,6 +1262,10 @@ XLogInsertRecord(XLogRecData *rdata,
pgWalUsage.wal_bytes += rechdr->xl_tot_len;
pgWalUsage.wal_records++;
pgWalUsage.wal_fpi += num_fpi;
+
+ WalStats.m_wal_bytes += rechdr->xl_tot_len;
+ WalStats.m_wal_records++;
+ WalStats.m_wal_fpi += num_fpi;
}
return EndPos;
@@ -2527,6 +2531,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
Size nbytes;
Size nleft;
int written;
+ instr_time start;
+ instr_time duration;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2535,9 +2541,28 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
do
{
errno = 0;
+ /* Measure I/O timing to write WAL data, if requested */
+ if (track_io_timing)
+ INSTR_TIME_SET_CURRENT(start);
+
pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
written = pg_pwrite(openLogFile, from, nleft, startoffset);
pgstat_report_wait_end();
+
+ if (track_io_timing)
+ {
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration);
+ }
+
+ /* Count the write, attributing it to the walwriter or a backend */
+ if (AmWalWriterProcess())
+ WalStats.m_wal_write_walwriter++;
+ else
+ WalStats.m_wal_write_backend++;
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -3278,7 +3303,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
errmsg("could not open file \"%s\": %m", path)));
}
else
+ {
+ /* Opened a pre-existing WAL segment */
+ WalStats.m_wal_file++;
return fd;
+ }
}
/*
@@ -3418,6 +3446,8 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
errmsg("could not open file \"%s\": %m", path)));
elog(DEBUG2, "done creating and filling new WAL file");
+ /* A newly created segment also counts as an opened WAL file */
+ WalStats.m_wal_init_file++;
+ WalStats.m_wal_file++;
return fd;
}
@@ -10427,8 +10457,13 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
void
issue_xlog_fsync(int fd, XLogSegNo segno)
{
+ instr_time start;
+ instr_time duration;
char *msg = NULL;
+ /* Measure I/O timing to sync WAL data, if requested */
+ if (track_io_timing)
+ INSTR_TIME_SET_CURRENT(start);
+
pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
switch (sync_method)
{
@@ -10472,6 +10507,21 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
}
pgstat_report_wait_end();
+
+ if (track_io_timing)
+ {
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_sync_time += INSTR_TIME_GET_MILLISEC(duration);
+ }
+
+ /* Count the sync, attributing it to the walwriter or a backend */
+ if (AmWalWriterProcess())
+ WalStats.m_wal_sync_walwriter++;
+ else
+ WalStats.m_wal_sync_backend++;
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c293907..f0acfa8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -990,7 +990,18 @@ CREATE VIEW pg_stat_bgwriter AS
CREATE VIEW pg_stat_wal AS
SELECT
+ w.wal_records,
+ w.wal_fpi,
+ w.wal_bytes,
w.wal_buffers_full,
+ w.wal_file,
+ w.wal_init_file,
+ w.wal_write_backend,
+ w.wal_write_walwriter,
+ w.wal_write_time,
+ w.wal_sync_backend,
+ w.wal_sync_walwriter,
+ w.wal_sync_time,
w.stats_reset
FROM pg_stat_get_wal() w;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0eb..f74b431 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -6756,7 +6756,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
static void
pgstat_recv_wal(PgStat_MsgWal *msg, int len)
{
+ walStats.wal_records += msg->m_wal_records;
+ walStats.wal_fpi += msg->m_wal_fpi;
+ walStats.wal_bytes += msg->m_wal_bytes;
walStats.wal_buffers_full += msg->m_wal_buffers_full;
+ walStats.wal_file += msg->m_wal_file;
+ walStats.wal_init_file += msg->m_wal_init_file;
+ walStats.wal_write_backend += msg->m_wal_write_backend;
+ walStats.wal_write_walwriter += msg->m_wal_write_walwriter;
+ walStats.wal_write_time += msg->m_wal_write_time;
+ walStats.wal_sync_backend += msg->m_wal_sync_backend;
+ walStats.wal_sync_walwriter += msg->m_wal_sync_walwriter;
+ walStats.wal_sync_time += msg->m_wal_sync_time;
}
/* ----------
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a52832f..ce9f4b7 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -253,6 +253,9 @@ WalWriterMain(void)
else if (left_till_hibernate > 0)
left_till_hibernate--;
+ /* Send WAL statistics */
+ pgstat_send_wal();
+
/*
* Sleep until we are signaled or WalWriterDelay has elapsed. If we
* haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0d0d2e6..e21f698 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1703,7 +1703,7 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
Datum
pg_stat_get_wal(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_WAL_COLS 2
+#define PG_STAT_GET_WAL_COLS 13
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS];
bool nulls[PG_STAT_GET_WAL_COLS];
@@ -1715,9 +1715,31 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Initialise attributes information in the tuple descriptor */
tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
- TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_buffers_full",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 2, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "wal_bytes",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_file",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_init_file",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_backend",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_write_walwriter",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 9, "wal_write_time",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 10, "wal_sync_backend",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "wal_sync_walwriter",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 12, "wal_sync_time",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -1726,8 +1748,19 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
wal_stats = pgstat_fetch_stat_wal();
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[1] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ values[0] = Int64GetDatum(wal_stats->wal_records);
+ values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[2] = Int64GetDatum(wal_stats->wal_bytes);
+ values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
+ values[4] = Int64GetDatum(wal_stats->wal_file);
+ values[5] = Int64GetDatum(wal_stats->wal_init_file);
+ values[6] = Int64GetDatum(wal_stats->wal_write_backend);
+ values[7] = Int64GetDatum(wal_stats->wal_write_walwriter);
+ values[8] = Int64GetDatum(wal_stats->wal_write_time);
+ values[9] = Int64GetDatum(wal_stats->wal_sync_backend);
+ values[10] = Int64GetDatum(wal_stats->wal_sync_walwriter);
+ values[11] = Int64GetDatum(wal_stats->wal_sync_time);
+ values[12] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340ba..f8c3ccb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5492,9 +5492,9 @@
{ oid => '1136', descr => 'statistics: information about WAL activity',
proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => '',
- proallargtypes => '{int8,timestamptz}',
- proargmodes => '{o,o}',
- proargnames => '{wal_buffers_full,stats_reset}',
+ proallargtypes => '{int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_file,wal_init_file,wal_write_backend,wal_write_walwriter,wal_write_time,wal_sync_backend,wal_sync_walwriter,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
{ oid => '2306', descr => 'statistics: information about SLRU caches',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4..25490ed 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -459,7 +459,18 @@ typedef struct PgStat_MsgBgWriter
typedef struct PgStat_MsgWal
{
PgStat_MsgHdr m_hdr;
+ PgStat_Counter m_wal_records;
+ PgStat_Counter m_wal_fpi;
+ PgStat_Counter m_wal_bytes;
PgStat_Counter m_wal_buffers_full;
+ PgStat_Counter m_wal_file;
+ PgStat_Counter m_wal_init_file;
+ PgStat_Counter m_wal_write_backend;
+ PgStat_Counter m_wal_write_walwriter;
+ PgStat_Counter m_wal_write_time; /* accumulated time in milliseconds */
+ PgStat_Counter m_wal_sync_backend;
+ PgStat_Counter m_wal_sync_walwriter;
+ PgStat_Counter m_wal_sync_time; /* accumulated time in milliseconds */
} PgStat_MsgWal;
/* ----------
@@ -795,7 +806,19 @@ typedef struct PgStat_GlobalStats
*/
typedef struct PgStat_WalStats
{
+ PgStat_Counter wal_records;
+ PgStat_Counter wal_fpi;
+ PgStat_Counter wal_bytes;
PgStat_Counter wal_buffers_full;
+ PgStat_Counter wal_file;
+ PgStat_Counter wal_init_file;
+ PgStat_Counter wal_write_backend;
+ PgStat_Counter wal_write_walwriter;
+ PgStat_Counter wal_write_time;
+ PgStat_Counter wal_sync_backend;
+ PgStat_Counter wal_sync_walwriter;
+ PgStat_Counter wal_sync_time;
+
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cf2a9b4..019fd50 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2135,9 +2135,20 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.autoanalyze_count
FROM pg_stat_all_tables
WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
-pg_stat_wal| SELECT w.wal_buffers_full,
+pg_stat_wal| SELECT w.wal_records,
+ w.wal_fpi,
+ w.wal_bytes,
+ w.wal_buffers_full,
+ w.wal_file,
+ w.wal_init_file,
+ w.wal_write_backend,
+ w.wal_write_walwriter,
+ w.wal_write_time,
+ w.wal_sync_backend,
+ w.wal_sync_walwriter,
+ w.wal_sync_time,
w.stats_reset
- FROM pg_stat_get_wal() w(wal_buffers_full, stats_reset);
+ FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_file, wal_init_file, wal_write_backend, wal_write_walwriter, wal_write_time, wal_sync_backend, wal_sync_walwriter, wal_sync_time, stats_reset);
pg_stat_wal_receiver| SELECT s.pid,
s.status,
s.receive_start_lsn,