On 2021-03-03 16:30, Fujii Masao wrote:
On 2021/03/03 14:33, Masahiro Ikeda wrote:
On 2021-02-24 16:14, Fujii Masao wrote:
On 2021/02/15 11:59, Masahiro Ikeda wrote:
On 2021-02-10 00:51, David G. Johnston wrote:
On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
<ikeda...@oss.nttdata.com> wrote:
I pgindented the patches.
... <function>XLogWrite</function>, which is invoked during an
<function>XLogFlush</function> request (see ...). This is also
incremented by the WAL receiver during replication.
("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the
original)
You missed adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)
is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"
Thanks, I fixed them.
wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."
I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.
"The number of times it happened..." -> " (the tally of this event
is
reported in wal_buffers_full in....) This is undesirable because
..."
Thanks, I fixed it.
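As a usage note (nothing the patch changes): the existing counter can already
be watched to decide whether wal_buffers needs to be raised, for example:

SELECT wal_buffers_full, stats_reset
  FROM pg_stat_wal;
-- a steadily growing wal_buffers_full under write load suggests increasing
-- wal_buffers, as the updated wal.sgml paragraph describes
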
I notice that the patch for the WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics. This is because of the presence of issue_xlog_fsync but the
absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the
XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver
path does not. It seems technically straightforward to refactor here to
avoid the almost-duplicated logic in the two places, though I suspect
there may be a trade-off in adding another function call to the stack
given the importance of WAL processing (though that cost seems marginal
compared to the cost of actually writing the WAL). Or, as Fujii noted,
go the other way and don't have any shared code between the two, but
instead implement the WAL receiver one to use pg_stat_wal_receiver
instead. In either case, this half-and-half implementation seems
undesirable.
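For reference, the shared helper David describes could look roughly like the
sketch below. This is only a hypothetical illustration (the helper name is
made up and neither attached patch introduces it); it assumes the usual
xlog.c includes (instr_time.h, pgstat.h) and access to WalStats and
track_wal_io_timing, as xlog.c has:

/*
 * Hypothetical helper: wrap pg_pwrite() with wait-event reporting and
 * optional I/O timing so that XLogWrite() and the WAL receiver could
 * share one code path for the write statistics.
 */
static ssize_t
WalPWriteWithStats(int fd, const void *buf, size_t nbytes, off_t offset)
{
    instr_time  start;
    ssize_t     written;

    /* Measure I/O timing to write WAL data */
    if (track_wal_io_timing)
        INSTR_TIME_SET_CURRENT(start);

    pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
    written = pg_pwrite(fd, buf, nbytes, offset);
    pgstat_report_wait_end();

    /* Accumulate the I/O time and the write count */
    if (track_wal_io_timing)
    {
        instr_time  duration;

        INSTR_TIME_SET_CURRENT(duration);
        INSTR_TIME_SUBTRACT(duration, start);
        WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
    }
    WalStats.m_wal_write++;

    return written;
}
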
OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
Thanks for updating the patches!
I added the infrastructure code to communicate the WAL receiver
stats messages between the WAL receiver and the stats collector, and
the stats for the WAL receiver are counted in pg_stat_wal_receiver.
What do you think?
On second thought, this idea doesn't seem good, because those stats are
collected across multiple walreceivers, while the other values in
pg_stat_wal_receiver relate only to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thoughts?
OK, I fixed it.
The stats collected in the WAL receiver are exposed in the pg_stat_wal
view in the v11 patch.
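With the attached 0001 patch applied, the new counters can be read directly
from the view, e.g. (column names as defined in v12-0001; the *_time columns
stay zero unless track_wal_io_timing is enabled):

SELECT wal_write, wal_write_time, wal_sync, wal_sync_time
  FROM pg_stat_wal;
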
Thanks for updating the patches! I'm now reading 001 patch.
+ /* Check whether the WAL file was synced to disk right now */
+ if (enableFsync &&
+ (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+ {
Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off or sync_method is open_sync or open_datasync,
to simplify the code further?
Thanks for the comments.
I added the above code in v12 patch.
+ /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+ if (rc & WL_TIMEOUT)
+ pgstat_send_wal();
On second thought, this change means that it always takes wal_writer_delay
before the walwriter's WAL stats are sent after XLogBackgroundFlush() is
called. For example, if wal_writer_delay is set to several seconds, some
values in pg_stat_wal would be needlessly out of date for those seconds.
So I'm thinking of withdrawing my previous comment; it's OK to send the
stats every time after XLogBackgroundFlush() is called. Thoughts?
Thanks, I didn't notice that.
Although PGSTAT_STAT_INTERVAL is 500 msec, wal_writer_delay's
default value is 200 msec and it may be set to an even shorter time.
Why not check the timestamp in another way, as below?
+ /*
+ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+ * msec since we last sent one
+ */
+ now = GetCurrentTimestamp();
+ if (TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
+ {
+ pgstat_send_wal();
+ last_report = now;
+ }
+
Although I considered adding the check inside pgstat_send_wal(),
I didn't do so, to avoid checking PGSTAT_STAT_INTERVAL twice:
pgstat_send_wal() is invoked from pgstat_report_stat(), which already
checks PGSTAT_STAT_INTERVAL.
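To see the new stats in action once the patches are applied, something like
the following can be used from a superuser session (any write workload, e.g.
pgbench, will generate WAL):

-- track_wal_io_timing is PGC_SUSET, so ALTER SYSTEM plus a reload is enough
ALTER SYSTEM SET track_wal_io_timing = on;
SELECT pg_reload_conf();

-- after generating some WAL, the counters and timings should advance;
-- senders throttle reports to PGSTAT_STAT_INTERVAL (500 msec), so a short
-- lag is expected
SELECT wal_write, wal_write_time, wal_sync, wal_sync_time, stats_reset
  FROM pg_stat_wal;
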
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
From d870cfa78e501097cc56780ebb3140db6b9261e5 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda <ikeda...@oss.nttdata.com>
Date: Fri, 12 Feb 2021 11:19:59 +0900
Subject: [PATCH 1/2] Add statistics related to write/sync wal records.
This patch adds the following statistics to the pg_stat_wal view
to track WAL I/O activity by backends and background processes
except the WAL receiver:
- the total number of times WAL data was written/synced to disk.
- the total amount of time spent writing/syncing WAL data.
Since tracking I/O timing may lead to significant overhead,
a GUC parameter "track_wal_io_timing" is introduced.
The I/O timing is measured only when it is on.
The statistics related to sync are zero when "wal_sync_method"
is "open_datasync" or "open_sync", because no separate sync call
is issued in those cases.
(This requires a catversion bump, as well as an update to
PGSTAT_FILE_FORMAT_ID)
Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David
Johnston, Fujii Masao
Discussion:
https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.co
---
doc/src/sgml/config.sgml | 23 +++++++-
doc/src/sgml/monitoring.sgml | 56 ++++++++++++++++++
doc/src/sgml/wal.sgml | 12 +++-
src/backend/access/transam/xlog.c | 59 +++++++++++++++++--
src/backend/catalog/system_views.sql | 4 ++
src/backend/postmaster/pgstat.c | 4 ++
src/backend/postmaster/walwriter.c | 21 +++++++
src/backend/utils/adt/pgstatfuncs.c | 24 +++++++-
src/backend/utils/misc/guc.c | 9 +++
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/access/xlog.h | 1 +
src/include/catalog/pg_proc.dat | 6 +-
src/include/pgstat.h | 10 ++++
src/test/regress/expected/rules.out | 6 +-
14 files changed, 221 insertions(+), 15 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b5718fc136..c232f537b2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7439,7 +7439,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<listitem>
<para>
Enables timing of database I/O calls. This parameter is off by
- default, because it will repeatedly query the operating system for
+ default, as it will repeatedly query the operating system for
the current time, which may cause significant overhead on some
platforms. You can use the <xref linkend="pgtesttiming"/> tool to
measure the overhead of timing on your system.
@@ -7453,6 +7453,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
+ <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables timing of WAL I/O calls. This parameter is off by default,
+ as it will repeatedly query the operating system for
+ the current time, which may cause significant overhead on some
+ platforms. You can use the <xref linkend="pgtesttiming"/> tool to
+ measure the overhead of timing on your system.
+ I/O timing information is
+ displayed in <link linkend="monitoring-pg-stat-wal-view">
+ <structname>pg_stat_wal</structname></link>. Only superusers can
+ change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-track-functions" xreflabel="track_functions">
<term><varname>track_functions</varname> (<type>enum</type>)
<indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..a16be45a71 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3487,6 +3487,62 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times WAL buffers were written out to disk via
+ <function>XLogWrite</function>, which is invoked during an
+ <function>XLogFlush</function> request (see <xref linkend="wal-configuration"/>)
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write_time</structfield> <type>double precision</type>
+ </para>
+ <para>
+ Total amount of time spent writing WAL buffers out to disk via
+ <function>XLogWrite</function>, which is invoked during an
+ <function>XLogFlush</function> request (see <xref linkend="wal-configuration"/>),
+ excluding sync time unless
+ <xref linkend="guc-wal-sync-method"/> is either <literal>open_datasync</literal> or
+ <literal>open_sync</literal>. Units are in milliseconds with microsecond resolution.
+ This is zero when <xref linkend="guc-track-wal-io-timing"/> is disabled.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_sync</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times WAL files were synced to disk via
+ <function>issue_xlog_fsync</function>, which is invoked during an
+ <function>XLogFlush</function> request (see <xref linkend="wal-configuration"/>)
+ while <xref linkend="guc-wal-sync-method"/> was set to one of the
+ "sync at commit" options (i.e., <literal>fdatasync</literal>,
+ <literal>fsync</literal>, or <literal>fsync_writethrough</literal>).
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_sync_time</structfield> <type>double precision</type>
+ </para>
+ <para>
+ Total amount of time spent syncing WAL files to disk via
+ <function>issue_xlog_fsync</function>, which is invoked during an
+ <function>XLogFlush</function> request (see <xref linkend="wal-configuration"/>)
+ while <xref linkend="guc-wal-sync-method"/> was set to one of the
+ "sync at commit" options (i.e., <literal>fdatasync</literal>,
+ <literal>fsync</literal>, or <literal>fsync_writethrough</literal>).
+ Units are in milliseconds with microsecond resolution.
+ This is zero when <xref linkend="guc-track-wal-io-timing"/> is disabled.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f75527f764..06e4b37012 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -663,7 +663,9 @@
the <acronym>WAL</acronym> buffers in shared memory. If there is no
space for the new record, <function>XLogInsertRecord</function> will have
to write (move to kernel cache) a few filled <acronym>WAL</acronym>
- buffers. This is undesirable because <function>XLogInsertRecord</function>
+ buffers (the tally of this event is reported in
+ <literal>wal_buffers_full</literal> in <xref linkend="pg-stat-wal-view"/>).
+ This is undesirable because <function>XLogInsertRecord</function>
is used on every database low level modification (for example, row
insertion) at a time when an exclusive lock is held on affected
data pages, so the operation needs to be as fast as possible. What
@@ -672,8 +674,12 @@
time. Normally, <acronym>WAL</acronym> buffers should be written
and flushed by an <function>XLogFlush</function> request, which is
made, for the most part, at transaction commit time to ensure that
- transaction records are flushed to permanent storage. On systems
- with high log output, <function>XLogFlush</function> requests might
+ transaction records are flushed to permanent storage.
+ <function>XLogFlush</function> calls <function>XLogWrite</function> to write the buffers
+ and <function>issue_xlog_fsync</function> to sync them to disk; these calls are counted as
+ <literal>wal_write</literal> and <literal>wal_sync</literal> in
+ <xref linkend="pg-stat-wal-view"/>. On systems with high log output,
+ <function>XLogFlush</function> requests might
not occur often enough to prevent <function>XLogInsertRecord</function>
from having to do writes. On such systems
one should increase the number of <acronym>WAL</acronym> buffers by
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fe56324439..fafc1b7fac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -110,6 +110,7 @@ int CommitDelay = 0; /* precommit delay in microseconds */
int CommitSiblings = 5; /* # concurrent xacts needed to sleep */
int wal_retrieve_retry_interval = 5000;
int max_slot_wal_keep_size_mb = -1;
+bool track_wal_io_timing = false;
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
@@ -2533,6 +2534,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
Size nbytes;
Size nleft;
int written;
+ instr_time start;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2541,9 +2543,30 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
do
{
errno = 0;
+
+ /* Measure I/O timing to write WAL data */
+ if (track_wal_io_timing)
+ INSTR_TIME_SET_CURRENT(start);
+
pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
written = pg_pwrite(openLogFile, from, nleft, startoffset);
pgstat_report_wait_end();
+
+ /*
+ * Increment the I/O timing and the number of times WAL data
+ * were written out to disk.
+ */
+ if (track_wal_io_timing)
+ {
+ instr_time duration;
+
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
+ }
+
+ WalStats.m_wal_write++;
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -10526,6 +10549,24 @@ void
issue_xlog_fsync(int fd, XLogSegNo segno)
{
char *msg = NULL;
+ instr_time start;
+
+ /*
+ * Check whether the WAL file needs to be synced to disk right now.
+ *
+ * If fsync is disabled, no sync call is ever issued.
+ *
+ * If sync_method is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC, the WAL
+ * file is already synced by the write.
+ */
+ if (!enableFsync ||
+ sync_method == SYNC_METHOD_OPEN ||
+ sync_method == SYNC_METHOD_OPEN_DSYNC)
+ return;
+
+ /* Measure I/O timing to sync the WAL file */
+ if (track_wal_io_timing)
+ INSTR_TIME_SET_CURRENT(start);
pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
switch (sync_method)
@@ -10546,10 +10587,6 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
msg = _("could not fdatasync file \"%s\": %m");
break;
#endif
- case SYNC_METHOD_OPEN:
- case SYNC_METHOD_OPEN_DSYNC:
- /* write synced it already */
- break;
default:
elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);
break;
@@ -10570,6 +10607,20 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
}
pgstat_report_wait_end();
+
+ /*
+ * Increment the I/O timing and the number of times WAL files were synced.
+ */
+ if (track_wal_io_timing)
+ {
+ instr_time duration;
+
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_sync_time += INSTR_TIME_GET_MICROSEC(duration);
+ }
+
+ WalStats.m_wal_sync++;
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fc94a73a54..ba5158ba57 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1005,6 +1005,10 @@ CREATE VIEW pg_stat_wal AS
w.wal_fpi,
w.wal_bytes,
w.wal_buffers_full,
+ w.wal_write,
+ w.wal_write_time,
+ w.wal_sync,
+ w.wal_sync_time,
w.stats_reset
FROM pg_stat_get_wal() w;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..987bbd058d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -6892,6 +6892,10 @@ pgstat_recv_wal(PgStat_MsgWal *msg, int len)
walStats.wal_fpi += msg->m_wal_fpi;
walStats.wal_bytes += msg->m_wal_bytes;
walStats.wal_buffers_full += msg->m_wal_buffers_full;
+ walStats.wal_write += msg->m_wal_write;
+ walStats.wal_write_time += msg->m_wal_write_time;
+ walStats.wal_sync += msg->m_wal_sync;
+ walStats.wal_sync_time += msg->m_wal_sync_time;
}
/* ----------
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 4f1a8e356b..8491e6f6d6 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -78,6 +78,11 @@ int WalWriterFlushAfter = 128;
#define LOOPS_UNTIL_HIBERNATE 50
#define HIBERNATE_FACTOR 25
+/*
+ * Minimum time between stats file updates; in milliseconds.
+ */
+#define PGSTAT_STAT_INTERVAL 500
+
/*
* Main entry point for walwriter process
*
@@ -222,7 +227,12 @@ WalWriterMain(void)
*/
for (;;)
{
+ /* we assume this inits to all zeroes: */
+ static TimestampTz last_report = 0;
+
long cur_timeout;
+ int rc;
+ TimestampTz now;
/*
* Advertise whether we might hibernate in this cycle. We do this
@@ -253,6 +263,17 @@ WalWriterMain(void)
else if (left_till_hibernate > 0)
left_till_hibernate--;
+ /*
+ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+ * msec since we last sent one
+ */
+ now = GetCurrentTimestamp();
+ if (TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
+ {
+ pgstat_send_wal();
+ last_report = now;
+ }
+
/*
* Sleep until we are signaled or WalWriterDelay has elapsed. If we
* haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..7296ef04ff 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1799,7 +1799,7 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
Datum
pg_stat_get_wal(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_WAL_COLS 5
+#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS];
bool nulls[PG_STAT_GET_WAL_COLS];
@@ -1820,7 +1820,15 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
NUMERICOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_write",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_write_time",
+ FLOAT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_sync",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_sync_time",
+ FLOAT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -1840,7 +1848,17 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
Int32GetDatum(-1));
values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ values[4] = Int64GetDatum(wal_stats->wal_write);
+
+ /* convert counter from microsec to millisec for display */
+ values[5] = Float8GetDatum((double) wal_stats->wal_write_time / 1000.0);
+
+ values[6] = Int64GetDatum(wal_stats->wal_sync);
+
+ /* convert counter from microsec to millisec for display */
+ values[7] = Float8GetDatum((double) wal_stats->wal_sync_time / 1000.0);
+
+ values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c5b7f01e1a..958e84a962 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1485,6 +1485,15 @@ static struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"track_wal_io_timing", PGC_SUSET, STATS_COLLECTOR,
+ gettext_noop("Collects timing statistics for WAL I/O activity."),
+ NULL
+ },
+ &track_wal_io_timing,
+ false,
+ NULL, NULL, NULL
+ },
{
{"update_process_title", PGC_SUSET, PROCESS_TITLE,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ee06528bb0..c6483fa1ff 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -586,6 +586,7 @@
#track_activities = on
#track_counts = on
#track_io_timing = off
+#track_wal_io_timing = off
#track_functions = none # none, pl, all
#track_activity_query_size = 1024 # (change requires restart)
#stats_temp_directory = 'pg_stat_tmp'
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..1e53d9d4ca 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -131,6 +131,7 @@ extern int recovery_min_apply_delay;
extern char *PrimaryConnInfo;
extern char *PrimarySlotName;
extern bool wal_receiver_create_temp_slot;
+extern bool track_wal_io_timing;
/* indirectly set via GUC system */
extern TransactionId recoveryTargetXid;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 447f9ae44d..91ac0479c2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5543,9 +5543,9 @@
{ oid => '1136', descr => 'statistics: information about WAL activity',
proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => '',
- proallargtypes => '{int8,int8,numeric,int8,timestamptz}',
- proargmodes => '{o,o,o,o,o}',
- proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ proallargtypes => '{int8,int8,numeric,int8,int8,float8,int8,float8,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o}',
+ proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_write_time,wal_sync,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
{ oid => '2306', descr => 'statistics: information about SLRU caches',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..000bb14c0b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -474,6 +474,12 @@ typedef struct PgStat_MsgWal
PgStat_Counter m_wal_fpi;
uint64 m_wal_bytes;
PgStat_Counter m_wal_buffers_full;
+ PgStat_Counter m_wal_write;
+ PgStat_Counter m_wal_write_time; /* time spent writing wal records in
+ * microseconds */
+ PgStat_Counter m_wal_sync;
+ PgStat_Counter m_wal_sync_time; /* time spent syncing wal records in
+ * microseconds */
} PgStat_MsgWal;
/* ----------
@@ -839,6 +845,10 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_fpi;
uint64 wal_bytes;
PgStat_Counter wal_buffers_full;
+ PgStat_Counter wal_write;
+ PgStat_Counter wal_write_time;
+ PgStat_Counter wal_sync;
+ PgStat_Counter wal_sync_time;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b1c9b7bdfe..fce6ab8338 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2160,8 +2160,12 @@ pg_stat_wal| SELECT w.wal_records,
w.wal_fpi,
w.wal_bytes,
w.wal_buffers_full,
+ w.wal_write,
+ w.wal_write_time,
+ w.wal_sync,
+ w.wal_sync_time,
w.stats_reset
- FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset);
+ FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_write_time, wal_sync, wal_sync_time, stats_reset);
pg_stat_wal_receiver| SELECT s.pid,
s.status,
s.receive_start_lsn,
--
2.25.1
--- v11-0001-Add-statistics-related-to-write-sync-wal-records.patch 2021-03-03 14:27:02.119713795 +0900
+++ v12-0001-Add-statistics-related-to-write-sync-wal-records.patch 2021-03-03 20:22:47.258763274 +0900
@@ -1,4 +1,4 @@
-From c97fff03cd2bd51b28f6c6fde56c48792683f44e Mon Sep 17 00:00:00 2001
+From d870cfa78e501097cc56780ebb3140db6b9261e5 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda <ikeda...@oss.nttdata.com>
Date: Fri, 12 Feb 2021 11:19:59 +0900
Subject: [PATCH 1/2] Add statistics related to write/sync wal records.
@@ -30,10 +30,10 @@
doc/src/sgml/config.sgml | 23 +++++++-
doc/src/sgml/monitoring.sgml | 56 ++++++++++++++++++
doc/src/sgml/wal.sgml | 12 +++-
- src/backend/access/transam/xlog.c | 57 +++++++++++++++++++
+ src/backend/access/transam/xlog.c | 59 +++++++++++++++++--
src/backend/catalog/system_views.sql | 4 ++
src/backend/postmaster/pgstat.c | 4 ++
- src/backend/postmaster/walwriter.c | 16 ++++--
+ src/backend/postmaster/walwriter.c | 21 +++++++
src/backend/utils/adt/pgstatfuncs.c | 24 +++++++-
src/backend/utils/misc/guc.c | 9 +++
src/backend/utils/misc/postgresql.conf.sample | 1 +
@@ -41,7 +41,7 @@
src/include/catalog/pg_proc.dat | 6 +-
src/include/pgstat.h | 10 ++++
src/test/regress/expected/rules.out | 6 +-
- 14 files changed, 214 insertions(+), 15 deletions(-)
+ 14 files changed, 221 insertions(+), 15 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b5718fc136..c232f537b2 100644
@@ -182,7 +182,7 @@
from having to do writes. On such systems
one should increase the number of <acronym>WAL</acronym> buffers by
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
-index fe56324439..98f558b4c7 100644
+index fe56324439..fafc1b7fac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -110,6 +110,7 @@ int CommitDelay = 0; /* precommit delay in microseconds */
@@ -232,51 +232,60 @@
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
-@@ -10526,6 +10549,21 @@ void
+@@ -10526,6 +10549,24 @@ void
issue_xlog_fsync(int fd, XLogSegNo segno)
{
char *msg = NULL;
-+ bool issue_fsync = false;
+ instr_time start;
+
-+ /* Check whether the WAL file was synced to disk right now */
-+ if (enableFsync &&
-+ (sync_method == SYNC_METHOD_FSYNC ||
-+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
-+ sync_method == SYNC_METHOD_FDATASYNC))
-+ {
-+ /* Measure I/O timing to sync the WAL file */
-+ if (track_wal_io_timing)
-+ INSTR_TIME_SET_CURRENT(start);
-+
-+ issue_fsync = true;
-+ }
++ /*
++ * Check whether the WAL file needs to be synced to disk right now.
++ *
++ * If fsync is disabled, no sync call is ever issued.
++ *
++ * If sync_method is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC, the WAL
++ * file is already synced by the write.
++ */
++ if (!enableFsync ||
++ sync_method == SYNC_METHOD_OPEN ||
++ sync_method == SYNC_METHOD_OPEN_DSYNC)
++ return;
++
++ /* Measure I/O timing to sync the WAL file */
++ if (track_wal_io_timing)
++ INSTR_TIME_SET_CURRENT(start);
pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
switch (sync_method)
-@@ -10570,6 +10608,25 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
+@@ -10546,10 +10587,6 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
+ msg = _("could not fdatasync file \"%s\": %m");
+ break;
+ #endif
+- case SYNC_METHOD_OPEN:
+- case SYNC_METHOD_OPEN_DSYNC:
+- /* write synced it already */
+- break;
+ default:
+ elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);
+ break;
+@@ -10570,6 +10607,20 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
}
pgstat_report_wait_end();
+
+ /*
+ * Increment the I/O timing and the number of times WAL files were synced.
-+ *
-+ * Check whether the WAL file was synced to disk right now because
-+ * statistics must be incremented when syncing really occurred.
+ */
-+ if (issue_fsync)
++ if (track_wal_io_timing)
+ {
-+ if (track_wal_io_timing)
-+ {
-+ instr_time duration;
++ instr_time duration;
+
-+ INSTR_TIME_SET_CURRENT(duration);
-+ INSTR_TIME_SUBTRACT(duration, start);
-+ WalStats.m_wal_sync_time += INSTR_TIME_GET_MICROSEC(duration);
-+ }
-+ WalStats.m_wal_sync++;
++ INSTR_TIME_SET_CURRENT(duration);
++ INSTR_TIME_SUBTRACT(duration, start);
++ WalStats.m_wal_sync_time += INSTR_TIME_GET_MICROSEC(duration);
+ }
++
++ WalStats.m_wal_sync++;
}
/*
@@ -311,38 +320,52 @@
/* ----------
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
-index 4f1a8e356b..08fa7032c0 100644
+index 4f1a8e356b..8491e6f6d6 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
-@@ -223,6 +223,7 @@ WalWriterMain(void)
+@@ -78,6 +78,11 @@ int WalWriterFlushAfter = 128;
+ #define LOOPS_UNTIL_HIBERNATE 50
+ #define HIBERNATE_FACTOR 25
+
++/*
++ * Minimum time between stats file updates; in milliseconds.
++ */
++#define PGSTAT_STAT_INTERVAL 500
++
+ /*
+ * Main entry point for walwriter process
+ *
+@@ -222,7 +227,12 @@ WalWriterMain(void)
+ */
for (;;)
{
++ /* we assume this inits to all zeroes: */
++ static TimestampTz last_report = 0;
++
long cur_timeout;
+ int rc;
++ TimestampTz now;
/*
* Advertise whether we might hibernate in this cycle. We do this
-@@ -263,9 +264,16 @@ WalWriterMain(void)
- else
- cur_timeout = WalWriterDelay * HIBERNATE_FACTOR;
-
-- (void) WaitLatch(MyLatch,
-- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-- cur_timeout,
-- WAIT_EVENT_WAL_WRITER_MAIN);
-+ rc = WaitLatch(MyLatch,
-+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-+ cur_timeout,
-+ WAIT_EVENT_WAL_WRITER_MAIN);
-+
+@@ -253,6 +263,17 @@ WalWriterMain(void)
+ else if (left_till_hibernate > 0)
+ left_till_hibernate--;
+
+ /*
-+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
-+ * the overhead in WAL-writing.
++ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
++ * msec since we last sent one
+ */
-+ if (rc & WL_TIMEOUT)
++ now = GetCurrentTimestamp();
++ if (TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
++ {
+ pgstat_send_wal();
- }
- }
++ last_report = now;
++ }
++
+ /*
+ * Sleep until we are signaled or WalWriterDelay has elapsed. If we
+ * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..7296ef04ff 100644
--- a/src/backend/utils/adt/pgstatfuncs.c