On 2021/03/05 19:54, Masahiro Ikeda wrote:
On 2021-03-05 12:47, Fujii Masao wrote:
On 2021/03/05 8:38, Masahiro Ikeda wrote:
On 2021-03-05 01:02, Fujii Masao wrote:
On 2021/03/04 16:14, Masahiro Ikeda wrote:
On 2021-03-03 20:27, Masahiro Ikeda wrote:
On 2021-03-03 16:30, Fujii Masao wrote:
On 2021/03/03 14:33, Masahiro Ikeda wrote:
On 2021-02-24 16:14, Fujii Masao wrote:
On 2021/02/15 11:59, Masahiro Ikeda wrote:
On 2021-02-10 00:51, David G. Johnston wrote:
On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
<ikeda...@oss.nttdata.com> wrote:
I pgindented the patches.
... <function>XLogWrite</function>, which is invoked during an
<function>XLogFlush</function> request (see ...). This is also
incremented by the WAL receiver during replication.
("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)
is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"
Thanks, I fixed them.
wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."
I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.
"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in....) This is undesirable because ..."
Thanks, I fixed it.
I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics. This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite. Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not. It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL). Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead. In either case, this
half-and-half implementation seems undesirable.
OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
Thanks for updating the patches!
I added the infrastructure code to communicate the WAL receiver stats messages
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?
On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?
OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11
patch.
Thanks for updating the patches! I'm now reading 001 patch.
+ /* Check whether the WAL file was synced to disk right now */
+ if (enableFsync &&
+ (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+ {
Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?
Thanks for the comments.
I added the above code in v12 patch.
+ /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+ if (rc & WL_TIMEOUT)
+ pgstat_send_wal();
On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?
Thanks, I didn't notice that.
Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.
Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.
Why don't to make another way to check the timestamp?
+ /*
+ * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+ * msec since we last sent one
+ */
+ now = GetCurrentTimestamp();
+ if (TimestampDifferenceExceeds(last_report, now,
PGSTAT_STAT_INTERVAL))
+ {
+ pgstat_send_wal();
+ last_report = now;
+ }
+
Although I worried that it's better to add the check code in pgstat_send_wal(),
Agreed.
I didn't do so because to avoid to double check PGSTAT_STAT_INTERVAL.
pgstat_send_wal() is invoked pg_report_stat() and it already checks the
PGSTAT_STAT_INTERVAL.
I think that we can do that. What about the attached patch?
Thanks, I thought it's better.
I forgot to remove an unused variable.
The attached v13 patch is fixed.
Thanks for updating the patch!
+ w.wal_write,
+ w.wal_write_time,
+ w.wal_sync,
+ w.wal_sync_time,
It's more natural to put wal_write_time and wal_sync_time next to
each other? That is, what about the following order of columns?
wal_write
wal_sync
wal_write_time
wal_sync_time
Yes, I fixed it.
- case SYNC_METHOD_OPEN:
- case SYNC_METHOD_OPEN_DSYNC:
- /* write synced it already */
- break;
IMO it's better to add Assert(false) here to ensure that we never reach
here, as follows. Thought?
+ case SYNC_METHOD_OPEN:
+ case SYNC_METHOD_OPEN_DSYNC:
+ /* not reachable */
+ Assert(false);
I agree.
Even when a backend exits, it sends the stats via pgstat_beshutdown_hook().
On the other hand, walwriter doesn't do that. Walwriter also should send
the stats even at its exit? Otherwise some stats can fail to be collected.
But ISTM that this issue existed from before, for example checkpointer
doesn't call pgstat_send_bgwriter() at its exit, so it's overkill to fix
this issue in this patch?
Thanks, I thought it's better to do so.
I added the shutdown hook for the walwriter and the checkpointer in v14-0003
patch.
Thanks!
Seems you forgot to include the changes of expected/rules.out in 0001 patch,
and which caused the regression test to fail. Attached is the updated version
of the patch. I included expected/rules.out in it.
Sorry.
+ PgStat_Counter m_wal_write_time; /* time spend writing wal records in
+ * micro seconds */
+ PgStat_Counter m_wal_sync_time; /* time spend syncing wal records in micro
+ * seconds */
IMO "spend" should be "spent". Also "micro seconds" should be "microseconds"
in sake of consistent with other comments in pgstat.h. I fixed them.
Thanks.
Regarding pgstat_report_wal() and pgstat_send_wal(), I found one bug. Even
when pgstat_send_wal() returned without sending any message,
pgstat_report_wal() saved current pgWalUsage and that counter was used for
the subsequent calculation of WAL usage. This caused some counters not to
be sent to the collector. This is a bug that I added. I fixed this bug.
Thanks.
+ walStats.wal_write += msg->m_wal_write;
+ walStats.wal_write_time += msg->m_wal_write_time;
+ walStats.wal_sync += msg->m_wal_sync;
+ walStats.wal_sync_time += msg->m_wal_sync_time;
I changed the order of the above in pgstat.c so that wal_write_time and
wal_sync_time are placed in next to each other.
I forgot to fix them, thanks.
The followings are the comments for the docs part. I've not updated this
in the patch yet because I'm not sure how to change them for now.
+ Number of times WAL buffers were written out to disk via
+ <function>XLogWrite</function>, which is invoked during an
+ <function>XLogFlush</function> request (see <xref
linkend="wal-configuration"/>)
+ </para></entry>
XLogWrite() can be invoked during the functions other than XLogFlush().
For example, XLogBackgroundFlush(). So the above description might be
confusing?
+ Number of times WAL files were synced to disk via
+ <function>issue_xlog_fsync</function>, which is invoked during an
+ <function>XLogFlush</function> request (see <xref
linkend="wal-configuration"/>)
Same as above.
Yes, why don't you remove "XLogFlush" in the above comments
because XLogWrite() description is covered in wal.sgml?
But, now it's mentioned only for backend,
I added the comments for the wal writer in the attached patch.
+ while <xref linkend="guc-wal-sync-method"/> was set to one of the
+ "sync at commit" options (i.e., <literal>fdatasync</literal>,
+ <literal>fsync</literal>, or <literal>fsync_writethrough</literal>).
Even open_sync and open_datasync do the sync at commit. No? I'm not sure
if "sync at commit" is right term to indicate fdatasync, fsync and
fsync_writethrough.
Yes, why don't you change to the following comments?
```
while <xref linkend="guc-wal-sync-method"/> was set to one of the
options which specific fsync method is called (i.e.,
<literal>fdatasync</literal>,
<literal>fsync</literal>, or <literal>fsync_writethrough</literal>)
```
+ <literal>open_sync</literal>. Units are in milliseconds with
microsecond resolution.
"with microsecond resolution" part is really necessary?
I removed it because blk_read_time in pg_stat_database is the same above,
but it doesn't mention it.
+ transaction records are flushed to permanent storage.
+ <function>XLogFlush</function> calls <function>XLogWrite</function> to write
+ and <function>issue_xlog_fsync</function> to flush them, which are
counted as
+ <literal>wal_write</literal> and <literal>wal_sync</literal> in
+ <xref linkend="pg-stat-wal-view"/>. On systems with high log output,
This description might cause users to misread that XLogFlush() calls
issue_xlog_fsync(). Since issue_xlog_fsync() is called by XLogWrite(),
ISTM that this description needs to be updated.
I understood. I fixed to mention that XLogWrite()
calls issue_xlog_fsync().
Each line in the above seems to end with a space character.
This space character should be removed.
Sorry for that. I removed it.
Thanks for updating the patch! I think it's getting good shape!
- pid | wait_event_type | wait_event
+ pid | wait_event_type | wait_event
This change is not necessary?
- every <xref linkend="guc-wal-writer-delay"/> milliseconds.
+ every <xref linkend="guc-wal-writer-delay"/> milliseconds, which calls
+ <function>XLogWrite</function> to write and <function>XLogWrite</function>
+ <function>issue_xlog_fsync</function> to flush them. They are counted as
+ <literal>wal_write</literal> and <literal>wal_sync</literal> in
+ <xref linkend="pg-stat-wal-view"/>.
Isn't it better to avoid using the terms like XLogWrite or issue_xlog_fsync
before explaining what they are? They are explained later. At least for me
I'm ok without this change.
- to write (move to kernel cache) a few filled <acronym>WAL</acronym>
- buffers. This is undesirable because <function>XLogInsertRecord</function>
+ to call <function>XLogWrite</function> to write (move to kernel cache) a
+ few filled <acronym>WAL</acronym> buffers (the tally of this event is
reported in
+ <literal>wal_buffers_full</literal> in <xref linkend="pg-stat-wal-view"/>).
+ This is undesirable because <function>XLogInsertRecord</function>
This paragraph explains the relationshp between WAL writes and WAL buffers. I
don't think it's good to add different context to this paragraph. Instead, what
about adding new paragraph like the follwing?
----------------------------------
When track_wal_io_timing is enabled, the total amounts of time XLogWrite writes
and issue_xlog_fsync syncs WAL data to disk are counted as wal_write_time and
wal_sync_time in pg_stat_wal view, respectively. XLogWrite is normally called
by XLogInsertRecord (when there is no space for the new record in WAL buffers),
XLogFlush and the WAL writer, to write WAL buffers to disk and call
issue_xlog_fsync. If wal_sync_method is either open_datasync or open_sync, a
write operation in XLogWrite guarantees to sync written WAL data to disk and
issue_xlog_fsync does nothing. If wal_sync_method is either fdatasync, fsync,
or fsync_writethrough, the write operation moves WAL buffer to kernel cache and
issue_xlog_fsync syncs WAL files to disk. Regardless of the setting of
track_wal_io_timing, the numbers of times XLogWrite writes and issue_xlog_fsync
syncs WAL data to disk are also counted as wal_write and wal_sync in
pg_stat_wal, respectively.
----------------------------------
+ <function>issue_xlog_fsync</function> (see <xref
linkend="wal-configuration"/>)
"request" should be place just before "(see"?
+ Number of times WAL files were synced to disk via
+ <function>issue_xlog_fsync</function> (see <xref
linkend="wal-configuration"/>)
+ while <xref linkend="guc-wal-sync-method"/> was set to one of the
+ options which specific fsync method is called (i.e.,
<literal>fdatasync</literal>,
+ <literal>fsync</literal>, or <literal>fsync_writethrough</literal>).
Isn't it better to mention the case of fsync=off? What about the following?
----------------------------------
Number of times WAL files were synced to disk via issue_xlog_fsync (see ...).
This is zero when fsync is off or wal_sync_method is either open_datasync or
open_sync.
----------------------------------
+ Total amount of time spent writing WAL buffers were written out to disk
via
"were written out" is not necessary?
+ Total amount of time spent syncing WAL files to disk via
+ <function>issue_xlog_fsync</function> request (see <xref
linkend="wal-configuration"/>)
+ while <xref linkend="guc-wal-sync-method"/> was set to one of the
+ options which specific fsync method is called (i.e.,
<literal>fdatasync</literal>,
+ <literal>fsync</literal>, or <literal>fsync_writethrough</literal>).
+ Units are in milliseconds.
+ This is zero when <xref linkend="guc-track-wal-io-timing"/> is disabled.
Isn't it better to explain the case where this counter is zero a bit more
clearly as follows?
---------------------
This is zero when track_wal_io_timing is disabled, fsync is off, or
wal_sync_method is either open_datasync or open_sync.
---------------------
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION