On 2020-10-20 12:46, Amit Kapila wrote:
On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda
<ikeda...@oss.nttdata.com> wrote:
Hi,
I think we need to add some statistics to pg_stat_wal view.
Although there are some parameter related WAL,
there are few statistics for tuning them.
I think it's better to provide the following statistics.
Please let me know your comments.
```
postgres=# SELECT * from pg_stat_wal;
-[ RECORD 1 ]-------+------------------------------
wal_records | 2000224
wal_fpi | 47
wal_bytes | 248216337
wal_buffers_full | 20954
wal_init_file | 8
wal_write_backend | 20960
wal_write_walwriter | 46
wal_write_time | 51
wal_sync_backend | 7
wal_sync_walwriter | 8
wal_sync_time | 0
stats_reset | 2020-10-20 11:04:51.307771+09
```
1. Basic statistics of WAL activity
- wal_records: Total number of WAL records generated
- wal_fpi: Total number of WAL full page images generated
- wal_bytes: Total amount of WAL bytes generated
To understand DB's performance, first, we will check the performance
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune
"wal_compression", "checkpoint_timeout" and so on.
Although users can check the above statistics via EXPLAIN,
auto_explain,
autovacuum and pg_stat_statements now,
if users want to see the performance trends for the entire database,
they must recalculate the statistics.
Here, do you mean to say 'entire cluster' instead of 'entire database'
because it seems these stats are getting collected for the entire
cluster?
Thanks for your comments.
Yes, I wanted to say 'entire cluster'.
I think it is useful to add the sum of the basic statistics.
There is an argument that it is better to view these stats at the
statement-level so that one can know which statements are causing most
WAL and then try to rate-limit them if required in the application and
anyway they can get the aggregate of all the WAL if they want. We have
added these stats in PG-13, so do we have any evidence that the
already added stats don't provide enough information? I understand
that you are trying to display the accumulated stats here which if
required users/DBA need to compute with the currently provided stats.
OTOH, sometimes adding more ways to do some things causes difficulty
for users to understand and learn.
I agreed that the statement-level stat is important and I understood
that we can
know the aggregated WAL stats of pg_stat_statement view and autovacuum's
log.
But now, WAL stats generated by autovacuum can be output to logs and it
is not
easy to aggregate them. Since WAL writes impacts for the entire cluster,
I thought
it's natural to provide accumulated value.
I see that we also need to add extra code to capture these stats (some
of which is in performance-critical path especially in
XLogInsertRecord) which again makes me a bit uncomfortable. It might
be that it is all fine as it is very important to collect these stats
at cluster-level in spite that the same information can be gathered at
statement-level to help customers but I don't see a very strong case
for that in your proposal.
Also about performance, I thought there are few impacts because it
increments stats in memory. If I can implement to reuse pgWalUsage's
value which already collects these stats, there is no impact in
XLogInsertRecord.
For example, how about pg_stat_wal() calculates the accumulated
value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's value?
Regards
--
Masahiro Ikeda
NTT DATA CORPORATION