On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda <ikeda...@oss.nttdata.com> wrote: > > Hi, > > I think we need to add some statistics to pg_stat_wal view. > > Although there are some parameter related WAL, > there are few statistics for tuning them. > > I think it's better to provide the following statistics. > Please let me know your comments. > > ``` > postgres=# SELECT * from pg_stat_wal; > -[ RECORD 1 ]-------+------------------------------ > wal_records | 2000224 > wal_fpi | 47 > wal_bytes | 248216337 > wal_buffers_full | 20954 > wal_init_file | 8 > wal_write_backend | 20960 > wal_write_walwriter | 46 > wal_write_time | 51 > wal_sync_backend | 7 > wal_sync_walwriter | 8 > wal_sync_time | 0 > stats_reset | 2020-10-20 11:04:51.307771+09 > ``` > > 1. Basic statistics of WAL activity > > - wal_records: Total number of WAL records generated > - wal_fpi: Total number of WAL full page images generated > - wal_bytes: Total amount of WAL bytes generated > > To understand DB's performance, first, we will check the performance > trends for the entire database instance. > For example, if the number of wal_fpi becomes higher, users may tune > "wal_compression", "checkpoint_timeout" and so on. > > Although users can check the above statistics via EXPLAIN, auto_explain, > autovacuum and pg_stat_statements now, > if users want to see the performance trends for the entire database, > they must recalculate the statistics. >
Here, do you mean to say 'entire cluster' instead of 'entire database' because it seems these stats are getting collected for the entire cluster? > I think it is useful to add the sum of the basic statistics. > There is an argument that it is better to view these stats at the statement-level so that one can know which statements are causing most WAL and then try to rate-limit them if required in the application and anyway they can get the aggregate of all the WAL if they want. We have added these stats in PG-13, so do we have any evidence that the already added stats don't provide enough information? I understand that you are trying to display the accumulated stats here which if required users/DBA need to compute with the currently provided stats. OTOH, sometimes adding more ways to do some things causes difficulty for users to understand and learn. I see that we also need to add extra code to capture these stats (some of which is in performance-critical path especially in XLogInsertRecord) which again makes me a bit uncomfortable. It might be that it is all fine as it is very important to collect these stats at cluster-level in spite that the same information can be gathered at statement-level to help customers but I don't see a very strong case for that in your proposal. -- With Regards, Amit Kapila.