Hi,

I think it's better to add other WAL statistics to the pg_stat_wal view.
I'm thinking to add the following statistics. Please let me know your thoughts.

1.  Basic wal statistics

* wal_records: Total number of WAL records generated
* wal_fpi: Total number of WAL full page images generated
* wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance trends for the entire database instance. For example, if the number of wal_fpi becomes higher, users may tune "wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN, auto_explain, autovacuum and pg_stat_statements now, if users want to see the performance trends for the entire database,
they must preprocess the statistics.

Is it useful to add the sum of the above statistics to the pg_stat_wal view?


2.  Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

Just idea; it may be worth exposing the number of when new WAL file is created and zero-filled. This initialization may have impact on the performance of write-heavy workload generating lots of WAL. If this number is reported high, to reduce the number of this initialization, we can tune WAL-related parameters so that more "recycled" WAL files can be hold.


3.  Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high relative to the startup time, "wal_segment_size" must be bigger.


4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions. If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and "wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.

I think it's better to separate this for backends and wal writer.


5.  Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher, users can detect the possibility of disk failure.

Since users can see how much flash time occupies of the query execution time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this for backends and wal writer.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION


Reply via email to