Hi,

Thank you for the amazing and great work.

On 23.02.2021 15:03, Andres Freund wrote:
## Stats

There are two new views: pg_stat_aios showing AIOs that are currently
in-progress, pg_stat_aio_backends showing per-backend statistics about AIO.

As a DBA, I would like to propose a few amendments that might help with practical usage of the stats once the feature is finally implemented. My suggestions relate not to the central idea of the proposed changes but to the stats part.

A quick side note: there are two relevant terms in Prometheus (https://prometheus.io/docs/concepts/metric_types/):
1. Counter. A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.
2. Gauge. A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

For long-term stats collection, COUNTERs are preferable to GAUGEs because they show how metrics change over time without missing potential spikes in activity. As a result, we get a much better historical perspective.

Collecting GAUGEs is limited to the moments when the stats are sampled (snapshots), so changes that take place between snapshots remain unmeasured. In systems with a high rate of transactions per second, GAUGE measurements won’t provide the full picture even with a 1-second interval between snapshots. In addition, most monitoring systems, such as Prometheus and Zabbix, use longer intervals (from 10-15 to 60 seconds).
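
To illustrate the difference, here is a minimal Python sketch. It is a toy simulation, not PostgreSQL code: the activity numbers, the 15-second scrape interval, and the function name simulate are all made up for the example. A one-second spike falls between two scrapes; the gauge samples never see it, while the counter deltas include it.

```python
def simulate(activity, interval):
    """Scrape a gauge and a counter every `interval` seconds.

    `activity` holds hypothetical operations per second, one entry per second.
    """
    gauge_samples = []    # value observed at the scrape instant only
    counter_deltas = []   # growth of the cumulative counter since last scrape
    counter = 0
    last_scraped = 0
    for second, ops in enumerate(activity, start=1):
        counter += ops    # the counter accumulates everything that happened
        if second % interval == 0:
            gauge_samples.append(ops)
            counter_deltas.append(counter - last_scraped)
            last_scraped = counter
    return gauge_samples, counter_deltas


# 30 seconds of steady load with a single spike at second 8.
activity = [10] * 30
activity[7] = 500

gauge_samples, counter_deltas = simulate(activity, interval=15)
print(gauge_samples)    # the spike is invisible here
print(counter_deltas)   # the first delta includes the spike
```

With a gauge, the monitoring system would report a flat 10 for both scrapes; with a counter, the first interval clearly stands out from the second.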

The main idea is to expose almost all numeric stats as COUNTERs, which increases the overall observability of the implemented feature.

pg_stat_aios.
In general, this stat is a set of text values, and at the same time it looks GAUGE-like (similar to pg_stat_activity or pg_locks): it is only relevant at the moment the user is looking at it. I think it would be better to rename this view to pg_stat_progress_aios and keep the name pg_stat_aios for other AIO stats with global COUNTERs (like those in pg_stat_user_tables or pg_stat_statements, or the system-wide /proc/stat and /proc/diskstats).

pg_stat_aio_backends.
This stat is based on COUNTERs, which is great, but its lifespan is limited by the lifespan of the backend processes: once a backend exits, its stats are no longer available. This could be a problem in workloads with short-lived backends.
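
One possible way to address this, sketched in Python purely as an illustration: keep per-backend counters while a backend is alive and fold them into a cumulative total when it exits, so nothing is lost. This is not how the patch or PostgreSQL works; the class AioStats, its methods, and the counter name ios_completed are hypothetical.

```python
from collections import Counter


class AioStats:
    """Hypothetical stats store: per-backend counters plus a global total."""

    def __init__(self):
        self.live = {}            # backend pid -> Counter of its stats
        self.retired = Counter()  # totals inherited from exited backends

    def record(self, pid, name, n=1):
        # Increment a named counter for a live backend.
        self.live.setdefault(pid, Counter())[name] += n

    def backend_exit(self, pid):
        # Fold the exiting backend's counters into the cumulative total.
        self.retired.update(self.live.pop(pid, Counter()))

    def totals(self):
        # Global counters: exited backends plus all live ones.
        total = Counter(self.retired)
        for counters in self.live.values():
            total.update(counters)
        return total


stats = AioStats()
stats.record(101, "ios_completed", 5)
stats.record(102, "ios_completed", 3)
stats.backend_exit(101)   # a short-lived backend goes away
print(stats.totals()["ios_completed"])
```

The global total keeps growing monotonically even though backend 101 is gone, which is the property a long-term collector needs.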

I think there are a few existing examples in the current code that could be repurposed to implement the suggestions above (such as pg_stat_user_tables, pg_stat_database, etc.). With this in mind, I think incorporating these changes shouldn’t take significant effort considering the benefit they will bring to the end user.

Once again, huge respect for your work on these changes, and good luck.

Regards, Alexey


