Hi, On Mon, Jun 10, 2024 at 10:36:42AM -0500, Nathan Bossart wrote: > On Mon, Jun 10, 2024 at 06:05:13AM +0000, Bertrand Drouvot wrote: > > During the last pgconf.dev I attended Robert´s presentation about > > autovacuum and > > it made me remember of an idea I had some time ago: $SUBJECT > > This sounds like useful information to me.
Thanks for looking at it! > I wonder if we should also > surface the effective cost limit for each autovacuum worker. I'm not sure about it as I think that it could be misleading: one could query pg_stat_progress_vacuum and conclude that the time_delayed he is seeing is due to _this_ cost_limit. But that's not necessary true as the cost_limit could have changed multiple times since the vacuum started. So, unless there is frequent sampling on pg_stat_progress_vacuum, displaying the time_delayed and the cost_limit could be misleadind IMHO. > > Currently one can change [autovacuum_]vacuum_cost_delay and > > [auto vacuum]vacuum_cost_limit but has no reliable way to measure the > > impact of > > the changes on the vacuum duration: one could observe the vacuum duration > > variation but the correlation to the changes is not accurate (as many others > > factors could impact the vacuum duration (load on the system, i/o > > latency,...)). > > IIUC you'd need to get information from both pg_stat_progress_vacuum and > pg_stat_activity in order to know what percentage of time was being spent > in cost delay. Is that how you'd expect for this to be used in practice? Yeah, one could use a query such as: select p.*, now() - a.xact_start as duration from pg_stat_progress_vacuum p JOIN pg_stat_activity a using (pid) for example. Worth to provide an example somewhere in the doc? > > pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY); > > pg_usleep(msec * 1000); > > pgstat_report_wait_end(); > > + /* Report the amount of time we slept */ > > + if (VacuumSharedCostBalance != NULL) > > + > > pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED, msec); > > + else > > + > > pgstat_progress_incr_param(PROGRESS_VACUUM_TIME_DELAYED, msec); > > Hm. Should we measure the actual time spent sleeping, or is a rough > estimate good enough? I believe pg_usleep() might return early (e.g., if > the process is signaled) or late, so this field could end up being > inaccurate, although probably not by much. If we're okay with millisecond > granularity, my first instinct is that what you've proposed is fine, but I > figured I'd bring it up anyway. Thanks for bringing that up! I had the same thought when writing the code and came to the same conclusion. I think that's a good enough estimation and specially during a long running vacuum (which is probably the case where users care the most). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com