Hello! On Thu, 27/06/2024 at 10:39 +0900, Masahiko Sawada: > On Fri, May 31, 2024 at 4:19 AM Andrei Zubkov <zub...@moonset.ru> > wrote: > > As the vacuum process is a backend it has a workload > > instrumentation. > > We have all the basic counters available such as a number of blocks > > read, hit and written, time spent on I/O, WAL stats and so on.. > > Also, > > we can easily get some statistics specific to vacuum activity i.e. > > number of tuples removed, number of blocks removed, number of VM > > marks > > set and, of course the most important metric - time spent on vacuum > > operation. > > I've not reviewed the patch closely but it sounds helpful for users. > I > would like to add a statistic, the high-water mark of memory usage of > dead tuple TIDs. Since the amount of memory used by TidStore is hard > to predict, I think showing the high-water mark would help users to > predict how much memory they set to maintenance_work_mem. > Thank you for your interest on this patch. I've understand your idea. The obvious goal of it is to avoid expensive index multi processing during vacuum of the heap. Provided statistics in the patch contain the index_vacuum_count counter for each table which can be compared to the pg_stat_all_tables.vacuum_count to detect specific relation index multi-passes. Previous setting of maintenance_work_mem is known. Usage of TidStore should be proportional to the amount of dead-tuples vacuum workload on the table, so as the first evaluation we can take the number of index passes per one heap pass as a maintenance_work_mem multiplier.
But there is a better way. Once we detected the index multiprocessing we can lower the vacuum workload for the heap pass making vacuum a little bit more aggressive for this particular relation. I mean, in such case increasing maintenance_work_mem is not only decision. Suggested high-water mark statistic can't be used as cumulative statistic - any high-water mark statistic as maximim-like statistic is valid for certain time period thus should be reset on some kind of schedule. Without resets it should reach 100% once under the heavy load and stay there forever. Said that such high-water mark seems a little bit unclear and complicated for the DBA. It seems redundant to me right now. I can see the main value of such statistic is to avoid too large maintenance_work_mem setting. But I can't see really dramatic consequences of that. Maybe I've miss something.. -- Andrei Zubkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company