On Tue, May 13, 2025 at 3:19 PM Alena Rybakina
<a.rybak...@postgrespro.ru> wrote:
>
> On 12.05.2025 08:30, Amit Kapila wrote:
> > On Fri, May 9, 2025 at 5:34 PM Alena Rybakina <a.rybak...@postgrespro.ru>
> > wrote:
> >> I did a rebase and finished the part with storing statistics separately
> >> from the relation statistics - now it is possible to disable the
> >> collection of statistics for relations using gucs and
> >> this allows us to solve the problem with the memory consumed.
> >>
> > I think this patch is trying to collect data similar to what we do for
> > pg_stat_statements for SQL statements. So, can't we follow a similar
> > idea such that these additional statistics will be collected once some
> > external module like pg_stat_statements is enabled? That module should
> > be responsible for accumulating and resetting the data, so we won't
> > have this memory consumption issue.
> The idea is good, it will require one hook for the pgstat_report_vacuum
> function, the extvac_stats_start and extvac_stats_end functions can be
> run if the extension is loaded, so as not to add more hooks.
> But I see a problem here with tracking deleted objects for which
> statistics are no longer needed. There are two solutions to this and I
> don't like both of them, to be honest.
> The first way is to add a background process that will go through the
> table with saved statistics and check whether the relation or the
> database are relevant now or not and if not, then
> delete the vacuum statistics information for it. This may be
> resource-intensive. The second way is to add hooks for deleting the
> database and relationships (functions dropdb, index_drop,
> heap_drop_with_catalog).
>
How does pg_stat_io manage this? I mean, how does it remove objects
that are dropped? Does some background task remove them?

> > BTW, how will these new statistics be used to autotune a vacuum?
> yes, but they are collected on demand - by guc.
> > And
> > do we need all the statistics proposed by this patch?
> >
> Regarding this issue, it was discussed here and so far we have come to
> the conclusion that statistics are needed for a deep understanding of
> the work of vacuum statistics [0] [1] [2].
>

I haven't gone through the emails, but my opinion is to start with
an important subset of the stats and then keep enhancing it. Right now,
the patch struggles with two concerns: one is what the design should be
to capture the required stats, and the second is convincing ourselves
that we need all the stats it is trying to expose. Breaking it into a
smaller subset of stats could alleviate the second concern.

--
With Regards,
Amit Kapila.