Hi Tomas This is a great feature. + /* + * Define (or redefine) custom GUC variables. + */ + DefineCustomIntVariable("stats_history.size", + "Sets the amount of memory available for past events.", + NULL, + &statsHistorySizeMB, + 1, + 1, + 128, + PGC_POSTMASTER, + GUC_UNIT_MB, + NULL, + NULL, + NULL); + RAM is in terabytes now, the statsHistorySize is 128MB ,I think can increase to store more history record ?
Thanks On Sun, Dec 22, 2024 at 4:28 AM Tomas Vondra <to...@vondra.me> wrote: > Hi, > > Our runtime stats system is great, but it only keeps a snapshot of > cumulative stats / snapshot. And while that works for user workloads, it > may not be quite sufficient when analyzing maintenance operations like > vacuum/checkpoint, etc. > > For those operations it's useful to have information about individual > runs, not just the cumulative counters (or even deltas between regular > snapshots). There's also the issue that we only keep a very limited > subset of available information - just look at the info included in > VACUUM VERBOSE or with log_checkpoints=on, and how little of that is > available in pg_stats_. For vacuum we have the vacuum/analyze counts, > and timestamp of the last operation, but that's it. VACUUM VERBOSE > provides way more information, but we can only guess based on the > regular pgstat counters. > > Yes, we can get this info written to server log using log_checkpoints > and log_autovacuum_min_duration (AFAIK there's no way to ensure logging > for manual VACUUM). But processing this information is not particularly > convenient, as it requires parsing the log, the message format is > suitable more for humans, etc. And it'd be very convenient to be able to > query this information, just like the other pgstat catalogs. > > I wonder if we might/should do two things, to improve this: > > 1) Introduce hooks that allow doing some custom stuff with info about > those actions, after logging it. The attached 0001 and 0002 patches do > this for vacuum and checkpoint. > > 2) Allow keeping information about events. The 0003 patch does that in > an extension, leveraging the new hooks, but it'd certainly possible to > do in core too. > > I realize our current pgstat collector is keeping per-object stats, not > per-event. We might add this to per-object stats (e.g. each table would > have stats about vacuum runs), but that wouldn't work for checkpoints. > There's also the question of memory consumption - I'm sure we don't want > to keep infinite history of vacuum runs, for example. > > So the extension simply maintains a fixed-size circular queue, i.e. when > it gets full it starts evicting oldest entries. 1MB is enough for > storing ~4k VACUUM runs - I'm sure it can be made more compact. > > I don't think there's a better way to do this. I've considered if this > might be done using emit_log_hook, but (a) that only helps when we end > up logging the event (and I'd like to do this always), and (b) it'd > require parsing the server log. So it's not much better than just doing > that, I think ... > > > Opinions? > > -- > Tomas Vondra >