Hi, On 2021-03-21 14:53:50 -0700, Andres Freund wrote: > On 2021-03-21 11:41:30 -0400, Stephen Frost wrote: > > > 2) How to remove stats of dropped objects? > > > > > > [...] > > > > The current approach sounds pretty terrible and propagating that forward > > doesn't seem great. Guess here I'd disagree with your gut feeling and > > encourage trying to go 'full in', as you put it, or at least put enough > > effort into it to get a feeling of if it's going to require a *lot* more > > work or not and then reconsider if necessary. > > I think my gut's argument is that it's already a huge patch, and that > it's better to have the the very substantial memory and disk IO savings > with the crappy vacuum approach, than neither. And given the timeframe > there does seem to be a substantial danger of "neither" being the > outcome... Anyway, I'm mentally sketching out what it'd take.
I implemented this. Far from polished, but it does survive the regression tests, including new tests in stats.sql. functions stats, 2PC and lots of naming issues (xl_xact_dropped_stats with members ndropped and'dropped_stats) are yet to be addressed - but not architecturally relevant. I think there are three hairy corner-cases that I haven't thought sufficiently about: - It seems likely that there's a relatively narrow window where a crash could end up not dropping stats. The xact.c integration is basically parallel to smgrGetPendingDeletes etc. I don't see what prevents a checkpoint from happing after RecordTransactionCommit() (which does smgrGetPendingDeletes), but before smgrDoPendingDeletes(). Which means that if we crash in that window, nothing would clean up those files (and thus also not the dropped stats that I implemented very similarly). Closing that window seems doable (put the pending file/stat deletion list in shared memory after logging the WAL record, but before MyProc->delayChkpt = false, do something with that in CheckPointGuts()), but not trivial. If we had a good way to run pgstat_vacuum_stat() on a very *occasional* basis via autovac, that'd be ok. But it'd be a lot nicer if it were bulletproof. - Does temporary table cleanup after a crash properly deal with their stats? - I suspect there are a few cases where pending stats in one connection could "revive" an already dropped stat. It'd not be hard to address in an inefficient way in the shmstats patch, but it'd come with some efficiency penalty - but it might an irrelevant efficiency difference. Definitely for later, but if we got this ironed out, we probably could stop throwing stats away after crashes. Instead storing stats snapshots alongside redo LSNs or such. It'd be nice if immediate shutdowns etc wouldn't lead to autovacuum not doing any vacuuming for quite a while. Greetings, Andres Freund