On Tue, Apr 22, 2025 at 09:43:56PM +0200, Christoph Berg wrote: > Re: Nathan Bossart >> Update guidance for running vacuumdb after pg_upgrade. >> >> Now that pg_upgrade can carry over most optimizer statistics, we >> should recommend using vacuumdb's new --missing-stats-only option >> to only analyze relations that are missing statistics. > > I've been looking at vacuumdb --missing-stats-only because Debian's > pg_upgradecluster is using that now. > > I am wondering if this is really good advice in the pg_upgrade > documentation. Sure it's nice that optimizer statistics are carried > over by pg_upgrade, but the pg_stat_user_tables statistics are not > carried over, and afaict these are the numbers that determine when the > next autovacuum or autoanalyze run is going to happen. By removing the > "please run vacuumdb on all tables" step from the pg_upgrade docs, we > are effectively telling everyone that they should be starting with > these numbers all 0, postponing the next run to some indeterminate > point. Running `vacuumdb --missing-stats-only` does not fix that > because it's skipping the tables. Is that the message we want to send? > > (If I am misinterpreting the situation the docs should still explain > why this is ok.)
relation_needs_vacanalyze() uses dead_tuples, ins_since_vacuum, and mod_since_analyze. IIUC a full post-upgrade vacuumdb run would only set dead_tuples to a nonzero value, so the worst-case scenario is that it would take longer before a vacuum is triggered based on autovacuum_vacuum_{threshold,max_threshold,scale_factor}. To address this, I think we'd need to recommend using "vacuumdb --all --analyze-only" instead. We could alternatively suggest first running "vacuumdb --all --analyze-in-stages --missing-stats-only" (to fill in any missing stats) followed by "vacuumdb --all --analyze-only" (to update dead_tuples). However, I'm not sure how concerned to be about this. It does seem bad that it might take longer for tables to be vacuumed for the first time after upgrade, but I believe that's already the case for any type of unclean shutdown (e.g., immediate shutdown, server crash, starting from a base backup, point-in-time recovery). I see that we do recommend running ANALYZE after pg_stat_reset(), though. In any case, IMO it's unfortunate that we might end up recommending roughly the same post-upgrade steps as before even though the optimizer statistics are carried over. -- nathan