On Tue, 23 Apr 2024, 05:52 Tom Lane, <t...@sss.pgh.pa.us> wrote:
> Jeff Davis <pg...@j-davis.com> writes:
> > On Mon, 2024-04-22 at 16:19 -0400, Tom Lane wrote:
> >> Loading data without stats, and hoping
> >> that auto-analyze will catch up sooner not later, is exactly the
> >> current behavior that we're doing all this work to get out of.
>
> > That's the disconnect, I think. For me, the main reason I'm excited
> > about this work is as a way to solve the bad-plans-after-upgrade
> > problem and to repro planner issues outside of production. Avoiding the
> > need to ANALYZE at the end of a data load is also a nice convenience,
> > but not a primary driver (for me).
>
> Oh, I don't doubt that there are use-cases for dumping stats without
> data. I'm just dubious about the reverse. I think data+stats should
> be the default, even if only because pg_dump's default has always
> been to dump everything. Then there should be a way to get stats
> only, and maybe a way to get data only. Maybe this does argue for a
> four-section definition, despite the ensuing churn in the pg_dump API.
I've heard of use cases where dumping stats without data would help with
debugging production planner issues on a non-production system. Sure,
some other planner inputs would have to be taken into account too, but
having an exact copy of the production stats is at least a start, and
can help build models and alerts for what'll happen when the tables
grow larger with the current stats.

As for those other planner inputs: table size is relatively easy to
shim with sparse files (rough sketch below); cumulative statistics can
be copied from a donor replica if needed; and btree indexes only really
need to contain their highest and lowest values (and have their height
set correctly).
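To illustrate the sparse-file shim, here's a minimal sketch, assuming
GNU coreutils and a throwaway test cluster; the table name 't' is just
an example. pg_relation_filepath() returns the path relative to the
data directory, and the planner (estimate_rel_size() in plancat.c)
scales reltuples by the file's apparent size in blocks, so a sparse
extension is enough for EXPLAIN purposes:

    # grab the relation's file path while the server is still running
    relpath=$(psql -Atc "SELECT pg_relation_filepath('t')")
    pg_ctl -D "$PGDATA" stop
    # extend the segment sparsely to 1GB: allocates no actual disk space
    truncate -s 1G "$PGDATA/$relpath"
    pg_ctl -D "$PGDATA" start
    # EXPLAIN now plans against ~131072 pages; don't read the table,
    # though, as the added pages are all zeroes.

Note that relation segments are capped at 1GB by default, so making a
table look larger than that would need extra .1, .2, ... segment files.

Kind regards,

Matthias van de Meent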