On Fri, Jan 24, 2025 at 7:44 AM Nathan Bossart <nathandboss...@gmail.com> wrote: > > On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote: > > On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote: > >> On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote: > >>> I suggest that we make a new thread about the vacuumdb changes and > >>> focus this thread and patch series on the pg_dump changes (and minor > >>> flag adjustments to pg_upgrade). > >>> > >>> Unless you think that the pg_dump changes should block on the vacuumdb > >>> changes? In which case please let me know because the pg_dump changes > >>> are otherwise close to commit. > >> > >> I think that is a good idea. I don't see vacuumdb blocking this. > > > > +1, I've been reviewing the vacuumdb portion and am planning to start a new > > thread in the near future. IIUC the bulk of the vacuumdb changes are > > relatively noncontroversial, we just haven't reached consensus on the user > > interface. > > As promised, I'm starting a new thread for this. The original thread [0] > has some preliminary discussion about the subject. > > As you may be aware, there is an ongoing effort to carry over statistics > during pg_upgrade. Today, we encourage users to use vacuumdb to run > ANALYZE on all relations after upgrading. There's even a special > --analyze-in-stages option that fast-tracks an initial set of minimal > statistics for this use-case. Once the statistics are carried over by > pg_upgrade, there will be little need to do this, except for perhaps > extended statistics if they aren't carried over. But there are patches in > flight for that, too [1]. > > This thread is dedicated to figuring out what, if anything, to do about > vacuumdb. I see the following general categories of options: > > * Do nothing. Other than updating our recommended guidance for > post-upgrade analyzing, we'd leave vacuumdb alone. While this is > certainly a simple option, it has a couple of key drawbacks. For one, > anyone who doesn't see the new vacuumdb guidance may continue to do > unnecessary post-upgrade analyzes. Also, if we don't get the extended > statistics piece completed for v18, users will have to manually construct > ANALYZE commands for those to run post-upgrade. > > * Add a breaking change so that users are forced to fix any outdated > post-upgrade scripts. This is what the attached patches do. In short, > they add a required parameter to --analyze-in-stages that can be set to > either "all" or "missing". The new "missing" mode generates ANALYZE > commands for relations that are missing statistics, while the "all" mode > does the same thing that --analyze-in-stages does today. While the > "missing" mode might be useful outside of upgrade cases, we could also > recommend it as a post-upgrade step if the extended statistics work > doesn't get committed for v18. > > * Add a new option that will make it easy to ANALYZE any relations that are > missing statistics, but don't make any breaking changes to existing > post-upgrade scripts. This option isn't really strictly necessary if we > get the extended statistics parts committed, but it could be a nice > feature, anyway. > > I chose the second approach because it had the most support in the other > thread, but I definitely wouldn't characterize it as a consensus. 0001 > simply refactors the main catalog query to its own function so that its > results can be reused in later stages of --analyze-in-stages. This might > require a bit more memory and make --analyze-in-stages less responsive to > concurrent changes, but it wasn't all that responsive to begin with. 0002 > adds the new "missing" mode functionality. Note that it regenerates all > statistics for a relation if any applicable statistics types are missing. > It's not clear whether we can or should do any better than that. Corey and > I put a lot of effort into the catalog query changes, and we think we've > covered everything, but we would of course appreciate some review on that > part. > > BTW as long as the basic "missing" mode idea seems reasonable, it's easy > enough to adjust the user interface to whatever we want, and I'm happy to > do so as needed. > > Finally, I think another open question is whether any of this should apply > to --analyze and/or --analyze-only. We do recommend the latter as a > post-upgrade step in our pg_upgrade documentation, and I could see the > "missing" mode being useful on its own for these modes, too. > >Thoughts?
I've not closely reviewed the patches yet but I find that the second approach is reasonable to me. IIUC if the extended statistics import/export work gets committed for v18, we don't need a new option for post-upgrade analyze. But as you mentioned, these new modes seem useful for other use cases too. Given that it could be used outside of post-upgrading, I think it would make more sense to apply the "all" and "missing" modes to --analyze and --analyze-only options too. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com