Re: vacuumdb changes for stats import/export

Masahiko Sawada Fri, 24 Jan 2025 13:36:21 -0800

On Fri, Jan 24, 2025 at 7:44 AM Nathan Bossart <[email protected]> wrote:
>
> On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote:
> > On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote:
> >> On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote:
> >>> I suggest that we make a new thread about the vacuumdb changes and
> >>> focus this thread and patch series on the pg_dump changes (and minor
> >>> flag adjustments to pg_upgrade).
> >>>
> >>> Unless you think that the pg_dump changes should block on the vacuumdb
> >>> changes? In which case please let me know because the pg_dump changes
> >>> are otherwise close to commit.
> >>
> >> I think that is a good idea.  I don't see vacuumdb blocking this.
> >
> > +1, I've been reviewing the vacuumdb portion and am planning to start a new
> > thread in the near future.  IIUC the bulk of the vacuumdb changes are
> > relatively noncontroversial, we just haven't reached consensus on the user
> > interface.
>
> As promised, I'm starting a new thread for this.  The original thread [0]
> has some preliminary discussion about the subject.
>
> As you may be aware, there is an ongoing effort to carry over statistics
> during pg_upgrade.  Today, we encourage users to use vacuumdb to run
> ANALYZE on all relations after upgrading.  There's even a special
> --analyze-in-stages option that fast-tracks an initial set of minimal
> statistics for this use-case.  Once the statistics are carried over by
> pg_upgrade, there will be little need to do this, except for perhaps
> extended statistics if they aren't carried over.  But there are patches in
> flight for that, too [1].
>
> This thread is dedicated to figuring out what, if anything, to do about
> vacuumdb.  I see the following general categories of options:
>
> * Do nothing.  Other than updating our recommended guidance for
>   post-upgrade analyzing, we'd leave vacuumdb alone.  While this is
>   certainly a simple option, it has a couple of key drawbacks.  For one,
>   anyone who doesn't see the new vacuumdb guidance may continue to do
>   unnecessary post-upgrade analyzes.  Also, if we don't get the extended
>   statistics piece completed for v18, users will have to manually construct
>   ANALYZE commands for those to run post-upgrade.
>
> * Add a breaking change so that users are forced to fix any outdated
>   post-upgrade scripts.  This is what the attached patches do.  In short,
>   they add a required parameter to --analyze-in-stages that can be set to
>   either "all" or "missing".  The new "missing" mode generates ANALYZE
>   commands for relations that are missing statistics, while the "all" mode
>   does the same thing that --analyze-in-stages does today.  While the
>   "missing" mode might be useful outside of upgrade cases, we could also
>   recommend it as a post-upgrade step if the extended statistics work
>   doesn't get committed for v18.
>
> * Add a new option that will make it easy to ANALYZE any relations that are
>   missing statistics, but don't make any breaking changes to existing
>   post-upgrade scripts.  This option isn't really strictly necessary if we
>   get the extended statistics parts committed, but it could be a nice
>   feature, anyway.
>
> I chose the second approach because it had the most support in the other
> thread, but I definitely wouldn't characterize it as a consensus.  0001
> simply refactors the main catalog query to its own function so that its
> results can be reused in later stages of --analyze-in-stages.  This might
> require a bit more memory and make --analyze-in-stages less responsive to
> concurrent changes, but it wasn't all that responsive to begin with.  0002
> adds the new "missing" mode functionality.  Note that it regenerates all
> statistics for a relation if any applicable statistics types are missing.
> It's not clear whether we can or should do any better than that.  Corey and
> I put a lot of effort into the catalog query changes, and we think we've
> covered everything, but we would of course appreciate some review on that
> part.
>
> BTW as long as the basic "missing" mode idea seems reasonable, it's easy
> enough to adjust the user interface to whatever we want, and I'm happy to
> do so as needed.
>
> Finally, I think another open question is whether any of this should apply
> to --analyze and/or --analyze-only.  We do recommend the latter as a
> post-upgrade step in our pg_upgrade documentation, and I could see the
> "missing" mode being useful on its own for these modes, too.
>
>Thoughts?


I've not closely reviewed the patches yet but I find that the second
approach is reasonable to me. IIUC if the extended statistics
import/export work gets committed for v18, we don't need a new option
for post-upgrade analyze. But as you mentioned, these new modes seem
useful for other use cases too. Given that it could be used outside of
post-upgrading, I think it would make more sense to apply the "all"
and "missing" modes to --analyze and --analyze-only options too.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: vacuumdb changes for stats import/export

Reply via email to