On 6/27/25 06:35, Zechman, Derek S wrote:
We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
performed the analyze-in-stages post upgrade. It has been noticed that
some plans changed to use hash joins instead of nested loops. Further
investigation found it was because the parent table of partitioned
tables did not have stats. After running an ANALYZE on the parent
tables we got similar plan an execution times as before.
I have two questions
1 - Why does analyze-in-stages not analyze the parent tables?
2 – What happens if we do not run analyze-in-stages post upgrade and
just run an analyze?
It is spelled out in the docs:
https://www.postgresql.org/docs/current/pgupgrade.html
Emphasis added
"Using vacuumdb --all --analyze-only can efficiently generate such
statistics, and the use of --jobs can speed it up. Option
--analyze-in-stages can be used to generate **minimal statistics**
quickly. If vacuum_cost_delay is set to a non-zero value, this can be
overridden to speed up statistics generation using PGOPTIONS, e.g.,
PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
and from here:
https://www.postgresql.org/docs/current/app-vacuumdb.html
"--analyze-in-stages
Only calculate statistics for use by the optimizer (no vacuum),
like --analyze-only. Run three stages of analyze; the first stage uses
the lowest possible statistics target (see default_statistics_target) to
produce usable statistics faster, and subsequent stages build the full
statistics.
This option is only useful to analyze a database that currently has
no statistics or has wholly incorrect ones, such as if it is newly
populated from a restored dump or by pg_upgrade. Be aware that running
with this option in a database with existing statistics may cause the
query optimizer choices to become transiently worse due to the low
statistics targets of the early stages.
"
Thanks,
Sean
--
Adrian Klaver
adrian.kla...@aklaver.com