On 2021-Oct-18, Euler Taveira wrote: > On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote: > > Running --analyze-in-stages on a database with statistics causes the > > statistics to become significantly worse before they are improved. > > This could be made clearer in the documentation. > > I think the paragraph you modified *already* stated that usefulness of > --analyze-in-stages (newly populated from a restored dump or by pg_upgrade). I > don't think your change is an improvement for --analyze-in-stages description. > If you read the previous paragraph, it explains _how_ (3 stages of ANALYZE > with > different settings) and _why_ (produce usable statistics faster).
Yeah, but it doesn't say that the initial stats target (1) is the worst value you could possibly have. I wonder if it isn't better to *remove* the second phrase from this paragraph, and leave just this text: This option is useful to analyze a database that was newly populated from a restored dump or by <command>pg_upgrade</command>. to avoid giving the impression that it'll improve things if you run it in a database with existing statistics. Or we could proactively warn more explicitly about the problem: This option is useful to analyze a database that was newly populated from a restored dump or by <command>pg_upgrade</command>. Beware that running with this option in a database with existing statistics may cause query optimizer choices to become transiently worse, because of the very low statistics target that is used in the early stages. Or maybe just This option is useful to analyze a database that was newly populated from a restored dump or by <command>pg_upgrade</command>. Running with this option in a database with existing statistics is not recommended. Given that the first stage uses statistic target=1, running this option in a database with any stats at all is probably a bad idea. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/