Re: vacuumdb --analyze-in-stages
On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote: > Running --analyze-in-stages on a database with statistics causes the > statistics to become significantly worse before they are improved. This > could be made clearer in the documentation. I think the paragraph you modified *already* stated that usefulness of --analyze-in-stages (newly populated from a restored dump or by pg_upgrade). I don't think your change is an improvement for --analyze-in-stages description. If you read the previous paragraph, it explains _how_ (3 stages of ANALYZE with different settings) and _why_ (produce usable statistics faster). -- Euler Taveira EDB https://www.enterprisedb.com/
Re: ALTER TABLE ... SET DATA TYPE removes statistics
On Fri, Oct 8, 2021, at 6:03 AM, nikolai.berkoff wrote: > The current documentation does not mention that the column statistics are > removed which I can see they are in src/backend/commands/tablecmds.c > > ATExecAlterColumnType > /* > * Drop any pg_statistic entry for the column, since it's now wrong type > */ > RemoveStatistics(RelationGetRelid(rel), attnum); > > Although this might be obvious it tripped me up. For example renaming and > SET STATISTICS preserves statistics. Patch attached. I agree that it might surprise an user and it would be good to document it. However, it does not belong to the description. I would add it to the Notes section at the end of the ALTER TABLE page. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: vacuumdb --analyze-in-stages
On 2021-Oct-18, Euler Taveira wrote: > On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote: > > Running --analyze-in-stages on a database with statistics causes the > > statistics to become significantly worse before they are improved. > > This could be made clearer in the documentation. > > I think the paragraph you modified *already* stated that usefulness of > --analyze-in-stages (newly populated from a restored dump or by pg_upgrade). I > don't think your change is an improvement for --analyze-in-stages description. > If you read the previous paragraph, it explains _how_ (3 stages of ANALYZE > with > different settings) and _why_ (produce usable statistics faster). Yeah, but it doesn't say that the initial stats target (1) is the worst value you could possibly have. I wonder if it isn't better to *remove* the second phrase from this paragraph, and leave just this text: This option is useful to analyze a database that was newly populated from a restored dump or by pg_upgrade. to avoid giving the impression that it'll improve things if you run it in a database with existing statistics. Or we could proactively warn more explicitly about the problem: This option is useful to analyze a database that was newly populated from a restored dump or by pg_upgrade. Beware that running with this option in a database with existing statistics may cause query optimizer choices to become transiently worse, because of the very low statistics target that is used in the early stages. Or maybe just This option is useful to analyze a database that was newly populated from a restored dump or by pg_upgrade. Running with this option in a database with existing statistics is not recommended. Given that the first stage uses statistic target=1, running this option in a database with any stats at all is probably a bad idea. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: vacuumdb --analyze-in-stages
On Mon, Oct 18, 2021 at 4:02 PM Alvaro Herrera wrote: > On 2021-Oct-18, Euler Taveira wrote: > > > On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote: > > > Running --analyze-in-stages on a database with statistics causes the > > > statistics to become significantly worse before they are improved. > > > This could be made clearer in the documentation. > > > > I think the paragraph you modified *already* stated that usefulness of > > --analyze-in-stages (newly populated from a restored dump or by > pg_upgrade). I > > don't think your change is an improvement for --analyze-in-stages > description. > > If you read the previous paragraph, it explains _how_ (3 stages of > ANALYZE with > > different settings) and _why_ (produce usable statistics faster). > > Yeah, but it doesn't say that the initial stats target (1) is the worst > value you could possibly have. > [...] > Given that the first stage uses statistic target=1, running this option > in a database with any stats at all is probably a bad idea. > > Add the word "only"? This option is only useful to analyze a database... Beyond that maybe adjust the procedure description to include a comment that we don't actually skip tables that already have a higher statistics target than the current pass would apply. (can we do this?) "Run several (currently three) stages of analyze with different configuration settings, to produce usable statistics faster. The first of these stages will remove any existing statistics even if they use a larger statistic target configuration." David J.
Re: vacuumdb --analyze-in-stages
On 2021-Oct-18, David G. Johnston wrote: > On Mon, Oct 18, 2021 at 4:02 PM Alvaro Herrera > wrote: > > Given that the first stage uses statistic target=1, running this option > > in a database with any stats at all is probably a bad idea. > > Add the word "only"? > > This option is only useful to analyze a database... Maybe this is sufficient, since it would drive people away from trying to do anything else than help upgrades with it. > Beyond that maybe adjust the procedure description to include a comment > that we don't actually skip tables that already have a higher statistics > target than the current pass would apply. (can we do this?) Hmm, vacuumdb can certainly query the catalogs to see what we have and skip tables for which we have more than that number, perhaps with a query based on this number select tablename, max(coalesce(cardinality(histogram_bounds), cardinality(most_common_freqs))) - 1 from pg_stats group by tablename; (and of course *don't* skip tables in the final stage, since the stored stats could be obsolete.) > "Run several (currently three) stages of analyze with different > configuration settings, to produce usable statistics faster. The first of > these stages will remove any existing statistics even if they use a larger > statistic target configuration." .. yeah, this is another option. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: vacuumdb --analyze-in-stages
On Mon, Oct 18, 2021, at 8:53 PM, Alvaro Herrera wrote: > On 2021-Oct-18, David G. Johnston wrote: > > > On Mon, Oct 18, 2021 at 4:02 PM Alvaro Herrera > > wrote: > > > > Given that the first stage uses statistic target=1, running this option > > > in a database with any stats at all is probably a bad idea. > > > > Add the word "only"? > > > > This option is only useful to analyze a database... > > Maybe this is sufficient, since it would drive people away from trying > to do anything else than help upgrades with it. +1. I like your 2nd suggestion. "This option is only useful to analyze a database that was newly populated from a restored dump or by pg_upgrade. Beware that running with this option in a database with existing statistics may cause query optimizer choices to become transiently worse, because of the very low statistics target that is used in the early stages." > > > "Run several (currently three) stages of analyze with different > > configuration settings, to produce usable statistics faster. The first of > > these stages will remove any existing statistics even if they use a larger > > statistic target configuration." > > .. yeah, this is another option. We might include it too but I would suggest "replace" instead of "remove" because it seems there won't be statistics after the first stage. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: ALTER TABLE ... SET DATA TYPE removes statistics
On Mon, Oct 18, 2021 at 05:15:59PM -0300, Euler Taveira wrote: > I agree that it might surprise an user and it would be good to document it. > However, it does not belong to the description. I would add it to the Notes > section at the end of the ALTER TABLE page. No objections to the suggested addition and the location of the addition (paragraph of SET DATA TYPE rather than "Notes"), but I think that the phrasing could be better: "The column's statistics are removed, hence a follow-up ANALYZE is suited to update the statistics to the new column type." -- Michael signature.asc Description: PGP signature