On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote:
> On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote:
>> On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote:
>>> I suggest that we make a new thread about the vacuumdb changes and
>>> focus this thread and patch series on the pg_dump changes (and minor
>>> flag adjustments to pg_upgrade).
>>>
>>> Unless you think that the pg_dump changes should block on the vacuumdb
>>> changes? In which case please let me know because the pg_dump changes
>>> are otherwise close to commit.
>>
>> I think that is a good idea.  I don't see vacuumdb blocking this.
>
> +1, I've been reviewing the vacuumdb portion and am planning to start a new
> thread in the near future.  IIUC the bulk of the vacuumdb changes are
> relatively noncontroversial, we just haven't reached consensus on the user
> interface.
As promised, I'm starting a new thread for this.  The original thread [0]
has some preliminary discussion about the subject.

As you may be aware, there is an ongoing effort to carry over statistics
during pg_upgrade.  Today, we encourage users to use vacuumdb to run
ANALYZE on all relations after upgrading.  There's even a special
--analyze-in-stages option that fast-tracks an initial set of minimal
statistics for this use case.  Once the statistics are carried over by
pg_upgrade, there will be little need to do this, except perhaps for
extended statistics if they aren't carried over.  But there are patches
in flight for that, too [1].

This thread is dedicated to figuring out what, if anything, to do about
vacuumdb.  I see the following general categories of options:

* Do nothing.  Other than updating our recommended guidance for
post-upgrade analyzing, we'd leave vacuumdb alone.  While this is
certainly a simple option, it has a couple of key drawbacks.  For one,
anyone who doesn't see the new vacuumdb guidance may continue to run
unnecessary post-upgrade ANALYZEs.  Also, if we don't get the extended
statistics piece completed for v18, users will have to manually construct
ANALYZE commands for those to run post-upgrade.

* Add a breaking change so that users are forced to fix any outdated
post-upgrade scripts.  This is what the attached patches do.  In short,
they add a required parameter to --analyze-in-stages that can be set to
either "all" or "missing".  The new "missing" mode generates ANALYZE
commands for relations that are missing statistics, while the "all" mode
does the same thing that --analyze-in-stages does today.  While the
"missing" mode might be useful outside of upgrade cases, we could also
recommend it as a post-upgrade step if the extended statistics work
doesn't get committed for v18.

* Add a new option that will make it easy to ANALYZE any relations that
are missing statistics, but don't make any breaking changes to existing
post-upgrade scripts.  This option isn't strictly necessary if we get
the extended statistics parts committed, but it could be a nice feature
anyway.

I chose the second approach because it had the most support in the other
thread, but I definitely wouldn't characterize it as a consensus.

0001 simply refactors the main catalog query to its own function so that
its results can be reused in later stages of --analyze-in-stages.  This
might require a bit more memory and make --analyze-in-stages less
responsive to concurrent changes, but it wasn't all that responsive to
begin with.

0002 adds the new "missing" mode functionality.  Note that it regenerates
all statistics for a relation if any applicable statistics types are
missing.  It's not clear whether we can or should do any better than
that.  Corey and I put a lot of effort into the catalog query changes,
and we think we've covered everything, but we would of course appreciate
some review on that part.  BTW, as long as the basic "missing" mode idea
seems reasonable, it's easy enough to adjust the user interface to
whatever we want, and I'm happy to do so as needed.

Finally, I think another open question is whether any of this should
apply to --analyze and/or --analyze-only.  We do recommend the latter as
a post-upgrade step in our pg_upgrade documentation, and I could see the
"missing" mode being useful on its own for these modes, too.

Thoughts?

[0] https://postgr.es/m/CADkLM%3DfR7TwH0cLREQkf5_%3DKLcOYVxJw0Km0i5MpaWeuDwVo6g%40mail.gmail.com
[1] https://postgr.es/m/CADkLM%3Ddpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho%3DqN3kX0Zg%40mail.gmail.com

-- 
nathan
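To make the proposed interface in the attached patches concrete, the
post-upgrade invocations would look something like this (a sketch of the
v1 spelling only; the exact option syntax is still up for discussion, per
above):

```shell
# Equivalent to today's --analyze-in-stages: analyze every relation in
# every database, in three stages of increasing statistics targets.
vacuumdb --all --analyze-in-stages all

# Proposed post-upgrade step when pg_upgrade carries statistics over:
# only analyze relations that are missing statistics (e.g., extended
# statistics, if those aren't carried over for v18).
vacuumdb --all --analyze-in-stages missing
```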
>From 99d653f59e3ed7a7f5809e6546f0514ea8f0beda Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Fri, 24 Jan 2025 09:29:30 -0600
Subject: [PATCH v1 1/2] vacuumdb: Save catalog query results for
 --analyze-in-stages.

Presently, each call to vacuum_one_database() for each stage of
--analyze-in-stages mode performs the catalog query to retrieve the list
of tables to process.  A proposed follow-up commit would add a "missing
only" feature to --analyze-in-stages, which requires us to save the
results of the catalog query (since tables without statistics would have
them after the first stage).  This commit adds this ability via a new
parameter for vacuum_one_database() that specifies either a
previously-retrieved list to process or a place to store the results of
the catalog query for later use.  This commit also makes use of this new
parameter for --analyze-in-stages.

The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages, neither of
which is expected to bother anyone.
Author: Corey Huinker --- src/bin/scripts/vacuumdb.c | 316 +++++++++++++++++++++---------------- 1 file changed, 180 insertions(+), 136 deletions(-) diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index 3dc428674ae..88ceb42746d 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -62,10 +62,16 @@ typedef enum static VacObjFilter objfilter = OBJFILTER_NONE; +static SimpleStringList *retrieve_objects(PGconn *conn, + vacuumingOptions *vacopts, + SimpleStringList *objects, + bool echo); + static void vacuum_one_database(ConnParams *cparams, vacuumingOptions *vacopts, int stage, SimpleStringList *objects, + SimpleStringList **found_objs, int concurrentCons, const char *progname, bool echo, bool quiet); @@ -400,12 +406,13 @@ main(int argc, char *argv[]) if (analyze_in_stages) { int stage; + SimpleStringList *found_objs = NULL; for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++) { vacuum_one_database(&cparams, &vacopts, stage, - &objects, + &objects, &found_objs, concurrentCons, progname, echo, quiet); } @@ -413,7 +420,7 @@ main(int argc, char *argv[]) else vacuum_one_database(&cparams, &vacopts, ANALYZE_NO_STAGE, - &objects, + &objects, NULL, concurrentCons, progname, echo, quiet); } @@ -461,8 +468,17 @@ escape_quotes(const char *src) /* * vacuum_one_database * - * Process tables in the given database. If the 'objects' list is empty, - * process all tables in the database. + * Process tables in the given database. + * + * 'found_objs' should be a fully qualified list of objects to process, as + * returned by a previous call to vacuum_one_database(). If *found_objs is + * NULL, it is set to the results of the catalog query discussed below. If + * found_objs is NULL, the results of the catalog query are not returned. + * + * If *found_objs is NULL, this function performs a catalog query to retrieve + * the list of tables to process. When 'objects' is NULL, all tables in the + * database are processed. 
Otherwise, the catalog query performs a lookup for + * the objects listed in 'objects'. * * Note that this function is only concerned with running exactly one stage * when in analyze-in-stages mode; caller must iterate on us if necessary. @@ -475,22 +491,18 @@ vacuum_one_database(ConnParams *cparams, vacuumingOptions *vacopts, int stage, SimpleStringList *objects, + SimpleStringList **found_objs, int concurrentCons, const char *progname, bool echo, bool quiet) { PQExpBufferData sql; - PQExpBufferData buf; - PQExpBufferData catalog_query; - PGresult *res; PGconn *conn; SimpleStringListCell *cell; ParallelSlotArray *sa; - SimpleStringList dbtables = {NULL, NULL}; - int i; - int ntups; + int ntups = 0; bool failed = false; - bool objects_listed = false; const char *initcmd; + SimpleStringList *ret = NULL; const char *stage_commands[] = { "SET default_statistics_target=1; SET vacuum_cost_delay=0;", "SET default_statistics_target=10; RESET vacuum_cost_delay;", @@ -587,19 +599,155 @@ vacuum_one_database(ConnParams *cparams, } /* - * Prepare the list of tables to process by querying the catalogs. - * - * Since we execute the constructed query with the default search_path - * (which could be unsafe), everything in this query MUST be fully - * qualified. - * - * First, build a WITH clause for the catalog query if any tables were - * specified, with a set of values made of relation names and their - * optional set of columns. This is used to match any provided column - * lists with the generated qualified identifiers and to filter for the - * tables provided via --table. If a listed table does not exist, the - * catalog query will fail. + * If the caller provided the results of a previous catalog query, just + * use that. Otherwise, run the catalog query ourselves and set the + * return variable if provided. 
+ */ + if (found_objs && *found_objs) + ret = *found_objs; + else + { + ret = retrieve_objects(conn, vacopts, objects, echo); + if (found_objs) + *found_objs = ret; + } + + /* + * Count the number of objects in the catalog query result. If there are + * none, we are done. + */ + for (cell = ret ? ret->head : NULL; cell; cell = cell->next) + ntups++; + + if (ntups == 0) + { + PQfinish(conn); + return; + } + + /* + * Ensure concurrentCons is sane. If there are more connections than + * vacuumable relations, we don't need to use them all. + */ + if (concurrentCons > ntups) + concurrentCons = ntups; + if (concurrentCons <= 0) + concurrentCons = 1; + + /* + * All slots need to be prepared to run the appropriate analyze stage, if + * caller requested that mode. We have to prepare the initial connection + * ourselves before setting up the slots. + */ + if (stage == ANALYZE_NO_STAGE) + initcmd = NULL; + else + { + initcmd = stage_commands[stage]; + executeCommand(conn, initcmd, echo); + } + + /* + * Setup the database connections. We reuse the connection we already have + * for the first slot. If not in parallel mode, the first slot in the + * array contains the connection. */ + sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd); + ParallelSlotsAdoptConn(sa, conn); + + initPQExpBuffer(&sql); + + cell = ret->head; + do + { + const char *tabname = cell->val; + ParallelSlot *free_slot; + + if (CancelRequested) + { + failed = true; + goto finish; + } + + free_slot = ParallelSlotsGetIdle(sa, NULL); + if (!free_slot) + { + failed = true; + goto finish; + } + + prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection), + vacopts, tabname); + + /* + * Execute the vacuum. All errors are handled in processQueryResult + * through ParallelSlotsGetIdle. 
+ */ + ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL); + run_vacuum_command(free_slot->connection, sql.data, + echo, tabname); + + cell = cell->next; + } while (cell != NULL); + + if (!ParallelSlotsWaitCompletion(sa)) + { + failed = true; + goto finish; + } + + /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ + if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE) + { + const char *cmd = "VACUUM (ONLY_DATABASE_STATS);"; + ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL); + + if (!free_slot) + { + failed = true; + goto finish; + } + + ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL); + run_vacuum_command(free_slot->connection, cmd, echo, NULL); + + if (!ParallelSlotsWaitCompletion(sa)) + failed = true; + } + +finish: + ParallelSlotsTerminate(sa); + pg_free(sa); + + termPQExpBuffer(&sql); + + if (failed) + exit(1); +} + +/* + * Prepare the list of tables to process by querying the catalogs. + * + * Since we execute the constructed query with the default search_path (which + * could be unsafe), everything in this query MUST be fully qualified. + * + * First, build a WITH clause for the catalog query if any tables were + * specified, with a set of values made of relation names and their optional + * set of columns. This is used to match any provided column lists with the + * generated qualified identifiers and to filter for the tables provided via + * --table. If a listed table does not exist, the catalog query will fail. + */ +static SimpleStringList * +retrieve_objects(PGconn *conn, vacuumingOptions *vacopts, + SimpleStringList *objects, bool echo) +{ + PQExpBufferData buf; + PQExpBufferData catalog_query; + PGresult *res; + SimpleStringListCell *cell; + SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList)); + bool objects_listed = false; + initPQExpBuffer(&catalog_query); for (cell = objects ? 
objects->head : NULL; cell; cell = cell->next) { @@ -753,23 +901,12 @@ vacuum_one_database(ConnParams *cparams, termPQExpBuffer(&catalog_query); PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo)); - /* - * If no rows are returned, there are no matching tables, so we are done. - */ - ntups = PQntuples(res); - if (ntups == 0) - { - PQclear(res); - PQfinish(conn); - return; - } - /* * Build qualified identifiers for each table, including the column list * if given. */ initPQExpBuffer(&buf); - for (i = 0; i < ntups; i++) + for (int i = 0; i < PQntuples(res); i++) { appendPQExpBufferStr(&buf, fmtQualifiedId(PQgetvalue(res, i, 1), @@ -778,110 +915,13 @@ vacuum_one_database(ConnParams *cparams, if (objects_listed && !PQgetisnull(res, i, 2)) appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2)); - simple_string_list_append(&dbtables, buf.data); + simple_string_list_append(found_objs, buf.data); resetPQExpBuffer(&buf); } termPQExpBuffer(&buf); PQclear(res); - /* - * Ensure concurrentCons is sane. If there are more connections than - * vacuumable relations, we don't need to use them all. - */ - if (concurrentCons > ntups) - concurrentCons = ntups; - if (concurrentCons <= 0) - concurrentCons = 1; - - /* - * All slots need to be prepared to run the appropriate analyze stage, if - * caller requested that mode. We have to prepare the initial connection - * ourselves before setting up the slots. - */ - if (stage == ANALYZE_NO_STAGE) - initcmd = NULL; - else - { - initcmd = stage_commands[stage]; - executeCommand(conn, initcmd, echo); - } - - /* - * Setup the database connections. We reuse the connection we already have - * for the first slot. If not in parallel mode, the first slot in the - * array contains the connection. 
- */ - sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd); - ParallelSlotsAdoptConn(sa, conn); - - initPQExpBuffer(&sql); - - cell = dbtables.head; - do - { - const char *tabname = cell->val; - ParallelSlot *free_slot; - - if (CancelRequested) - { - failed = true; - goto finish; - } - - free_slot = ParallelSlotsGetIdle(sa, NULL); - if (!free_slot) - { - failed = true; - goto finish; - } - - prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection), - vacopts, tabname); - - /* - * Execute the vacuum. All errors are handled in processQueryResult - * through ParallelSlotsGetIdle. - */ - ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL); - run_vacuum_command(free_slot->connection, sql.data, - echo, tabname); - - cell = cell->next; - } while (cell != NULL); - - if (!ParallelSlotsWaitCompletion(sa)) - { - failed = true; - goto finish; - } - - /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ - if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE) - { - const char *cmd = "VACUUM (ONLY_DATABASE_STATS);"; - ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL); - - if (!free_slot) - { - failed = true; - goto finish; - } - - ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL); - run_vacuum_command(free_slot->connection, cmd, echo, NULL); - - if (!ParallelSlotsWaitCompletion(sa)) - failed = true; - } - -finish: - ParallelSlotsTerminate(sa); - pg_free(sa); - - termPQExpBuffer(&sql); - - if (failed) - exit(1); + return found_objs; } /* @@ -912,6 +952,10 @@ vacuum_all_databases(ConnParams *cparams, if (analyze_in_stages) { + SimpleStringList **found_objs; + + found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *)); + /* * When analyzing all databases in stages, we analyze them all in the * fastest stage first, so that initial statistics become available @@ -928,7 +972,7 @@ vacuum_all_databases(ConnParams *cparams, vacuum_one_database(cparams, vacopts, stage, - objects, + 
objects, &found_objs[i], concurrentCons, progname, echo, quiet); } @@ -942,7 +986,7 @@ vacuum_all_databases(ConnParams *cparams, vacuum_one_database(cparams, vacopts, ANALYZE_NO_STAGE, - objects, + objects, NULL, concurrentCons, progname, echo, quiet); } -- 2.39.5 (Apple Git-154)
>From 320c0491dea628a0b56926563c94fa7044eb191c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Fri, 24 Jan 2025 09:30:14 -0600
Subject: [PATCH v1 2/2] vacuumdb: Allow analyzing-in-stages only relations
 missing stats.

This commit adds a required argument to --analyze-in-stages that accepts
either "missing" or "all".  "all" provides the same behavior that
--analyze-in-stages has today.  "missing" can be used to instead analyze
only relations that are missing statistics.

This new argument is intentionally marked as required to break any
existing post-pg_upgrade vacuumdb scripts.  Since stats are going to be
carried over by pg_upgrade to v18 or later, running
--analyze-in-stages=all is unnecessary for the vast majority of users and
is likely to even hurt performance post-upgrade.

XXX: I think the above paragraph changes depending on whether extended
statistics are carried over by pg_upgrade.  If they aren't, we should
probably recommend --analyze-in-stages=missing.  If they are, then we
probably shouldn't recommend anything at all.  Does this also change how
we structure the user-facing part of this change?

Note that the new "missing" mode will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.  For
example, if a table has statistics for one column but not another, we
will analyze the whole table.  A similar principle applies to extended
statistics, expression indexes, table inheritance, and partitioned
tables.

XXX: Do we need to make a similar change to --analyze-only and/or
--analyze (e.g., a new --missing-stats-only option)?
Author: Corey Huinker --- doc/src/sgml/ref/vacuumdb.sgml | 12 +++- src/bin/scripts/t/102_vacuumdb_stages.pl | 64 ++++++++++++++++++++- src/bin/scripts/vacuumdb.c | 73 +++++++++++++++++++++++- src/test/perl/PostgreSQL/Test/Cluster.pm | 27 +++++++++ 4 files changed, 171 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml index 66fccb30a2d..3ef8a52a163 100644 --- a/doc/src/sgml/ref/vacuumdb.sgml +++ b/doc/src/sgml/ref/vacuumdb.sgml @@ -429,7 +429,7 @@ PostgreSQL documentation </varlistentry> <varlistentry> - <term><option>--analyze-in-stages</option></term> + <term><option>--analyze-in-stages=<replaceable class="parameter">analyze_in_stages_option</replaceable></option></term> <listitem> <para> Only calculate statistics for use by the optimizer (no vacuum), @@ -441,6 +441,8 @@ PostgreSQL documentation </para> <para> + When set to <literal>all</literal>, vacuumdb will calculate statistics + for all relations, even relations with existing statistics. This option is only useful to analyze a database that currently has no statistics or has wholly incorrect ones, such as if it is newly populated from a restored dump or by <command>pg_upgrade</command>. @@ -449,6 +451,14 @@ PostgreSQL documentation transiently worse due to the low statistics targets of the early stages. </para> + + <para> + When set to <literal>missing</literal>, vacuumdb will calculate + statistics only for relations that are missing statistics for a column, + index expression, or extended statistics object. This option prevents + vacuumdb from deleting existing statistics so that the query + optimizer's choices do not become transiently worse. 
+ </para> </listitem> </varlistentry> diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl index 984c8d06de6..0f560accb46 100644 --- a/src/bin/scripts/t/102_vacuumdb_stages.pl +++ b/src/bin/scripts/t/102_vacuumdb_stages.pl @@ -12,7 +12,7 @@ $node->init; $node->start; $node->issues_sql_like( - [ 'vacuumdb', '--analyze-in-stages', 'postgres' ], + [ 'vacuumdb', '--analyze-in-stages', 'all', 'postgres' ], qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0; .*statement:\ ANALYZE .*statement:\ SET\ default_statistics_target=10;\ RESET\ vacuum_cost_delay; @@ -21,8 +21,68 @@ $node->issues_sql_like( .*statement:\ ANALYZE/sx, 'analyze three times'); +$node->safe_psql('postgres', + 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;'); $node->issues_sql_like( - [ 'vacuumdb', '--analyze-in-stages', '--all' ], + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with missing stats'); +$node->issues_sql_unlike( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with no missing stats'); + +$node->safe_psql('postgres', + 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));'); +$node->issues_sql_like( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with missing index expression stats'); +$node->issues_sql_unlike( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with no missing index expression stats'); + +$node->safe_psql('postgres', + 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM 
regression_vacuumdb_test;'); +$node->issues_sql_like( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with missing extended stats'); +$node->issues_sql_unlike( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with no missing extended stats'); + +$node->safe_psql('postgres', + "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n" + . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n" + . "ANALYZE regression_vacuumdb_child;\n"); +$node->issues_sql_like( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with missing inherited stats'); +$node->issues_sql_unlike( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with no missing inherited stats'); + +$node->safe_psql('postgres', + "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n" + . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n" + . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n" + . 
"ANALYZE regression_vacuumdb_part1;\n"); +$node->issues_sql_like( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_parted', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with missing partition stats'); +$node->issues_sql_unlike( + [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_parted', 'postgres' ], + qr/statement:\ ANALYZE/sx, + '--analyze-in-stages=missing with no missing partition stats'); + +$node->issues_sql_like( + [ 'vacuumdb', '--analyze-in-stages', 'all', '--all' ], qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0; .*statement:\ ANALYZE .*statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0; diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index 88ceb42746d..ddc01e9d723 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -61,6 +61,7 @@ typedef enum } VacObjFilter; static VacObjFilter objfilter = OBJFILTER_NONE; +static bool missing_only; static SimpleStringList *retrieve_objects(PGconn *conn, vacuumingOptions *vacopts, @@ -123,7 +124,7 @@ main(int argc, char *argv[]) {"schema", required_argument, NULL, 'n'}, {"exclude-schema", required_argument, NULL, 'N'}, {"maintenance-db", required_argument, NULL, 2}, - {"analyze-in-stages", no_argument, NULL, 3}, + {"analyze-in-stages", required_argument, NULL, 3}, {"disable-page-skipping", no_argument, NULL, 4}, {"skip-locked", no_argument, NULL, 5}, {"min-xid-age", required_argument, NULL, 6}, @@ -246,6 +247,16 @@ main(int argc, char *argv[]) break; case 3: analyze_in_stages = vacopts.analyze_only = true; + if (strcmp(optarg, "all") == 0) + missing_only = false; + else if (strcmp(optarg, "missing") == 0) + missing_only = true; + else + { + pg_log_error("unrecognized --analyze-in-stages option: %s", + optarg); + exit(1); + } break; case 4: vacopts.disable_page_skipping = true; @@ -808,6 +819,7 @@ retrieve_objects(PGconn *conn, vacuumingOptions 
*vacopts, " FROM pg_catalog.pg_class c\n" " JOIN pg_catalog.pg_namespace ns" " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n" + " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n" " LEFT JOIN pg_catalog.pg_class t" " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n"); @@ -891,6 +903,63 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts, vacopts->min_mxid_age); } + if (missing_only) + { + appendPQExpBufferStr(&catalog_query, " AND (\n"); + + /* regular stats */ + appendPQExpBufferStr(&catalog_query, + " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n" + " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n" + " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n" + " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n" + " AND NOT a.attisdropped\n" + " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n" + " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n" + " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n" + " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n" + " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n"); + + /* extended stats */ + appendPQExpBufferStr(&catalog_query, + " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n" + " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n" + " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n" + " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n" + " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid))\n"); + + /* expression indexes */ + appendPQExpBufferStr(&catalog_query, + " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n" + " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n" + " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n" + " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n" + " AND i.indexprs IS NOT NULL\n" + " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n" + " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n" + " AND 
s.staattnum OPERATOR(pg_catalog.=) u.ord\n" + " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n"); + + /* table inheritance */ + appendPQExpBufferStr(&catalog_query, + " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n" + " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n" + " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n" + " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n" + " AND NOT a.attisdropped\n" + " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n" + " AND c.relhassubclass\n" + " AND NOT p.inherited\n" + " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n" + " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n" + " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n" + " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n" + " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n" + " AND s.stainherit))\n"); + + appendPQExpBufferStr(&catalog_query, " )\n"); + } + /* * Execute the catalog query. We use the default search_path for this * query for consistency with table lookups done elsewhere by the user. 
@@ -1226,7 +1295,7 @@ help(const char *progname) printf(_(" -V, --version output version information, then exit\n")); printf(_(" -z, --analyze update optimizer statistics\n")); printf(_(" -Z, --analyze-only only update optimizer statistics; no vacuum\n")); - printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n" + printf(_(" --analyze-in-stages=OPTION only update optimizer statistics, in multiple\n" " stages for faster results; no vacuum\n")); printf(_(" -?, --help show this help, then exit\n")); printf(_("\nConnection options:\n")); diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index f521ad0b12f..d30d97aa96c 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -2802,6 +2802,33 @@ sub issues_sql_like =pod +=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name) + +Run a command on the node, then verify that $unexpected_sql does not appear in +the server log file. + +=cut + +sub issues_sql_unlike +{ + local $Test::Builder::Level = $Test::Builder::Level + 1; + + my ($self, $cmd, $unexpected_sql, $test_name) = @_; + + local %ENV = $self->_get_env(); + + my $log_location = -s $self->logfile; + + my $result = PostgreSQL::Test::Utils::run_log($cmd); + ok($result, "@$cmd exit code 0"); + my $log = + PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location); + unlike($log, $unexpected_sql, "$test_name: SQL not found in server log"); + return; +} + +=pod + =item $node->log_content() Returns the contents of log of the node -- 2.39.5 (Apple Git-154)