On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote:
> On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote:
>> On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote:
>>> I suggest that we make a new thread about the vacuumdb changes and
>>> focus this thread and patch series on the pg_dump changes (and minor
>>> flag adjustments to pg_upgrade).
>>> 
>>> Unless you think that the pg_dump changes should block on the vacuumdb
>>> changes? In which case please let me know because the pg_dump changes
>>> are otherwise close to commit.
>> 
>> I think that is a good idea.  I don't see vacuumdb blocking this.
> 
> +1, I've been reviewing the vacuumdb portion and am planning to start a new
> thread in the near future.  IIUC the bulk of the vacuumdb changes are
> relatively noncontroversial; we just haven't reached consensus on the user
> interface.

As promised, I'm starting a new thread for this.  The original thread [0]
has some preliminary discussion about the subject.

As you may be aware, there is an ongoing effort to carry over statistics
during pg_upgrade.  Today, we encourage users to use vacuumdb to run
ANALYZE on all relations after upgrading.  There's even a special
--analyze-in-stages option that fast-tracks an initial set of minimal
statistics for this use-case.  Once the statistics are carried over by
pg_upgrade, there will be little need to do this, except perhaps for
extended statistics if they aren't carried over.  But there are patches in
flight for that, too [1].

This thread is dedicated to figuring out what, if anything, to do about
vacuumdb.  I see the following general categories of options:

* Do nothing.  Other than updating our recommended guidance for
  post-upgrade analyzing, we'd leave vacuumdb alone.  While this is
  certainly a simple option, it has a couple of key drawbacks.  For one,
  anyone who doesn't see the new vacuumdb guidance may continue to run
  unnecessary post-upgrade ANALYZEs.  Also, if we don't get the extended
  statistics piece completed for v18, users will have to manually construct
  ANALYZE commands to run post-upgrade for those.

* Add a breaking change so that users are forced to fix any outdated
  post-upgrade scripts.  This is what the attached patches do.  In short,
  they add a required parameter to --analyze-in-stages that can be set to
  either "all" or "missing".  The new "missing" mode generates ANALYZE
  commands for relations that are missing statistics, while the "all" mode
  does the same thing that --analyze-in-stages does today.  The "missing"
  mode might also prove useful outside of upgrade cases, and we could
  recommend it as a post-upgrade step if the extended statistics work
  doesn't get committed for v18.

* Add a new option that will make it easy to ANALYZE any relations that are
  missing statistics, but don't make any breaking changes to existing
  post-upgrade scripts.  This option isn't strictly necessary if we get the
  extended statistics parts committed, but it could be a nice feature
  anyway.
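To make the first option's drawback concrete, here is roughly the kind of
manual work we'd be leaving to users if the extended statistics piece
doesn't make it: a catalog query that generates ANALYZE commands for
relations with extended statistics objects.  (This is an illustrative
sketch I'm including for discussion, not something taken from the patch
set.)

```sql
-- Illustrative only: generate ANALYZE commands for each relation that
-- has at least one extended statistics object defined on it.
SELECT DISTINCT format('ANALYZE %I.%I;', ns.nspname, c.relname)
FROM pg_catalog.pg_statistic_ext s
JOIN pg_catalog.pg_class c ON c.oid = s.stxrelid
JOIN pg_catalog.pg_namespace ns ON ns.oid = c.relnamespace;
```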

I chose the second approach because it had the most support in the other
thread, but I definitely wouldn't characterize it as a consensus.  0001
simply refactors the main catalog query to its own function so that its
results can be reused in later stages of --analyze-in-stages.  This might
require a bit more memory and make --analyze-in-stages less responsive to
concurrent changes, but it wasn't all that responsive to begin with.  0002
adds the new "missing" mode functionality.  Note that it regenerates all
statistics for a relation if any applicable statistics types are missing.
It's not clear whether we can or should do any better than that.  Corey and
I put a lot of effort into the catalog query changes, and we think we've
covered everything, but we would of course appreciate some review on that
part.
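For reference, the core idea behind the "missing" check can be sketched as
a simplified catalog query.  This is illustrative only; the real query in
0002 is considerably more involved, since it also has to handle index
expressions, extended statistics, inheritance, and partitioned tables.

```sql
-- Illustrative only: relations with at least one analyzable column that
-- has no entry in pg_statistic, i.e. candidates for the "missing" mode.
SELECT ns.nspname, c.relname
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace ns ON ns.oid = c.relnamespace
WHERE c.relkind IN ('r', 'm')
  AND EXISTS (SELECT 1
              FROM pg_catalog.pg_attribute a
              WHERE a.attrelid = c.oid
                AND a.attnum > 0
                AND NOT a.attisdropped
                AND a.attstattarget IS DISTINCT FROM 0
                AND NOT EXISTS (SELECT 1
                                FROM pg_catalog.pg_statistic s
                                WHERE s.starelid = a.attrelid
                                  AND s.staattnum = a.attnum));
```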

BTW, as long as the basic "missing" mode idea seems reasonable, it's easy
enough to adjust the user interface to whatever we want, and I'm happy to
do so as needed.

Finally, I think another open question is whether any of this should apply
to --analyze and/or --analyze-only.  We do recommend the latter as a
post-upgrade step in our pg_upgrade documentation, and I could see the
"missing" mode being useful on its own for these modes, too.

Thoughts?

[0] https://postgr.es/m/CADkLM%3DfR7TwH0cLREQkf5_%3DKLcOYVxJw0Km0i5MpaWeuDwVo6g%40mail.gmail.com
[1] https://postgr.es/m/CADkLM%3Ddpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho%3DqN3kX0Zg%40mail.gmail.com

-- 
nathan
>From 99d653f59e3ed7a7f5809e6546f0514ea8f0beda Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Fri, 24 Jan 2025 09:29:30 -0600
Subject: [PATCH v1 1/2] vacuumdb: Save catalog query results for
 --analyze-in-stages.

Presently, each call to vacuum_one_database() for each stage of
--analyze-in-stages mode performs the catalog query to retrieve the
list of tables to process.  A proposed follow-up commit would add a
"missing only" feature to --analyze-in-stages, which requires us to
save the results of the catalog query (since tables without
statistics would have them after the first stage).  This commit
adds this ability via a new parameter for vacuum_one_database()
that specifies either a previously-retrieved list to process or a
place to store the results of the catalog query for later use.
This commit also makes use of this new parameter for
--analyze-in-stages.

The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.

Author: Corey Huinker
---
 src/bin/scripts/vacuumdb.c | 316 +++++++++++++++++++++----------------
 1 file changed, 180 insertions(+), 136 deletions(-)

diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 3dc428674ae..88ceb42746d 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
 
 static VacObjFilter objfilter = OBJFILTER_NONE;
 
+static SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  SimpleStringList *objects,
+										  bool echo);
+
 static void vacuum_one_database(ConnParams *cparams,
 								vacuumingOptions *vacopts,
 								int stage,
 								SimpleStringList *objects,
+								SimpleStringList **found_objs,
 								int concurrentCons,
 								const char *progname, bool echo, bool quiet);
 
@@ -400,12 +406,13 @@ main(int argc, char *argv[])
                if (analyze_in_stages)
                {
                        int                     stage;
+                       SimpleStringList *found_objs = NULL;
 
                        for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
                        {
 				vacuum_one_database(&cparams, &vacopts,
 									stage,
-									&objects,
+									&objects, &found_objs,
 									concurrentCons,
 									progname, echo, quiet);
                        }
@@ -413,7 +420,7 @@ main(int argc, char *argv[])
                else
 			vacuum_one_database(&cparams, &vacopts,
 								ANALYZE_NO_STAGE,
-								&objects,
+								&objects, NULL,
 								concurrentCons,
 								progname, echo, quiet);
        }
@@ -461,8 +468,17 @@ escape_quotes(const char *src)
 /*
  * vacuum_one_database
  *
- * Process tables in the given database.  If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * 'found_objs' should be a fully qualified list of objects to process, as
+ * returned by a previous call to vacuum_one_database().  If *found_objs is
+ * NULL, it is set to the results of the catalog query discussed below.  If
+ * found_objs is NULL, the results of the catalog query are not returned.
+ *
+ * If *found_objs is NULL, this function performs a catalog query to retrieve
+ * the list of tables to process.  When 'objects' is NULL, all tables in the
+ * database are processed.  Otherwise, the catalog query performs a lookup for
+ * the objects listed in 'objects'.
  *
  * Note that this function is only concerned with running exactly one stage
  * when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +491,18 @@ vacuum_one_database(ConnParams *cparams,
                                        vacuumingOptions *vacopts,
                                        int stage,
                                        SimpleStringList *objects,
+                                       SimpleStringList **found_objs,
                                        int concurrentCons,
 					const char *progname, bool echo, bool quiet)
 {
        PQExpBufferData sql;
-       PQExpBufferData buf;
-       PQExpBufferData catalog_query;
-       PGresult   *res;
        PGconn     *conn;
        SimpleStringListCell *cell;
        ParallelSlotArray *sa;
-       SimpleStringList dbtables = {NULL, NULL};
-       int                     i;
-       int                     ntups;
+       int                     ntups = 0;
        bool            failed = false;
-       bool            objects_listed = false;
        const char *initcmd;
+       SimpleStringList *ret = NULL;
        const char *stage_commands[] = {
                "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
                "SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -587,19 +599,155 @@ vacuum_one_database(ConnParams *cparams,
        }
 
        /*
-	 * Prepare the list of tables to process by querying the catalogs.
-	 *
-	 * Since we execute the constructed query with the default search_path
-	 * (which could be unsafe), everything in this query MUST be fully
-	 * qualified.
-	 *
-	 * First, build a WITH clause for the catalog query if any tables were
-	 * specified, with a set of values made of relation names and their
-	 * optional set of columns.  This is used to match any provided column
-	 * lists with the generated qualified identifiers and to filter for the
-	 * tables provided via --table.  If a listed table does not exist, the
-	 * catalog query will fail.
+	 * If the caller provided the results of a previous catalog query, just
+	 * use that.  Otherwise, run the catalog query ourselves and set the
+	 * return variable if provided.
+        */
+       if (found_objs && *found_objs)
+               ret = *found_objs;
+       else
+       {
+               ret = retrieve_objects(conn, vacopts, objects, echo);
+               if (found_objs)
+                       *found_objs = ret;
+       }
+
+       /*
+	 * Count the number of objects in the catalog query result.  If there are
+	 * none, we are done.
+        */
+       for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+               ntups++;
+
+       if (ntups == 0)
+       {
+               PQfinish(conn);
+               return;
+       }
+
+       /*
+	 * Ensure concurrentCons is sane.  If there are more connections than
+	 * vacuumable relations, we don't need to use them all.
+        */
+       if (concurrentCons > ntups)
+               concurrentCons = ntups;
+       if (concurrentCons <= 0)
+               concurrentCons = 1;
+
+       /*
+	 * All slots need to be prepared to run the appropriate analyze stage, if
+	 * caller requested that mode.  We have to prepare the initial connection
+	 * ourselves before setting up the slots.
+        */
+       if (stage == ANALYZE_NO_STAGE)
+               initcmd = NULL;
+       else
+       {
+               initcmd = stage_commands[stage];
+               executeCommand(conn, initcmd, echo);
+       }
+
+       /*
+	 * Setup the database connections. We reuse the connection we already have
+	 * for the first slot.  If not in parallel mode, the first slot in the
+	 * array contains the connection.
 	 */
+	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+       ParallelSlotsAdoptConn(sa, conn);
+
+       initPQExpBuffer(&sql);
+
+       cell = ret->head;
+       do
+       {
+               const char *tabname = cell->val;
+               ParallelSlot *free_slot;
+
+               if (CancelRequested)
+               {
+                       failed = true;
+                       goto finish;
+               }
+
+               free_slot = ParallelSlotsGetIdle(sa, NULL);
+               if (!free_slot)
+               {
+                       failed = true;
+                       goto finish;
+               }
+
+		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+							   vacopts, tabname);
+
+               /*
+		 * Execute the vacuum.  All errors are handled in processQueryResult
+		 * through ParallelSlotsGetIdle.
+		 */
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, sql.data,
+						   echo, tabname);
+
+               cell = cell->next;
+       } while (cell != NULL);
+
+       if (!ParallelSlotsWaitCompletion(sa))
+       {
+               failed = true;
+               goto finish;
+       }
+
+       /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+       if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+       {
+               const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+               ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+               if (!free_slot)
+               {
+                       failed = true;
+                       goto finish;
+               }
+
+		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+               if (!ParallelSlotsWaitCompletion(sa))
+                       failed = true;
+       }
+
+finish:
+       ParallelSlotsTerminate(sa);
+       pg_free(sa);
+
+       termPQExpBuffer(&sql);
+
+       if (failed)
+               exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns.  This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table.  If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+                                SimpleStringList *objects, bool echo)
+{
+       PQExpBufferData buf;
+       PQExpBufferData catalog_query;
+       PGresult   *res;
+       SimpleStringListCell *cell;
+       SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+       bool            objects_listed = false;
+
        initPQExpBuffer(&catalog_query);
        for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
        {
@@ -753,23 +901,12 @@ vacuum_one_database(ConnParams *cparams,
        termPQExpBuffer(&catalog_query);
        PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
 
-       /*
-	 * If no rows are returned, there are no matching tables, so we are done.
-        */
-       ntups = PQntuples(res);
-       if (ntups == 0)
-       {
-               PQclear(res);
-               PQfinish(conn);
-               return;
-       }
-
        /*
         * Build qualified identifiers for each table, including the column list
         * if given.
         */
        initPQExpBuffer(&buf);
-       for (i = 0; i < ntups; i++)
+       for (int i = 0; i < PQntuples(res); i++)
        {
 		appendPQExpBufferStr(&buf,
 							 fmtQualifiedId(PQgetvalue(res, i, 1),
@@ -778,110 +915,13 @@ vacuum_one_database(ConnParams *cparams,
                if (objects_listed && !PQgetisnull(res, i, 2))
                        appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
 
-               simple_string_list_append(&dbtables, buf.data);
+               simple_string_list_append(found_objs, buf.data);
                resetPQExpBuffer(&buf);
        }
        termPQExpBuffer(&buf);
        PQclear(res);
 
-       /*
-	 * Ensure concurrentCons is sane.  If there are more connections than
-	 * vacuumable relations, we don't need to use them all.
-        */
-       if (concurrentCons > ntups)
-               concurrentCons = ntups;
-       if (concurrentCons <= 0)
-               concurrentCons = 1;
-
-       /*
-	 * All slots need to be prepared to run the appropriate analyze stage, if
-	 * caller requested that mode.  We have to prepare the initial connection
-	 * ourselves before setting up the slots.
-        */
-       if (stage == ANALYZE_NO_STAGE)
-               initcmd = NULL;
-       else
-       {
-               initcmd = stage_commands[stage];
-               executeCommand(conn, initcmd, echo);
-       }
-
-       /*
-	 * Setup the database connections. We reuse the connection we already have
-	 * for the first slot.  If not in parallel mode, the first slot in the
-	 * array contains the connection.
-	 */
-	sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
-       ParallelSlotsAdoptConn(sa, conn);
-
-       initPQExpBuffer(&sql);
-
-       cell = dbtables.head;
-       do
-       {
-               const char *tabname = cell->val;
-               ParallelSlot *free_slot;
-
-               if (CancelRequested)
-               {
-                       failed = true;
-                       goto finish;
-               }
-
-               free_slot = ParallelSlotsGetIdle(sa, NULL);
-               if (!free_slot)
-               {
-                       failed = true;
-                       goto finish;
-               }
-
-		prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
-							   vacopts, tabname);
-
-               /*
-		 * Execute the vacuum.  All errors are handled in processQueryResult
-		 * through ParallelSlotsGetIdle.
-		 */
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, sql.data,
-						   echo, tabname);
-
-               cell = cell->next;
-       } while (cell != NULL);
-
-       if (!ParallelSlotsWaitCompletion(sa))
-       {
-               failed = true;
-               goto finish;
-       }
-
-       /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
-       if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
-       {
-               const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
-               ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
-               if (!free_slot)
-               {
-                       failed = true;
-                       goto finish;
-               }
-
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
-               if (!ParallelSlotsWaitCompletion(sa))
-                       failed = true;
-       }
-
-finish:
-       ParallelSlotsTerminate(sa);
-       pg_free(sa);
-
-       termPQExpBuffer(&sql);
-
-       if (failed)
-               exit(1);
+       return found_objs;
 }
 
 /*
@@ -912,6 +952,10 @@ vacuum_all_databases(ConnParams *cparams,
 
        if (analyze_in_stages)
        {
+               SimpleStringList **found_objs;
+
+		found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
                /*
 		 * When analyzing all databases in stages, we analyze them all in the
 		 * fastest stage first, so that initial statistics become available
@@ -928,7 +972,7 @@ vacuum_all_databases(ConnParams *cparams,
 
 				vacuum_one_database(cparams, vacopts,
 									stage,
-									objects,
+									objects, &found_objs[i],
 									concurrentCons,
 									progname, echo, quiet);
                        }
@@ -942,7 +986,7 @@ vacuum_all_databases(ConnParams *cparams,
 
 			vacuum_one_database(cparams, vacopts,
 								ANALYZE_NO_STAGE,
-								objects,
+								objects, NULL,
 								concurrentCons,
 								progname, echo, quiet);
                }
-- 
2.39.5 (Apple Git-154)

>From 320c0491dea628a0b56926563c94fa7044eb191c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Fri, 24 Jan 2025 09:30:14 -0600
Subject: [PATCH v1 2/2] vacuumdb: Allow analyzing-in-stages only relations
 missing stats.

This commit adds a required argument to --analyze-in-stages that
accepts either "missing" or "all".  "all" provides the same
behavior that --analyze-in-stages has today.  "missing" can be used
to instead analyze only relations that are missing statistics.
This new argument is intentionally marked as required to break any
existing post-pg_upgrade vacuumdb scripts.  Since stats are going
to be carried over by pg_upgrade to v18 or later, running
--analyze-in-stages=all is unnecessary for the vast majority of
users and is likely to even hurt performance post-upgrade.

XXX: I think the above paragraph changes depending on whether
extended statistics are carried over by pg_upgrade.  If they
aren't, we should probably recommend --analyze-in-stages=missing.
If they are, then we probably shouldn't recommend anything at all.
Does this also change how we structure the user-facing part of this
change?

Note that the new "missing" mode will generate ANALYZE commands for
a relation if it is missing any statistics it should ordinarily
have.  For example, if a table has statistics for one column but
not another, we will analyze the whole table.  A similar principle
applies to extended statistics, expression indexes, table
inheritance, and partitioned tables.

XXX: Do we need to make a similar change to --analyze-only and/or
--analyze (e.g., a new --missing-stats-only option)?

Author: Corey Huinker
---
 doc/src/sgml/ref/vacuumdb.sgml           | 12 +++-
 src/bin/scripts/t/102_vacuumdb_stages.pl | 64 ++++++++++++++++++++-
 src/bin/scripts/vacuumdb.c               | 73 +++++++++++++++++++++++-
 src/test/perl/PostgreSQL/Test/Cluster.pm | 27 +++++++++
 4 files changed, 171 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..3ef8a52a163 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -429,7 +429,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
-      <term><option>--analyze-in-stages</option></term>
+      <term><option>--analyze-in-stages=<replaceable class="parameter">analyze_in_stages_option</replaceable></option></term>
       <listitem>
        <para>
         Only calculate statistics for use by the optimizer (no vacuum),
@@ -441,6 +441,8 @@ PostgreSQL documentation
        </para>
 
        <para>
+        When set to <literal>all</literal>, vacuumdb will calculate statistics
+        for all relations, even relations with existing statistics.
         This option is only useful to analyze a database that currently has
         no statistics or has wholly incorrect ones, such as if it is newly
         populated from a restored dump or by <command>pg_upgrade</command>.
@@ -449,6 +451,14 @@ PostgreSQL documentation
         transiently worse due to the low statistics targets of the early
         stages.
        </para>
+
+       <para>
+        When set to <literal>missing</literal>, vacuumdb will calculate
+        statistics only for relations that are missing statistics for a column,
+        index expression, or extended statistics object.  This option prevents
+        vacuumdb from deleting existing statistics so that the query
+        optimizer's choices do not become transiently worse.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..0f560accb46 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -12,7 +12,7 @@ $node->init;
 $node->start;
 
 $node->issues_sql_like(
-       [ 'vacuumdb', '--analyze-in-stages', 'postgres' ],
+       [ 'vacuumdb', '--analyze-in-stages', 'all', 'postgres' ],
 	qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
 		    .*statement:\ ANALYZE
 		    .*statement:\ SET\ default_statistics_target=10;\ RESET\ vacuum_cost_delay;
@@ -21,8 +21,68 @@ $node->issues_sql_like(
                    .*statement:\ ANALYZE/sx,
        'analyze three times');
 
+$node->safe_psql('postgres',
+	'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
 $node->issues_sql_like(
-	[ 'vacuumdb', '--analyze-in-stages', '--all' ],
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with missing stats');
+$node->issues_sql_unlike(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with no missing stats');
+
+$node->safe_psql('postgres',
+	'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with missing index expression stats');
+$node->issues_sql_unlike(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with no missing index expression stats');
+
+$node->safe_psql('postgres',
+	'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with missing extended stats');
+$node->issues_sql_unlike(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with no missing extended stats');
+
+$node->safe_psql('postgres',
+	"CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+	. "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+	. "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with missing inherited stats');
+$node->issues_sql_unlike(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with no missing inherited stats');
+
+$node->safe_psql('postgres',
+	"CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+	. "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+	. "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+	. "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with missing partition stats');
+$node->issues_sql_unlike(
+	[ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+	qr/statement:\ ANALYZE/sx,
+	'--analyze-in-stages=missing with no missing partition stats');
+
+$node->issues_sql_like(
+	[ 'vacuumdb', '--analyze-in-stages', 'all', '--all' ],
 	qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
 		    .*statement:\ ANALYZE
 		    .*statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 88ceb42746d..ddc01e9d723 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -61,6 +61,7 @@ typedef enum
 } VacObjFilter;
 
 static VacObjFilter objfilter = OBJFILTER_NONE;
+static bool missing_only;
 
 static SimpleStringList *retrieve_objects(PGconn *conn,
 										  vacuumingOptions *vacopts,
@@ -123,7 +124,7 @@ main(int argc, char *argv[])
                {"schema", required_argument, NULL, 'n'},
                {"exclude-schema", required_argument, NULL, 'N'},
                {"maintenance-db", required_argument, NULL, 2},
-               {"analyze-in-stages", no_argument, NULL, 3},
+               {"analyze-in-stages", required_argument, NULL, 3},
                {"disable-page-skipping", no_argument, NULL, 4},
                {"skip-locked", no_argument, NULL, 5},
                {"min-xid-age", required_argument, NULL, 6},
@@ -246,6 +247,16 @@ main(int argc, char *argv[])
                                break;
                        case 3:
                                analyze_in_stages = vacopts.analyze_only = true;
+                               if (strcmp(optarg, "all") == 0)
+                                       missing_only = false;
+                               else if (strcmp(optarg, "missing") == 0)
+                                       missing_only = true;
+                               else
+                               {
+                                       pg_log_error("unrecognized --analyze-in-stages option: %s",
+                                                                optarg);
+                                       exit(1);
+                               }
                                break;
                        case 4:
                                vacopts.disable_page_skipping = true;
@@ -808,6 +819,7 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
                                                 " FROM pg_catalog.pg_class c\n"
                                                 " JOIN pg_catalog.pg_namespace ns"
                                                 " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+                                                " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n"
                                                 " LEFT JOIN pg_catalog.pg_class t"
                                                 " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
 
@@ -891,6 +903,63 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
                                                  vacopts->min_mxid_age);
        }
 
+       if (missing_only)
+       {
+               appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+               /* regular stats */
+               appendPQExpBufferStr(&catalog_query,
+                                                        " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+                                                        " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+                                                        " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+                                                        " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+                                                        " AND NOT a.attisdropped\n"
+                                                        " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+                                                        " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+                                                        " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+                                                        " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+                                                        " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+               /* extended stats */
+               appendPQExpBufferStr(&catalog_query,
+                                                        " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+                                                        " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+                                                        " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+                                                        " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+                                                        " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid))\n");
+
+               /* expression indexes */
+               appendPQExpBufferStr(&catalog_query,
+                                                        " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n"
+                                                        " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n"
+                                                        " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+                                                        " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+                                                        " AND i.indexprs IS NOT NULL\n"
+                                                        " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+                                                        " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n"
+                                                        " AND s.staattnum OPERATOR(pg_catalog.=) u.ord\n"
+                                                        " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+               /* table inheritance */
+               appendPQExpBufferStr(&catalog_query,
+                                                        " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+                                                        " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+                                                        " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+                                                        " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+                                                        " AND NOT a.attisdropped\n"
+                                                        " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+                                                        " AND c.relhassubclass\n"
+                                                        " AND NOT p.inherited\n"
+                                                        " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+                                                        " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+                                                        " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+                                                        " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+                                                        " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+                                                        " AND s.stainherit))\n");
+
+               appendPQExpBufferStr(&catalog_query, " )\n");
+       }
+
        /*
         * Execute the catalog query.  We use the default search_path for this
         * query for consistency with table lookups done elsewhere by the user.
@@ -1226,7 +1295,7 @@ help(const char *progname)
        printf(_("  -V, --version                   output version information, then exit\n"));
        printf(_("  -z, --analyze                   update optimizer statistics\n"));
        printf(_("  -Z, --analyze-only              only update optimizer statistics; no vacuum\n"));
-       printf(_("      --analyze-in-stages         only update optimizer statistics, in multiple\n"
+       printf(_("      --analyze-in-stages=OPTION  only update optimizer statistics, in multiple\n"
                         "                                  stages for faster results; no vacuum\n"));
        printf(_("  -?, --help                      show this help, then exit\n"));
        printf(_("\nConnection options:\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index f521ad0b12f..d30d97aa96c 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2802,6 +2802,33 @@ sub issues_sql_like
 
 =pod
 
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+       local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+       my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+       local %ENV = $self->_get_env();
+
+       my $log_location = -s $self->logfile;
+
+       my $result = PostgreSQL::Test::Utils::run_log($cmd);
+       ok($result, "@$cmd exit code 0");
+       my $log =
+         PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+       unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+       return;
+}
+
+=pod
+
 =item $node->log_content()
 
 Returns the contents of log of the node
-- 
2.39.5 (Apple Git-154)
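
For anyone reviewing, here is a rough standalone rendering of the patch's
"regular stats" predicate that can be run in psql to eyeball which relations
the missing mode would pick up.  This is a sketch only, not the exact query
vacuumdb builds: the patch's catalog_query additionally checks extended
statistics, expression indexes, and inheritance parents, and schema-qualifies
every operator.

```sql
-- Sketch: relations with a nonzero reltuples that have at least one
-- analyzable column lacking a pg_statistic row.  For partitioned tables
-- and partitioned indexes ('p', 'I') the relevant rows are the inherited
-- ones (stainherit = true), mirroring the patch's LATERAL p (inherited).
SELECT c.oid::regclass AS relation
FROM pg_catalog.pg_class c
     CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) AS p (inherited)
WHERE c.reltuples != 0
  AND EXISTS (SELECT NULL
              FROM pg_catalog.pg_attribute a
              WHERE a.attrelid = c.oid
                AND a.attnum > 0
                AND NOT a.attisdropped
                AND a.attstattarget IS DISTINCT FROM 0
                AND NOT EXISTS (SELECT NULL
                                FROM pg_catalog.pg_statistic s
                                WHERE s.starelid = a.attrelid
                                  AND s.staattnum = a.attnum
                                  AND s.stainherit = p.inherited));
```

Under the patch, this set corresponds roughly to what
`vacuumdb --analyze-in-stages=missing --all` would target.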
