Thanks for reviewing.

On Thu, Apr 03, 2025 at 03:23:40PM -0700, Jeff Davis wrote:
> This simplifies commit a0a4601765. I'd break out that simplification as
> a separate commit to make it easier to understand what happened. 

Done.

> In patch 0003, there are quite a few static function-scoped variables,
> which is not a style that I'm used to. One idea is to bundle them into
> a struct representing the cache state (including enough information to
> fetch the next batch), and have a single static variable that points to
> that.

As discussed off-list, I didn't take this suggestion for now.  Corey did
this originally, and I converted it to static function-scoped variables 1)
to reduce patch size and 2) because I noticed that each of the state
variables was only needed in one function.  I agree that a struct might be
slightly more readable, but we can always change this in the future if
desired.

> Also in 0003, the "next_te" variable is a bit confusing, because it's
> actually the last TocEntry, until it's advanced to point to the current
> one.

I've renamed it to expected_te.

> Other than that, looks good to me.

Great.  I'm planning to commit the attached patch set tomorrow morning.

For the record, I spent most of today trying very hard to fix the layering
violations in 0002.  While I was successful, the result was awkward,
complicated, and nigh unreadable.  This is now the second time I've
attempted to fix this and have felt the result was worse than where I
started.  So, I added extremely descriptive comments instead.  I'm hoping
that it will be possible to clean this up with some additional work in v19.
I have a few ideas, but if anyone has suggestions, I'm all ears.

-- 
nathan
>From a5a31f2754bf69a81fdc48769c1ee950317a2cf0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 13:20:28 -0500
Subject: [PATCH v12n6 1/4] Skip second WriteToc() call for custom-format dumps
 without data.

Presently, "pg_dump --format=custom" calls WriteToc() twice.  The
second call updates the data offset information, which allegedly
makes parallel pg_restore significantly faster.  However, if we're
not dumping any data, there are no data offsets to update, so we
can skip this step.

Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
---
 src/bin/pg_dump/pg_backup_custom.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index e44b887eb29..f7c3af56304 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -755,9 +755,11 @@ _CloseArchive(ArchiveHandle *AH)
                 * If possible, re-write the TOC in order to update the data offset
                 * information.  This is not essential, as pg_restore can cope in most
                 * cases without it; but it can make pg_restore significantly faster
-                * in some situations (especially parallel restore).
+                * in some situations (especially parallel restore).  We can skip this
+                * step if we're not dumping any data; there are no offsets to update
+                * in that case.
                 */
-               if (ctx->hasSeek &&
+               if (ctx->hasSeek && AH->public.dopt->dumpData &&
                        fseeko(AH->FH, tpos, SEEK_SET) == 0)
                        WriteToc(AH);
        }
-- 
2.39.5 (Apple Git-154)

>From 0244b3f02e083e6c37a2c282292d4d8fa0a69fed Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 17:15:51 -0500
Subject: [PATCH v12n6 2/4] pg_dump: Reduce memory usage of dumps with
 statistics.

Right now, pg_dump stores all generated commands for statistics in
memory.  These commands can be quite large and therefore can
significantly increase pg_dump's memory footprint.  To fix, wait
until we are about to write out the commands before generating
them, and be sure to free the commands after writing.  This is
implemented via a new defnDumper callback that works much like the
dataDumper one but is specifically designed for TOC entries.

Custom dumps that include data might write the TOC twice (to update
data offset information), which would ordinarily cause pg_dump to
run the attribute statistics queries twice.  However, as a hack, we
save the length of the written-out entry in the first pass and skip
over it in the second.  While there is no known technical issue
with executing the queries multiple times and rewriting the
results, it's expensive and feels risky, so let's avoid it.

As an exception, we _do_ execute the queries twice for the tar
format.  This format does a second pass through the TOC to generate
the restore.sql file.  pg_restore doesn't use this file, so even if
the second round of queries returns different results than the
first, it won't corrupt the output; the archive and restore.sql
file will just have different content.

Author: Corey Huinker <corey.huin...@gmail.com>
Co-authored-by: Nathan Bossart <nathandboss...@gmail.com>
Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
---
 src/bin/pg_dump/pg_backup.h          |  1 +
 src/bin/pg_dump/pg_backup_archiver.c | 84 +++++++++++++++++++++++++++-
 src/bin/pg_dump/pg_backup_archiver.h |  6 ++
 src/bin/pg_dump/pg_dump.c            | 46 ++++++++++-----
 4 files changed, 121 insertions(+), 16 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 658986de6f8..781f8fa1cc9 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -285,6 +285,7 @@ typedef int DumpId;
  * Function pointer prototypes for assorted callback methods.
  */
 
+typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg);
 typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1d131e5a57d..3debd0892c7 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1266,6 +1266,9 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
        newToc->dataDumperArg = opts->dumpArg;
        newToc->hadDumper = opts->dumpFn ? true : false;
 
+       newToc->defnDumper = opts->defnFn;
+       newToc->defnDumperArg = opts->defnArg;
+
        newToc->formatData = NULL;
        newToc->dataLength = 0;
 
@@ -2621,7 +2624,45 @@ WriteToc(ArchiveHandle *AH)
                WriteStr(AH, te->tag);
                WriteStr(AH, te->desc);
                WriteInt(AH, te->section);
-               WriteStr(AH, te->defn);
+
+               if (te->defnLen)
+               {
+                       /*
+                        * defnLen should only be set for custom format's second call to
+                        * WriteToc(), which rewrites the TOC in place to update data
+                        * offsets.  Instead of calling the defnDumper a second time
+                        * (which could involve re-executing queries), just skip writing
+                        * the entry.  While regenerating the definition should
+                        * theoretically produce the same result as before, it's expensive
+                        * and feels risky.
+                        *
+                        * The custom format only calls WriteToc() a second time if
+                        * fseeko() is usable (see _CloseArchive() in pg_backup_custom.c),
+                        * so we can safely use it without checking.  For other formats,
+                        * we fail because one of our assumptions must no longer hold
+                        * true.
+                        *
+                        * XXX This is certainly a layering violation, but the alternative
+                        * is an awkward and complicated callback infrastructure for this
+                        * special case.  This might be worth revisiting in the future.
+                        */
+                       if (AH->format != archCustom)
+                               pg_fatal("unexpected TOC entry in WriteToc(): %d %s %s",
+                                                te->dumpId, te->desc, te->tag);
+
+                       if (fseeko(AH->FH, te->defnLen, SEEK_CUR) != 0)
+                               pg_fatal("error during file seek: %m");
+               }
+               else if (te->defnDumper)
+               {
+                       char       *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+
+                       te->defnLen = WriteStr(AH, defn);
+                       pg_free(defn);
+               }
+               else
+                       WriteStr(AH, te->defn);
+
                WriteStr(AH, te->dropStmt);
                WriteStr(AH, te->copyStmt);
                WriteStr(AH, te->namespace);
@@ -3849,7 +3890,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 
        /*
         * Actually print the definition.  Normally we can just print the defn
-        * string if any, but we have three special cases:
+        * string if any, but we have four special cases:
         *
         * 1. A crude hack for suppressing AUTHORIZATION clause that old pg_dump
         * versions put into CREATE SCHEMA.  Don't mutate the variant for schema
@@ -3862,6 +3903,11 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
         * 3. ACL LARGE OBJECTS entries need special processing because they
         * contain only one copy of the ACL GRANT/REVOKE commands, which we must
         * apply to each large object listed in the associated BLOB METADATA.
+        *
+        * 4. Entries with a defnDumper need to call it to generate the
+        * definition.  This is primarily intended to provide a way to save memory
+        * for objects that would otherwise need a lot of it (e.g., statistics
+        * data).
         */
        if (ropt->noOwner &&
                strcmp(te->desc, "SCHEMA") == 0 && strncmp(te->defn, "--", 2) != 0)
@@ -3877,6 +3923,40 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
        {
                IssueACLPerBlob(AH, te);
        }
+       else if (te->defnLen && AH->format != archTar)
+       {
+               /*
+                * If defnLen is set, the defnDumper has already been called for this
+                * TOC entry.  We don't normally expect a defnDumper to be called for
+                * a TOC entry a second time in _printTocEntry(), but there's an
+                * exception.  The tar format first calls WriteToc(), which scans the
+                * entire TOC, and then it later calls RestoreArchive() to generate
+                * restore.sql, which scans the TOC again.  There doesn't appear to be
+                * a good way to prevent a second defnDumper call in this case without
+                * storing the definition in memory, which defeats the purpose.  This
+                * second defnDumper invocation should generate the same output as the
+                * first, but even if it doesn't, the worst-case scenario is that the
+                * content of the archive and restore.sql (which isn't used by
+                * pg_restore) will differ.
+                *
+                * In all other cases, encountering a TOC entry a second time in
+                * _printTocEntry() is unexpected, so we fail because one of our
+                * assumptions must no longer hold true.
+                *
+                * XXX This is certainly a layering violation, but the alternative is
+                * an awkward and complicated callback infrastructure for this special
+                * case.  This might be worth revisiting in the future.
+                */
+               pg_fatal("unexpected TOC entry in _printTocEntry(): %d %s %s",
+                                te->dumpId, te->desc, te->tag);
+       }
+       else if (te->defnDumper)
+       {
+               char       *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+
+               te->defnLen = ahprintf(AH, "%s\n\n", defn);
+               pg_free(defn);
+       }
        else if (te->defn && strlen(te->defn) > 0)
        {
                ahprintf(AH, "%s\n\n", te->defn);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a2064f471ed..b7ebc2b39cd 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -368,6 +368,10 @@ struct _tocEntry
        const void *dataDumperArg;      /* Arg for above routine */
        void       *formatData;         /* TOC Entry data specific to file format */
 
+       DefnDumperPtr defnDumper;       /* routine to dump definition statement */
+       const void *defnDumperArg;      /* arg for above routine */
+       size_t          defnLen;                /* length of dumped definition */
+
        /* working state while dumping/restoring */
        pgoff_t         dataLength;             /* item's data size; 0 if none or unknown */
        int                     reqs;                   /* do we need schema and/or data of object
@@ -407,6 +411,8 @@ typedef struct _archiveOpts
        int                     nDeps;
        DataDumperPtr dumpFn;
        const void *dumpArg;
+       DefnDumperPtr defnFn;
+       const void *defnArg;
 } ArchiveOpts;
 #define ARCHIVE_OPTS(...) &(ArchiveOpts){__VA_ARGS__}
 /* Called to add a TOC entry */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index eef74f78271..77fce7c8738 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -10554,17 +10554,21 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
 }
 
 /*
- * dumpRelationStats --
+ * dumpRelationStats_dumper --
  *
- * Dump command to import stats into the relation on the new database.
+ * Generate command to import stats into the relation on the new database.
+ * This routine is called by the Archiver when it wants the statistics to be
+ * dumped.
  */
-static void
-dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
+static char *
+dumpRelationStats_dumper(Archive *fout, const void *userArg)
 {
+       const RelStatsInfo *rsinfo = (RelStatsInfo *) userArg;
        const DumpableObject *dobj = &rsinfo->dobj;
        PGresult   *res;
        PQExpBuffer query;
-       PQExpBuffer out;
+       PQExpBufferData out_data;
+       PQExpBuffer out = &out_data;
        int                     i_attname;
        int                     i_inherited;
        int                     i_null_frac;
@@ -10581,10 +10585,6 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
        int                     i_range_empty_frac;
        int                     i_range_bounds_histogram;
 
-       /* nothing to do if we are not dumping statistics */
-       if (!fout->dopt->dumpStatistics)
-               return;
-
        query = createPQExpBuffer();
        if (!fout->is_prepared[PREPQUERY_GETATTRIBUTESTATS])
        {
@@ -10620,7 +10620,7 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
                resetPQExpBuffer(query);
        }
 
-       out = createPQExpBuffer();
+       initPQExpBuffer(out);
 
        /* restore relation stats */
        appendPQExpBufferStr(out, "SELECT * FROM pg_catalog.pg_restore_relation_stats(\n");
@@ -10764,17 +10764,35 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
 
        PQclear(res);
 
+       destroyPQExpBuffer(query);
+       return out->data;
+}
+
+/*
+ * dumpRelationStats --
+ *
+ * Make an ArchiveEntry for the relation statistics.  The Archiver will take
+ * care of gathering the statistics and generating the restore commands when
+ * they are needed.
+ */
+static void
+dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
+{
+       const DumpableObject *dobj = &rsinfo->dobj;
+
+       /* nothing to do if we are not dumping statistics */
+       if (!fout->dopt->dumpStatistics)
+               return;
+
        ArchiveEntry(fout, nilCatalogId, createDumpId(),
                                 ARCHIVE_OPTS(.tag = dobj->name,
                                                          .namespace = dobj->namespace->dobj.name,
                                                          .description = "STATISTICS DATA",
                                                          .section = rsinfo->section,
-                                                         .createStmt = out->data,
+                                                         .defnFn = dumpRelationStats_dumper,
+                                                         .defnArg = rsinfo,
                                                          .deps = dobj->dependencies,
                                                          .nDeps = dobj->nDeps));
-
-       destroyPQExpBuffer(out);
-       destroyPQExpBuffer(query);
 }
 
 /*
-- 
2.39.5 (Apple Git-154)

>From 8e63be0265c5489818043696e3880790ffcee374 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 17:20:01 -0500
Subject: [PATCH v12n6 3/4] pg_dump: Retrieve attribute statistics in batches.

Currently, pg_dump gathers attribute statistics with a query per
relation, which can cause pg_dump to take significantly longer,
especially when there are many tables.  This commit addresses this by
teaching pg_dump to gather attribute statistics for 64 relations at
a time.  Some simple tests showed this was the optimal batch size,
but performance may vary depending on the workload.  While this
change increases pg_dump's memory usage a bit, it isn't expected to
be too egregious and seems well worth the trade-off.

Our lookahead code determines the next batch of relations by
searching the TOC sequentially for relevant entries.  This approach
assumes that we will dump all such entries in TOC order, which
unfortunately isn't true for dump formats that use
RestoreArchive().  RestoreArchive() does multiple passes through
the TOC and selectively dumps certain groups of entries each time.
This is particularly problematic for index stats and a subset of
matview stats; both are in SECTION_POST_DATA, but matview stats
that depend on matview data are dumped in RESTORE_PASS_POST_ACL,
while all other stats data entries are dumped in RESTORE_PASS_MAIN.
To handle this, this commit moves all statistics data entries in
SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we
always dump them in TOC order.  A convenient side effect of this
change is that we can revert a decent chunk of commit a0a4601765,
but that is left for a follow-up commit.

Author: Corey Huinker <corey.huin...@gmail.com>
Co-authored-by: Nathan Bossart <nathandboss...@gmail.com>
Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
---
 src/bin/pg_dump/pg_backup.h          |   5 +-
 src/bin/pg_dump/pg_backup_archiver.c |  30 +++---
 src/bin/pg_dump/pg_dump.c            | 148 +++++++++++++++++++++++----
 3 files changed, 145 insertions(+), 38 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 781f8fa1cc9..9005b4253b4 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -285,7 +285,10 @@ typedef int DumpId;
  * Function pointer prototypes for assorted callback methods.
  */
 
-typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg);
+/* forward declaration to avoid including pg_backup_archiver.h here */
+typedef struct _tocEntry TocEntry;
+
+typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg, const TocEntry *te);
 typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 3debd0892c7..84f32aef411 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2655,7 +2655,7 @@ WriteToc(ArchiveHandle *AH)
                }
                else if (te->defnDumper)
                {
-                       char       *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+                       char       *defn = te->defnDumper((Archive *) AH, te->defnDumperArg, te);
 
                        te->defnLen = WriteStr(AH, defn);
                        pg_free(defn);
@@ -3284,23 +3284,17 @@ _tocEntryRestorePass(ArchiveHandle *AH, TocEntry *te)
 
        /*
         * If statistics data is dependent on materialized view data, it must be
-        * deferred to RESTORE_PASS_POST_ACL.
+        * deferred to RESTORE_PASS_POST_ACL.  Those entries are already marked
+        * with SECTION_POST_DATA, and some other stats entries (e.g., index
+        * stats) will also be marked SECTION_POST_DATA.  Additionally, our
+        * lookahead code in fetchAttributeStats() assumes that we dump all
+        * statistics data entries in TOC order.  To ensure this assumption holds,
+        * we move all statistics data entries in SECTION_POST_DATA to
+        * RESTORE_PASS_POST_ACL.
         */
-       if (strcmp(te->desc, "STATISTICS DATA") == 0)
-       {
-               for (int i = 0; i < te->nDeps; i++)
-               {
-                       DumpId          depid = te->dependencies[i];
-
-                       if (depid <= AH->maxDumpId && AH->tocsByDumpId[depid] != NULL)
-                       {
-                               TocEntry   *otherte = AH->tocsByDumpId[depid];
-
-                               if (strcmp(otherte->desc, "MATERIALIZED VIEW DATA") == 0)
-                                       return RESTORE_PASS_POST_ACL;
-                       }
-               }
-       }
+       if (strcmp(te->desc, "STATISTICS DATA") == 0 &&
+               te->section == SECTION_POST_DATA)
+               return RESTORE_PASS_POST_ACL;
 
        /* All else can be handled in the main pass. */
        return RESTORE_PASS_MAIN;
@@ -3952,7 +3946,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
        }
        else if (te->defnDumper)
        {
-               char       *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+               char       *defn = te->defnDumper((Archive *) AH, te->defnDumperArg, te);
 
                te->defnLen = ahprintf(AH, "%s\n\n", defn);
                pg_free(defn);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 77fce7c8738..a3bcd9c019a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -209,6 +209,9 @@ static int  nbinaryUpgradeClassOids = 0;
 static SequenceItem *sequences = NULL;
 static int     nsequences = 0;
 
+/* Maximum number of relations to fetch in a fetchAttributeStats() call. */
+#define MAX_ATTR_STATS_RELS 64
+
 /*
  * The default number of rows per INSERT when
  * --inserts is specified without --rows-per-insert
@@ -10553,6 +10556,79 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
        appendPQExpBuffer(out, "::%s", argtype);
 }
 
+/*
+ * fetchAttributeStats --
+ *
+ * Fetch next batch of rows for getAttributeStats().
+ */
+static PGresult *
+fetchAttributeStats(Archive *fout)
+{
+       ArchiveHandle *AH = (ArchiveHandle *) fout;
+       PQExpBuffer nspnames = createPQExpBuffer();
+       PQExpBuffer relnames = createPQExpBuffer();
+       int                     count = 0;
+       PGresult   *res = NULL;
+       static TocEntry *te;
+       static bool restarted;
+
+       /* If we're just starting, set our TOC pointer. */
+       if (!te)
+               te = AH->toc->next;
+
+       /*
+        * We can't easily avoid a second TOC scan for the tar format because it
+        * writes restore.sql separately, which means we must execute the queries
+        * twice.  This feels risky, but there is no known reason it should
+        * generate different output than the first pass.  Even if it does, the
+        * worst-case scenario is that restore.sql might have different statistics
+        * data than the archive.
+        */
+       if (!restarted && te == AH->toc && AH->format == archTar)
+       {
+               te = AH->toc->next;
+               restarted = true;
+       }
+
+       /*
+        * Scan the TOC for the next set of relevant stats entries.  We assume
+        * that statistics are dumped in the order they are listed in the TOC.
+        * This is perhaps not the sturdiest assumption, so we verify it matches
+        * reality in dumpRelationStats_dumper().
+        */
+       for (; te != AH->toc && count < MAX_ATTR_STATS_RELS; te = te->next)
+       {
+               if ((te->reqs & REQ_STATS) != 0 &&
+                       strcmp(te->desc, "STATISTICS DATA") == 0)
+               {
+                       RelStatsInfo *rsinfo = (RelStatsInfo *) te->defnDumperArg;
+
+                       appendPQExpBuffer(nspnames, "%s%s", count ? "," : "",
+                                                         fmtId(rsinfo->dobj.namespace->dobj.name));
+                       appendPQExpBuffer(relnames, "%s%s", count ? "," : "",
+                                                         fmtId(rsinfo->dobj.name));
+                       count++;
+               }
+       }
+
+       /* Execute the query for the next batch of relations. */
+       if (count > 0)
+       {
+               PQExpBuffer query = createPQExpBuffer();
+
+               appendPQExpBuffer(query, "EXECUTE getAttributeStats("
+                                                 "'{%s}'::pg_catalog.name[],"
+                                                 "'{%s}'::pg_catalog.name[])",
+                                                 nspnames->data, relnames->data);
+               res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+               destroyPQExpBuffer(query);
+       }
+
+       destroyPQExpBuffer(nspnames);
+       destroyPQExpBuffer(relnames);
+       return res;
+}
+
 /*
  * dumpRelationStats_dumper --
  *
@@ -10561,14 +10637,17 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
  * dumped.
  */
 static char *
-dumpRelationStats_dumper(Archive *fout, const void *userArg)
+dumpRelationStats_dumper(Archive *fout, const void *userArg, const TocEntry *te)
 {
        const RelStatsInfo *rsinfo = (RelStatsInfo *) userArg;
        const DumpableObject *dobj = &rsinfo->dobj;
-       PGresult   *res;
+       static PGresult *res;
+       static int      rownum;
        PQExpBuffer query;
        PQExpBufferData out_data;
        PQExpBuffer out = &out_data;
+       int                     i_schemaname;
+       int                     i_tablename;
        int                     i_attname;
        int                     i_inherited;
        int                     i_null_frac;
@@ -10584,13 +10663,31 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
        int                     i_range_length_histogram;
        int                     i_range_empty_frac;
        int                     i_range_bounds_histogram;
+       static TocEntry *expected_te;
+
+       /*
+        * fetchAttributeStats() assumes that the statistics are dumped in the
+        * order they are listed in the TOC.  We verify that here for safety.
+        */
+       if (!expected_te)
+               expected_te = ((ArchiveHandle *) fout)->toc;
+
+       expected_te = expected_te->next;
+       while ((expected_te->reqs & REQ_STATS) == 0 ||
+                  strcmp(expected_te->desc, "STATISTICS DATA") != 0)
+               expected_te = expected_te->next;
+
+       if (te != expected_te)
+               pg_fatal("stats dumped out of order (current: %d %s %s) (expected: %d %s %s)",
+                                te->dumpId, te->desc, te->tag,
+                                expected_te->dumpId, expected_te->desc, expected_te->tag);
 
        query = createPQExpBuffer();
        if (!fout->is_prepared[PREPQUERY_GETATTRIBUTESTATS])
        {
                appendPQExpBufferStr(query,
-                                                        "PREPARE getAttributeStats(pg_catalog.name, pg_catalog.name) AS\n"
-                                                        "SELECT s.attname, s.inherited, "
+                                                        "PREPARE getAttributeStats(pg_catalog.name[], pg_catalog.name[]) AS\n"
+                                                        "SELECT s.schemaname, s.tablename, s.attname, s.inherited, "
                                                         "s.null_frac, s.avg_width, s.n_distinct, "
                                                         "s.most_common_vals, s.most_common_freqs, "
                                                         "s.histogram_bounds, s.correlation, "
@@ -10608,11 +10705,21 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
                                                                 "NULL AS range_empty_frac,"
                                                                 "NULL AS range_bounds_histogram ");
 
+               /*
+                * The results must be in the order of the relations supplied in the
+                * parameters to ensure we remain in sync as we walk through the TOC.
+                * The redundant filter clause on s.tablename = ANY(...) seems
+                * sufficient to convince the planner to use
+                * pg_class_relname_nsp_index, which avoids a full scan of pg_stats.
+                * This may not work for all versions.
+                */
                appendPQExpBufferStr(query,
                                                         "FROM pg_catalog.pg_stats s "
-                                                        "WHERE s.schemaname = $1 "
-                                                        "AND s.tablename = $2 "
-                                                        "ORDER BY s.attname, s.inherited");
+                                                        "JOIN unnest($1, $2) WITH ORDINALITY AS u (schemaname, tablename, ord) "
+                                                        "ON s.schemaname = u.schemaname "
+                                                        "AND s.tablename = u.tablename "
+                                                        "WHERE s.tablename = ANY($2) "
+                                                        "ORDER BY u.ord, s.attname, s.inherited");
 
                ExecuteSqlStatement(fout, query->data);
 
@@ -10642,16 +10749,16 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 
        appendPQExpBufferStr(out, "\n);\n");
 
+       /* Fetch the next batch of attribute statistics if needed. */
+       if (rownum >= PQntuples(res))
+       {
+               PQclear(res);
+               res = fetchAttributeStats(fout);
+               rownum = 0;
+       }
 
-       /* fetch attribute stats */
-       appendPQExpBufferStr(query, "EXECUTE getAttributeStats(");
-       appendStringLiteralAH(query, dobj->namespace->dobj.name, fout);
-       appendPQExpBufferStr(query, ", ");
-       appendStringLiteralAH(query, dobj->name, fout);
-       appendPQExpBufferStr(query, ");");
-
-       res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
-
+       i_schemaname = PQfnumber(res, "schemaname");
+       i_tablename = PQfnumber(res, "tablename");
        i_attname = PQfnumber(res, "attname");
        i_inherited = PQfnumber(res, "inherited");
        i_null_frac = PQfnumber(res, "null_frac");
@@ -10669,10 +10776,15 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
        i_range_bounds_histogram = PQfnumber(res, "range_bounds_histogram");
 
        /* restore attribute stats */
-       for (int rownum = 0; rownum < PQntuples(res); rownum++)
+       for (; rownum < PQntuples(res); rownum++)
        {
                const char *attname;
 
+               /* Stop if the next stat row in our cache isn't for this relation. */
+               if (strcmp(dobj->name, PQgetvalue(res, rownum, i_tablename)) != 0 ||
+                       strcmp(dobj->namespace->dobj.name, PQgetvalue(res, rownum, i_schemaname)) != 0)
+                       break;
+
                appendPQExpBufferStr(out, "SELECT * FROM pg_catalog.pg_restore_attribute_stats(\n");
                appendPQExpBuffer(out, "\t'version', '%u'::integer,\n",
                                                  fout->remoteVersion);
@@ -10762,8 +10874,6 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
                appendPQExpBufferStr(out, "\n);\n");
        }
 
-       PQclear(res);
-
        destroyPQExpBuffer(query);
        return out->data;
 }
-- 
2.39.5 (Apple Git-154)

>From 3860fc596ea6038c5f4edeb1ab3de85d8a297013 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 20:13:40 -0500
Subject: [PATCH v12n6 4/4] Partially revert commit a0a4601765.

Thanks to commit XXXXXXXXXX, which simplified some code in
_tocEntryRestorePass(), we can remove the now-unused ArchiveHandle
parameter from _tocEntryRestorePass() and move_to_ready_heap().

Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/Z-3x2AnPCP331JA3%40nathan
---
 src/bin/pg_dump/pg_backup_archiver.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 84f32aef411..5ccedc383bb 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -72,7 +72,7 @@ static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
 static int     _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH);
-static RestorePass _tocEntryRestorePass(ArchiveHandle *AH, TocEntry *te);
+static RestorePass _tocEntryRestorePass(TocEntry *te);
 static bool _tocEntryIsACL(TocEntry *te);
 static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
 static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
@@ -102,8 +102,7 @@ static void pending_list_append(TocEntry *l, TocEntry *te);
 static void pending_list_remove(TocEntry *te);
 static int     TocEntrySizeCompareQsort(const void *p1, const void *p2);
 static int     TocEntrySizeCompareBinaryheap(void *p1, void *p2, void *arg);
-static void move_to_ready_heap(ArchiveHandle *AH,
-                                                          TocEntry *pending_list,
+static void move_to_ready_heap(TocEntry *pending_list,
                                                           binaryheap *ready_heap,
                                                           RestorePass pass);
 static TocEntry *pop_next_work_item(binaryheap *ready_heap,
@@ -749,7 +748,7 @@ RestoreArchive(Archive *AHX)
                        if ((te->reqs & (REQ_SCHEMA | REQ_DATA | REQ_STATS)) == 0)
                                continue;               /* ignore if not to be dumped at all */
 
-                       switch (_tocEntryRestorePass(AH, te))
+                       switch (_tocEntryRestorePass(te))
                        {
                                case RESTORE_PASS_MAIN:
                                        (void) restore_toc_entry(AH, te, false);
@@ -768,7 +767,7 @@ RestoreArchive(Archive *AHX)
                        for (te = AH->toc->next; te != AH->toc; te = te->next)
                        {
                                if ((te->reqs & (REQ_SCHEMA | REQ_DATA | REQ_STATS)) != 0 &&
-                                       _tocEntryRestorePass(AH, te) == RESTORE_PASS_ACL)
+                                       _tocEntryRestorePass(te) == RESTORE_PASS_ACL)
                                        (void) restore_toc_entry(AH, te, false);
                        }
                }
@@ -778,7 +777,7 @@ RestoreArchive(Archive *AHX)
                        for (te = AH->toc->next; te != AH->toc; te = te->next)
                        {
                                if ((te->reqs & (REQ_SCHEMA | REQ_DATA | REQ_STATS)) != 0 &&
-                                       _tocEntryRestorePass(AH, te) == RESTORE_PASS_POST_ACL)
+                                       _tocEntryRestorePass(te) == RESTORE_PASS_POST_ACL)
                                        (void) restore_toc_entry(AH, te, false);
                        }
                }
@@ -3261,7 +3260,7 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
  * See notes with the RestorePass typedef in pg_backup_archiver.h.
  */
 static RestorePass
-_tocEntryRestorePass(ArchiveHandle *AH, TocEntry *te)
+_tocEntryRestorePass(TocEntry *te)
 {
        /* "ACL LANGUAGE" was a crock emitted only in PG 7.4 */
        if (strcmp(te->desc, "ACL") == 0 ||
@@ -4344,7 +4343,7 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
                 * not set skipped_some in this case, since by assumption no main-pass
                 * items could depend on these.
                 */
-               if (_tocEntryRestorePass(AH, next_work_item) != RESTORE_PASS_MAIN)
+               if (_tocEntryRestorePass(next_work_item) != RESTORE_PASS_MAIN)
                        do_now = false;
 
                if (do_now)
@@ -4426,7 +4425,7 @@ restore_toc_entries_parallel(ArchiveHandle *AH, ParallelState *pstate,
         * process in the current restore pass.
         */
        AH->restorePass = RESTORE_PASS_MAIN;
-       move_to_ready_heap(AH, pending_list, ready_heap, AH->restorePass);
+       move_to_ready_heap(pending_list, ready_heap, AH->restorePass);
 
        /*
         * main parent loop
@@ -4475,7 +4474,7 @@ restore_toc_entries_parallel(ArchiveHandle *AH, 
ParallelState *pstate,
                        /* Advance to next restore pass */
                        AH->restorePass++;
                        /* That probably allows some stuff to be made ready */
-                       move_to_ready_heap(AH, pending_list, ready_heap, AH->restorePass);
+                       move_to_ready_heap(pending_list, ready_heap, AH->restorePass);
                        /* Loop around to see if anything's now ready */
                        continue;
                }
@@ -4646,8 +4645,7 @@ TocEntrySizeCompareBinaryheap(void *p1, void *p2, void *arg)
  * which applies the same logic one-at-a-time.)
  */
 static void
-move_to_ready_heap(ArchiveHandle *AH,
-                                  TocEntry *pending_list,
+move_to_ready_heap(TocEntry *pending_list,
                                   binaryheap *ready_heap,
                                   RestorePass pass)
 {
@@ -4660,7 +4658,7 @@ move_to_ready_heap(ArchiveHandle *AH,
                next_te = te->pending_next;
 
                if (te->depCount == 0 &&
-                       _tocEntryRestorePass(AH, te) == pass)
+                       _tocEntryRestorePass(te) == pass)
                {
                        /* Remove it from pending_list ... */
                        pending_list_remove(te);
@@ -5054,7 +5052,7 @@ reduce_dependencies(ArchiveHandle *AH, TocEntry *te,
                 * memberships changed.
                 */
                if (otherte->depCount == 0 &&
-                       _tocEntryRestorePass(AH, otherte) == AH->restorePass &&
+                       _tocEntryRestorePass(otherte) == AH->restorePass &&
                        otherte->pending_prev != NULL &&
                        ready_heap != NULL)
                {
-- 
2.39.5 (Apple Git-154)
