On Tue, Apr 01, 2025 at 03:05:59PM -0700, Jeff Davis wrote:
> To restate the problem: one of the problems being solved here is that
> the existing code for custom-format dumps calls WriteToc twice. That
> was not a big problem before this patch, when the contents of the
> entries was easily accessible in memory. But the point of 0002 is to
> avoid keeping all of the stats in memory at once, because that causes
> bloat; and instead to query it on demand.
> 
> In theory, we could fix the pre-existing code by making the second pass
> able to jump over the other contents of the entry and just update the
> data offsets. But that seems invasive, at least to do it properly.
> 
> 0001 sidesteps the problem by skipping the second pass if data's not
> being dumped (because there are no offsets that need updating). The
> worst case is when there are a lot of objects with a small amount of
> data. But that's a worst case for stats in general, so I don't think
> that needs to be solved here.
> 
> Issuing the stats queries twice is not great, though. If there's any
> non-deterministic output in the query, that could lead to strangeness.
> How bad can that be? If the results change in some way that looks
> benign, but changes the length of the definition string, can it lead to
> corruption of a ToC entry? I'm not saying there's a problem, but trying
> to understand the risk of future problems.
It certainly feels risky.  I was able to avoid executing the queries twice
in all cases by saving the definition length in the TOC entry and skipping
that many bytes the second time round.  That's simple enough, but it relies
on various assumptions, such as fseeko() being available (IIUC the file
will only be open for writing, so we cannot fall back on fread()) and
WriteStr() returning an accurate value (which I'm skeptical of because
some formats compress this data).  But AFAICT custom format is the only
format that does a second WriteToc() pass at the moment, and it only does
so when fseeko() is usable.  Plus, custom format doesn't appear to
compress anything written via WriteStr().

We might be able to improve this by inventing a new callback that fails
for all formats except custom format with fseeko() available.  That would
at least ensure hard failures if these assumptions change.  That probably
wouldn't be terribly invasive.  I'm curious what you think.

> For 0003, it makes an assumption about the way the scan happens in
> WriteToc(). Can you add some additional sanity checks to verify that
> something doesn't happen in a different order than we expect?

Hm.  One thing we could do is to send the TocEntry to the callback and
verify that it matches the one we were expecting to see next (as set by a
previous call).  Does that sound like a strong enough check?  (A rough
sketch of what I mean is below.)  FWIW the pg_dump tests failed miserably
until Corey and I got this part right, so our usual tests should also
offer some assurance.

> Also, why do we need the clause "WHERE s.tablename = ANY($2)"? Isn't
> that already implied by "JOIN unnest($1, $2) ... s.tablename =
> u.tablename"?

Good question.  Corey, do you recall why this was needed?

-- 
nathan
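To make that concrete, here is a rough, untested sketch of the kind of
check described above.  The extra TocEntry argument and the "expectedTe"
variable are placeholders for illustration only, not necessarily what the
next version of the patch will look like:

    /* Hypothetical: pass the TocEntry through to the defnDumper. */
    typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg,
                                    const TocEntry *te);

    /* Set by fetchAttributeStats() to the entry it expects to serve next. */
    static TocEntry *expectedTe = NULL;

    static char *
    dumpRelationStats_dumper(Archive *fout, const void *userArg,
                             const TocEntry *te)
    {
        /*
         * If the archiver asks for an entry other than the one our TOC scan
         * predicted, the ordering assumption is broken; fail hard rather
         * than emit statistics for the wrong relation.
         */
        if (expectedTe != NULL && te != expectedTe)
            pg_fatal("unexpected TOC entry for statistics data: \"%s\"",
                     te->tag);

        /* ... generate the statistics commands as before ... */
    }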
>From 0b20e8a4c8f153e9f292da82a9dbc82ed3adbb3e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Mon, 31 Mar 2025 10:44:26 -0500
Subject: [PATCH v12n4 1/3] Skip second WriteToc() for custom-format dumps without data.

Presently, "pg_dump --format=custom" calls WriteToc() twice.  The second
call is intended to update the data offset information, which allegedly
makes parallel pg_restore significantly faster.  However, if we aren't
dumping any data, this step accomplishes nothing and can be skipped.
This is a preparatory optimization for follow-up commits that will move
the queries for attribute statistics to WriteToc() to save memory.

Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
---
 src/bin/pg_dump/pg_backup_custom.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index e44b887eb29..b971e3bc16e 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -755,9 +755,11 @@ _CloseArchive(ArchiveHandle *AH)
 	 * If possible, re-write the TOC in order to update the data offset
 	 * information.  This is not essential, as pg_restore can cope in most
 	 * cases without it; but it can make pg_restore significantly faster
-	 * in some situations (especially parallel restore).
+	 * in some situations (especially parallel restore).  We can skip this
+	 * step if we're not dumping any data.
 	 */
-	if (ctx->hasSeek &&
+	if (AH->public.dopt->dumpData &&
+	    ctx->hasSeek &&
 	    fseeko(AH->FH, tpos, SEEK_SET) == 0)
 		WriteToc(AH);
 }
-- 
2.39.5 (Apple Git-154)
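For context on why the second WriteToc() pass exists at all: the data
offsets for custom-format entries aren't known until the data has actually
been written, so the first pass writes placeholder offsets and the second
pass seeks back and patches in the real values.  A minimal standalone
illustration of that general pattern (plain C for illustration only;
pg_dump's real implementation goes through its own I/O wrappers and the
ctx->hasSeek check):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        FILE       *fh = fopen("demo.dat", "wb");
        long        slot;
        uint64_t    placeholder = 0;
        uint64_t    offset;

        if (fh == NULL)
            return 1;

        /* First pass: remember where the offset field lives, write a dummy. */
        slot = ftell(fh);
        fwrite(&placeholder, sizeof(placeholder), 1, fh);

        /* Write the data whose position we want to record. */
        offset = (uint64_t) ftell(fh);
        fputs("...table data goes here...", fh);

        /* Second pass: seek back and overwrite the placeholder. */
        if (fseek(fh, slot, SEEK_SET) == 0)
            fwrite(&offset, sizeof(offset), 1, fh);

        fclose(fh);
        return 0;
    }

If there is no data to dump, there are no offsets to patch, which is why
0001 can skip the second pass entirely.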
>From 6e6d5345d45092bc0acc5ca31c7d7d663fe2ccad Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Tue, 1 Apr 2025 20:46:24 -0500
Subject: [PATCH v12n4 2/3] pg_dump: Reduce memory usage of dumps with statistics.

Right now, pg_dump stores all generated commands for statistics in
memory.  These commands can be quite large and therefore can
significantly increase pg_dump's memory footprint.  To fix, wait until
we are about to write out the commands before generating them, and be
sure to free the commands after writing.  This is implemented via a new
defnDumper callback that works much like the dataDumper one but is
specially designed for TOC entries.

Custom dumps that include data might write the TOC twice (to update data
offset information), which would ordinarily cause pg_dump to run the
attribute statistics queries twice.  However, as a hack, we save the
length of the written-out entry in the first pass, and we skip over it in
the second.  While there is no known technical problem with executing the
queries multiple times and rewriting the results, it's expensive and
feels risky, so it seems prudent to avoid it.

Author: Corey Huinker <corey.huin...@gmail.com>
Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
---
 src/bin/pg_dump/pg_backup.h          |  1 +
 src/bin/pg_dump/pg_backup_archiver.c | 44 ++++++++++++++++++++++++++--
 src/bin/pg_dump/pg_backup_archiver.h |  6 ++++
 src/bin/pg_dump/pg_dump.c            | 44 ++++++++++++++++++++--------
 4 files changed, 81 insertions(+), 14 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 658986de6f8..781f8fa1cc9 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -285,6 +285,7 @@ typedef int DumpId;
  * Function pointer prototypes for assorted callback methods.
  */
 
+typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg);
 typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
 typedef void (*SetupWorkerPtrType) (Archive *AH);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1d131e5a57d..334b5dedfd7 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1266,6 +1266,9 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
 	newToc->dataDumperArg = opts->dumpArg;
 	newToc->hadDumper = opts->dumpFn ? true : false;
 
+	newToc->defnDumper = opts->defnFn;
+	newToc->defnDumperArg = opts->defnArg;
+
 	newToc->formatData = NULL;
 	newToc->dataLength = 0;
 
@@ -2621,7 +2624,33 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->tag);
 		WriteStr(AH, te->desc);
 		WriteInt(AH, te->section);
-		WriteStr(AH, te->defn);
+
+		if (te->defnLen)
+		{
+			/*
+			 * We only set defnLen when a definition is generated by the
+			 * defnDumper during WriteToc(), so this must be a second
+			 * WriteToc() pass.  The defnDumper might execute queries, and
+			 * while running the same queries twice should in theory work
+			 * fine, it's expensive and feels risky.  So, we just seek through
+			 * those entries.  Presently, the only time we do a second
+			 * WriteToc() pass is for custom-format dumps when we've already
+			 * verified fseeko() works, so we can use it without checking.
+			 * We'll need to figure out something else if this changes.
+			 */
+			if (fseeko(AH->FH, te->defnLen, SEEK_CUR) != 0)
+				pg_fatal("error during file seek: %m");
+		}
+		else if (te->defnDumper)
+		{
+			char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+
+			te->defnLen = WriteStr(AH, defn);
+			pg_free(defn);
+		}
+		else
+			WriteStr(AH, te->defn);
+
 		WriteStr(AH, te->dropStmt);
 		WriteStr(AH, te->copyStmt);
 		WriteStr(AH, te->namespace);
@@ -3849,7 +3878,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 
 	/*
 	 * Actually print the definition.  Normally we can just print the defn
-	 * string if any, but we have three special cases:
+	 * string if any, but we have four special cases:
 	 *
 	 * 1. A crude hack for suppressing AUTHORIZATION clause that old pg_dump
 	 * versions put into CREATE SCHEMA.  Don't mutate the variant for schema
@@ -3862,6 +3891,10 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 	 * 3. ACL LARGE OBJECTS entries need special processing because they
 	 * contain only one copy of the ACL GRANT/REVOKE commands, which we must
 	 * apply to each large object listed in the associated BLOB METADATA.
+	 *
+	 * 4. Entries with a defnDumper need to call it to generate the
+	 * definition.  This is primarily intended to provide a way to save memory
+	 * for objects that need a lot of it (e.g., statistics data).
 	 */
 	if (ropt->noOwner &&
 		strcmp(te->desc, "SCHEMA") == 0 && strncmp(te->defn, "--", 2) != 0)
@@ -3877,6 +3910,13 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 	{
 		IssueACLPerBlob(AH, te);
 	}
+	else if (te->defnDumper)
+	{
+		char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+
+		ahprintf(AH, "%s\n\n", defn);
+		pg_free(defn);
+	}
 	else if (te->defn && strlen(te->defn) > 0)
 	{
 		ahprintf(AH, "%s\n\n", te->defn);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a2064f471ed..b7ebc2b39cd 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -368,6 +368,10 @@ struct _tocEntry
 	const void *dataDumperArg;	/* Arg for above routine */
 	void	   *formatData;		/* TOC Entry data specific to file format */
 
+	DefnDumperPtr defnDumper;	/* routine to dump definition statement */
+	const void *defnDumperArg;	/* arg for above routine */
+	size_t		defnLen;		/* length of dumped definition */
+
 	/* working state while dumping/restoring */
 	pgoff_t		dataLength;		/* item's data size; 0 if none or unknown */
 	int			reqs;			/* do we need schema and/or data of object
@@ -407,6 +411,8 @@ typedef struct _archiveOpts
 	int			nDeps;
 	DataDumperPtr dumpFn;
 	const void *dumpArg;
+	DefnDumperPtr defnFn;
+	const void *defnArg;
 } ArchiveOpts;
 #define ARCHIVE_OPTS(...) &(ArchiveOpts){__VA_ARGS__}
 /* Called to add a TOC entry */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 4ca34be230c..46fb70e0a8b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -10560,13 +10560,16 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
 }
 
 /*
- * dumpRelationStats --
+ * dumpRelationStats_dumper --
  *
- * Dump command to import stats into the relation on the new database.
+ * Generate command to import stats into the relation on the new database.
+ * This routine is called by the Archiver when it wants the statistics to be
+ * dumped.
  */
-static void
-dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
+static char *
+dumpRelationStats_dumper(Archive *fout, const void *userArg)
 {
+	const RelStatsInfo *rsinfo = (RelStatsInfo *) userArg;
 	const DumpableObject *dobj = &rsinfo->dobj;
 	PGresult   *res;
 	PQExpBuffer query;
@@ -10586,10 +10589,7 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
 	int			i_range_length_histogram;
 	int			i_range_empty_frac;
 	int			i_range_bounds_histogram;
-
-	/* nothing to do if we are not dumping statistics */
-	if (!fout->dopt->dumpStatistics)
-		return;
+	char	   *ret;
 
 	query = createPQExpBuffer();
 	if (!fout->is_prepared[PREPQUERY_GETATTRIBUTESTATS])
@@ -10770,17 +10770,37 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
 
 	PQclear(res);
 
+	destroyPQExpBuffer(query);
+	ret = out->data;
+	pg_free(out);
+	return ret;
+}
+
+/*
+ * dumpRelationStats --
+ *
+ * Make an ArchiveEntry for the relation statistics.  The Archiver will take
+ * care of gathering the statistics and generating the restore commands when
+ * they are needed.
+ */
+static void
+dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
+{
+	const DumpableObject *dobj = &rsinfo->dobj;
+
+	/* nothing to do if we are not dumping statistics */
+	if (!fout->dopt->dumpStatistics)
+		return;
+
 	ArchiveEntry(fout, nilCatalogId, createDumpId(),
 				 ARCHIVE_OPTS(.tag = dobj->name,
 							  .namespace = dobj->namespace->dobj.name,
 							  .description = "STATISTICS DATA",
 							  .section = rsinfo->section,
-							  .createStmt = out->data,
+							  .defnFn = dumpRelationStats_dumper,
+							  .defnArg = rsinfo,
 							  .deps = dobj->dependencies,
 							  .nDeps = dobj->nDeps));
-
-	destroyPQExpBuffer(out);
-	destroyPQExpBuffer(query);
 }
 
 /*
-- 
2.39.5 (Apple Git-154)
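As a distilled illustration of what 0002 changes (a standalone sketch, not
pg_dump's actual types or code): the TOC entry carries a callback instead
of a pre-built definition string, the string is generated only at write
time, and it is freed immediately afterwards, so only one definition is
ever held in memory at once:

    #include <stdio.h>
    #include <stdlib.h>

    typedef char *(*defn_dumper) (const void *arg);

    typedef struct
    {
        const char *tag;
        char       *defn;       /* pre-built definition, or NULL */
        defn_dumper defn_fn;    /* callback to build it on demand */
        const void *defn_arg;
    } toc_entry;

    /* Hypothetical callback: builds a (potentially huge) command lazily. */
    static char *
    stats_defn(const void *arg)
    {
        const char *relname = arg;
        char       *buf = malloc(1024);

        snprintf(buf, 1024, "-- statistics commands for %s would go here\n",
                 relname);
        return buf;
    }

    static void
    write_entry(FILE *out, const toc_entry *te)
    {
        if (te->defn_fn)
        {
            /* Generate just in time, write it out, and free it right away. */
            char       *defn = te->defn_fn(te->defn_arg);

            fputs(defn, out);
            free(defn);
        }
        else if (te->defn)
            fputs(te->defn, out);
    }

    int
    main(void)
    {
        toc_entry   stats = {"STATISTICS DATA", NULL, stats_defn, "public.foo"};

        write_entry(stdout, &stats);
        return 0;
    }

With the written length recorded at write time (defnLen in the real patch),
the custom format's second pass can then skip over the entry instead of
regenerating it, which is the part 0002 handles in WriteToc().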
>From 299c7b6d0f5b009d42ee5a4a28278ab3c1bc725c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Tue, 1 Apr 2025 21:00:24 -0500
Subject: [PATCH v12n4 3/3] pg_dump: Retrieve attribute statistics in batches.

Currently, pg_dump gathers attribute statistics with a query per
relation, which can cause pg_dump to take significantly longer,
especially when there are many tables.  This commit improves matters by
gathering attribute statistics for 64 relations at a time.  Some simple
testing showed this was the ideal batch size, but performance may vary
depending on workload.

This change increases the memory usage of pg_dump a bit, but that isn't
expected to be too egregious and is arguably well worth the trade-off.

Author: Corey Huinker <corey.huin...@gmail.com>
Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
---
 src/bin/pg_dump/pg_dump.c | 104 +++++++++++++++++++++++++++++++-------
 1 file changed, 86 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 46fb70e0a8b..9d60c77865f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -209,6 +209,9 @@ static int	nbinaryUpgradeClassOids = 0;
 static SequenceItem *sequences = NULL;
 static int	nsequences = 0;
 
+/* Maximum number of relations to fetch in a fetchAttributeStats() call. */
+#define MAX_ATTR_STATS_RELS 64
+
 /*
  * The default number of rows per INSERT when
  * --inserts is specified without --rows-per-insert
@@ -10559,6 +10562,63 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
 	appendPQExpBuffer(out, "::%s", argtype);
 }
 
+/*
+ * fetchAttributeStats --
+ *
+ * Fetch next batch of rows for getAttributeStats().
+ */
+static PGresult *
+fetchAttributeStats(Archive *fout)
+{
+	ArchiveHandle *AH = (ArchiveHandle *) fout;
+	PQExpBuffer nspnames = createPQExpBuffer();
+	PQExpBuffer relnames = createPQExpBuffer();
+	int			count = 0;
+	PGresult   *res = NULL;
+	static TocEntry *te;
+
+	/* If we're just starting, set our TOC pointer. */
+	if (!te)
+		te = AH->toc->next;
+
+	/*
+	 * Scan the TOC for the next set of relevant stats entries.  We assume
+	 * that statistics are dumped in the order they are listed in the TOC.
+	 * This is perhaps not the sturdiest assumption, but it appears to
+	 * presently be true in all cases.
+	 */
+	for (; te != AH->toc && count < MAX_ATTR_STATS_RELS; te = te->next)
+	{
+		if (te->reqs && strcmp(te->desc, "STATISTICS DATA") == 0)
+		{
+			RelStatsInfo *rsinfo = (RelStatsInfo *) te->defnDumperArg;
+
+			appendPQExpBuffer(nspnames, "%s%s", count ? "," : "",
+							  fmtId(rsinfo->dobj.namespace->dobj.name));
+			appendPQExpBuffer(relnames, "%s%s", count ? "," : "",
+							  fmtId(rsinfo->dobj.name));
+			count++;
+		}
+	}
+
+	/* Execute the query for the next batch of relations. */
+	if (count > 0)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		appendPQExpBuffer(query, "EXECUTE getAttributeStats("
+						  "'{%s}'::pg_catalog.name[],"
+						  "'{%s}'::pg_catalog.name[])",
+						  nspnames->data, relnames->data);
+		res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+		destroyPQExpBuffer(query);
+	}
+
+	destroyPQExpBuffer(nspnames);
+	destroyPQExpBuffer(relnames);
+	return res;
+}
+
 /*
  * dumpRelationStats_dumper --
  *
@@ -10571,9 +10631,12 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 {
 	const RelStatsInfo *rsinfo = (RelStatsInfo *) userArg;
 	const DumpableObject *dobj = &rsinfo->dobj;
-	PGresult   *res;
+	static PGresult *res;
+	static int	rownum;
 	PQExpBuffer query;
 	PQExpBuffer out;
+	int			i_schemaname;
+	int			i_tablename;
 	int			i_attname;
 	int			i_inherited;
 	int			i_null_frac;
@@ -10595,8 +10658,8 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 	if (!fout->is_prepared[PREPQUERY_GETATTRIBUTESTATS])
 	{
 		appendPQExpBufferStr(query,
-							 "PREPARE getAttributeStats(pg_catalog.name, pg_catalog.name) AS\n"
-							 "SELECT s.attname, s.inherited, "
+							 "PREPARE getAttributeStats(pg_catalog.name[], pg_catalog.name[]) AS\n"
+							 "SELECT s.schemaname, s.tablename, s.attname, s.inherited, "
							 "s.null_frac, s.avg_width, s.n_distinct, "
 							 "s.most_common_vals, s.most_common_freqs, "
 							 "s.histogram_bounds, s.correlation, "
@@ -10616,9 +10679,11 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 
 		appendPQExpBufferStr(query,
 							 "FROM pg_catalog.pg_stats s "
-							 "WHERE s.schemaname = $1 "
-							 "AND s.tablename = $2 "
-							 "ORDER BY s.attname, s.inherited");
+							 "JOIN unnest($1, $2) WITH ORDINALITY AS u (schemaname, tablename, ord) "
+							 "ON s.schemaname = u.schemaname "
+							 "AND s.tablename = u.tablename "
+							 "WHERE s.tablename = ANY($2) "
+							 "ORDER BY u.ord, s.attname, s.inherited");
 
 		ExecuteSqlStatement(fout, query->data);
 
@@ -10648,16 +10713,16 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 
 	appendPQExpBufferStr(out, "\n);\n");
 
+	/* Fetch the next batch of attribute statistics if needed. */
+	if (rownum >= PQntuples(res))
+	{
+		PQclear(res);
+		res = fetchAttributeStats(fout);
+		rownum = 0;
+	}
 
-	/* fetch attribute stats */
-	appendPQExpBufferStr(query, "EXECUTE getAttributeStats(");
-	appendStringLiteralAH(query, dobj->namespace->dobj.name, fout);
-	appendPQExpBufferStr(query, ", ");
-	appendStringLiteralAH(query, dobj->name, fout);
-	appendPQExpBufferStr(query, ");");
-
-	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
-
+	i_schemaname = PQfnumber(res, "schemaname");
+	i_tablename = PQfnumber(res, "tablename");
 	i_attname = PQfnumber(res, "attname");
 	i_inherited = PQfnumber(res, "inherited");
 	i_null_frac = PQfnumber(res, "null_frac");
@@ -10675,10 +10740,15 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 	i_range_bounds_histogram = PQfnumber(res, "range_bounds_histogram");
 
 	/* restore attribute stats */
-	for (int rownum = 0; rownum < PQntuples(res); rownum++)
+	for (; rownum < PQntuples(res); rownum++)
 	{
 		const char *attname;
 
+		/* Stop if the next stat row in our cache isn't for this relation. */
+		if (strcmp(dobj->name, PQgetvalue(res, rownum, i_tablename)) != 0 ||
+			strcmp(dobj->namespace->dobj.name, PQgetvalue(res, rownum, i_schemaname)) != 0)
+			break;
+
 		appendPQExpBufferStr(out, "SELECT * FROM pg_catalog.pg_restore_attribute_stats(\n");
 		appendPQExpBuffer(out, "\t'version', '%u'::integer,\n",
 						  fout->remoteVersion);
@@ -10768,8 +10838,6 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 		appendPQExpBufferStr(out, "\n);\n");
 	}
 
-	PQclear(res);
-
 	destroyPQExpBuffer(query);
 	ret = out->data;
 	pg_free(out);
-- 
2.39.5 (Apple Git-154)