Thanks for reviewing.

On Thu, Apr 03, 2025 at 03:23:40PM -0700, Jeff Davis wrote:
> This simplifies commit a0a4601765. I'd break out that simplification as
> a separate commit to make it easier to understand what happened.
Done.

> In patch 0003, there are quite a few static function-scoped variables,
> which is not a style that I'm used to. One idea is to bundle them into
> a struct representing the cache state (including enough information to
> fetch the next batch), and have a single static variable that points to
> that.

As discussed off-list, I didn't take this suggestion for now.  Corey did
this originally, and I converted it to static function-scoped variables
1) to reduce the patch size and 2) because I noticed that each of the
state variables was only needed in one function.  I agree that a struct
might be slightly more readable, but we can always change this in the
future if desired.

> Also in 0003, the "next_te" variable is a bit confusing, because it's
> actually the last TocEntry, until it's advanced to point to the current
> one.

I've renamed it to expected_te.

> Other than that, looks good to me.

Great.  I'm planning to commit the attached patch set tomorrow morning.

For the record, I spent most of today trying very hard to fix the
layering violations in 0002.  While I was successful, the result was
awkward, complicated, and nigh unreadable.  This is now the second time
I've attempted to fix this and have felt the result was worse than where
I started.  So, I added extremely descriptive comments instead.  I'm
hoping that it will be possible to clean this up with some additional
work in v19.  I have a few ideas, but if anyone has suggestions, I'm all
ears.

-- 
nathan
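For the archives, Jeff's struct idea would look something along these
lines (a sketch only; the type and field names below are invented for
illustration and do not appear in the patches):

/*
 * Hypothetical bundling of the function-scoped statics from
 * fetchAttributeStats()/dumpRelationStats_dumper() into one cache-state
 * struct, per Jeff's suggestion.  PGresult and TocEntry are pg_dump's
 * existing types; everything else here is an invented name.
 */
typedef struct AttrStatsCacheState
{
    PGresult   *res;            /* cached batch of attribute-stats rows */
    int         rownum;         /* next unconsumed row in res */
    TocEntry   *next_te;        /* where the next batch scan resumes */
    TocEntry   *expected_te;    /* last stats entry dumped, for order checks */
    bool        restarted;      /* has the tar-format TOC rescan happened? */
} AttrStatsCacheState;

static AttrStatsCacheState *attr_stats_cache;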
>From a5a31f2754bf69a81fdc48769c1ee950317a2cf0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 13:20:28 -0500
Subject: [PATCH v12n6 1/4] Skip second WriteToc() call for custom-format
 dumps without data.

Presently, "pg_dump --format=custom" calls WriteToc() twice.  The second
call updates the data offset information, which allegedly makes parallel
pg_restore significantly faster.  However, if we're not dumping any
data, there are no data offsets to update, so we can skip this step.

Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
---
 src/bin/pg_dump/pg_backup_custom.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index e44b887eb29..f7c3af56304 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -755,9 +755,11 @@ _CloseArchive(ArchiveHandle *AH)
 	 * If possible, re-write the TOC in order to update the data offset
 	 * information.  This is not essential, as pg_restore can cope in most
 	 * cases without it; but it can make pg_restore significantly faster
-	 * in some situations (especially parallel restore).
+	 * in some situations (especially parallel restore).  We can skip this
+	 * step if we're not dumping any data; there are no offsets to update
+	 * in that case.
 	 */
-	if (ctx->hasSeek &&
+	if (ctx->hasSeek && AH->public.dopt->dumpData &&
 		fseeko(AH->FH, tpos, SEEK_SET) == 0)
 		WriteToc(AH);
 }
-- 
2.39.5 (Apple Git-154)
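To make the commit message concrete for anyone not steeped in the custom
format: the two-pass offset dance that this patch short-circuits looks
roughly like the toy sketch below (standalone C; all names are invented
for illustration, this is not pg_dump code):

#include <stdio.h>

/*
 * Toy sketch of the custom format's two-pass pattern: write the TOC with
 * placeholder offsets, dump the data while recording where it landed,
 * then seek back and rewrite the TOC with the real offsets.  If no data
 * is dumped, there is nothing to go back and fix, which is what this
 * patch exploits.
 */
int
main(void)
{
	FILE	   *fh = tmpfile();
	long		toc_pos;
	long		data_offset = 0;	/* unknown during the first pass */

	toc_pos = ftell(fh);
	fprintf(fh, "TOC: data at %08ld\n", data_offset);	/* placeholder */

	data_offset = ftell(fh);
	fprintf(fh, "DATA...\n");

	/* Second pass: rewrite the TOC in place with the real offset. */
	if (fseek(fh, toc_pos, SEEK_SET) == 0)
		fprintf(fh, "TOC: data at %08ld\n", data_offset);

	fclose(fh);
	return 0;
}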
>From 0244b3f02e083e6c37a2c282292d4d8fa0a69fed Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 17:15:51 -0500
Subject: [PATCH v12n6 2/4] pg_dump: Reduce memory usage of dumps with
 statistics.

Right now, pg_dump stores all generated commands for statistics in
memory.  These commands can be quite large and therefore can
significantly increase pg_dump's memory footprint.  To fix, wait until
we are about to write out the commands before generating them, and be
sure to free the commands after writing.  This is implemented via a new
defnDumper callback that works much like the dataDumper one but is
specifically designed for TOC entries.

Custom dumps that include data might write the TOC twice (to update data
offset information), which would ordinarily cause pg_dump to run the
attribute statistics queries twice.  However, as a hack, we save the
length of the written-out entry in the first pass and skip over it in
the second.  While there is no known technical issue with executing the
queries multiple times and rewriting the results, it's expensive and
feels risky, so let's avoid it.

As an exception, we _do_ execute the queries twice for the tar format.
This format does a second pass through the TOC to generate the
restore.sql file.  pg_restore doesn't use this file, so even if the
second round of queries returns different results than the first, it
won't corrupt the output; the archive and restore.sql file will just
have different content.

Author: Corey Huinker <corey.huin...@gmail.com>
Co-authored-by: Nathan Bossart <nathandboss...@gmail.com>
Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
---
 src/bin/pg_dump/pg_backup.h          |  1 +
 src/bin/pg_dump/pg_backup_archiver.c | 84 +++++++++++++++++++++++++++-
 src/bin/pg_dump/pg_backup_archiver.h |  6 ++
 src/bin/pg_dump/pg_dump.c            | 46 ++++++++++-----
 4 files changed, 121 insertions(+), 16 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 658986de6f8..781f8fa1cc9 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -285,6 +285,7 @@ typedef int DumpId;
  * Function pointer prototypes for assorted callback methods.
  */
 
+typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg);
 typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1d131e5a57d..3debd0892c7 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1266,6 +1266,9 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
 	newToc->dataDumperArg = opts->dumpArg;
 	newToc->hadDumper = opts->dumpFn ? true : false;
 
+	newToc->defnDumper = opts->defnFn;
+	newToc->defnDumperArg = opts->defnArg;
+
 	newToc->formatData = NULL;
 	newToc->dataLength = 0;
 
@@ -2621,7 +2624,45 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->tag);
 		WriteStr(AH, te->desc);
 		WriteInt(AH, te->section);
-		WriteStr(AH, te->defn);
+
+		if (te->defnLen)
+		{
+			/*
+			 * defnLen should only be set for custom format's second call to
+			 * WriteToc(), which rewrites the TOC in place to update data
+			 * offsets.  Instead of calling the defnDumper a second time
+			 * (which could involve re-executing queries), just skip writing
+			 * the entry.  While regenerating the definition should
+			 * theoretically produce the same result as before, it's expensive
+			 * and feels risky.
+			 *
+			 * The custom format only calls WriteToc() a second time if
+			 * fseeko() is usable (see _CloseArchive() in pg_backup_custom.c),
+			 * so we can safely use it without checking.  For other formats,
+			 * we fail because one of our assumptions must no longer hold
+			 * true.
+			 *
+			 * XXX This is certainly a layering violation, but the alternative
+			 * is an awkward and complicated callback infrastructure for this
+			 * special case.  This might be worth revisiting in the future.
+			 */
+			if (AH->format != archCustom)
+				pg_fatal("unexpected TOC entry in WriteToc(): %d %s %s",
+						 te->dumpId, te->desc, te->tag);
+
+			if (fseeko(AH->FH, te->defnLen, SEEK_CUR) != 0)
+				pg_fatal("error during file seek: %m");
+		}
+		else if (te->defnDumper)
+		{
+			char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+
+			te->defnLen = WriteStr(AH, defn);
+			pg_free(defn);
+		}
+		else
+			WriteStr(AH, te->defn);
+
 		WriteStr(AH, te->dropStmt);
 		WriteStr(AH, te->copyStmt);
 		WriteStr(AH, te->namespace);
@@ -3849,7 +3890,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 
 	/*
 	 * Actually print the definition.  Normally we can just print the defn
-	 * string if any, but we have three special cases:
+	 * string if any, but we have four special cases:
 	 *
 	 * 1. A crude hack for suppressing AUTHORIZATION clause that old pg_dump
 	 * versions put into CREATE SCHEMA.  Don't mutate the variant for schema
@@ -3862,6 +3903,11 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 	 * 3. ACL LARGE OBJECTS entries need special processing because they
 	 * contain only one copy of the ACL GRANT/REVOKE commands, which we must
 	 * apply to each large object listed in the associated BLOB METADATA.
+	 *
+	 * 4. Entries with a defnDumper need to call it to generate the
+	 * definition.  This is primarily intended to provide a way to save memory
+	 * for objects that would otherwise need a lot of it (e.g., statistics
+	 * data).
 	 */
 	if (ropt->noOwner &&
 		strcmp(te->desc, "SCHEMA") == 0 && strncmp(te->defn, "--", 2) != 0)
@@ -3877,6 +3923,40 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 	{
 		IssueACLPerBlob(AH, te);
 	}
+	else if (te->defnLen && AH->format != archTar)
+	{
+		/*
+		 * If defnLen is set, the defnDumper has already been called for this
+		 * TOC entry.  We don't normally expect a defnDumper to be called for
+		 * a TOC entry a second time in _printTocEntry(), but there's an
+		 * exception.  The tar format first calls WriteToc(), which scans the
+		 * entire TOC, and then it later calls RestoreArchive() to generate
+		 * restore.sql, which scans the TOC again.  There doesn't appear to be
+		 * a good way to prevent a second defnDumper call in this case without
+		 * storing the definition in memory, which defeats the purpose.  This
+		 * second defnDumper invocation should generate the same output as the
+		 * first, but even if it doesn't, the worst-case scenario is that the
+		 * content of the archive and restore.sql (which isn't used by
+		 * pg_restore) will differ.
+		 *
+		 * In all other cases, encountering a TOC entry a second time in
+		 * _printTocEntry() is unexpected, so we fail because one of our
+		 * assumptions must no longer hold true.
+		 *
+		 * XXX This is certainly a layering violation, but the alternative is
+		 * an awkward and complicated callback infrastructure for this special
+		 * case.  This might be worth revisiting in the future.
+		 */
+		pg_fatal("unexpected TOC entry in _printTocEntry(): %d %s %s",
+				 te->dumpId, te->desc, te->tag);
+	}
+	else if (te->defnDumper)
+	{
+		char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+
+		te->defnLen = ahprintf(AH, "%s\n\n", defn);
+		pg_free(defn);
+	}
 	else if (te->defn && strlen(te->defn) > 0)
 	{
 		ahprintf(AH, "%s\n\n", te->defn);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a2064f471ed..b7ebc2b39cd 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -368,6 +368,10 @@ struct _tocEntry
 	const void *dataDumperArg;	/* Arg for above routine */
 	void	   *formatData;		/* TOC Entry data specific to file format */
 
+	DefnDumperPtr defnDumper;	/* routine to dump definition statement */
+	const void *defnDumperArg;	/* arg for above routine */
+	size_t		defnLen;		/* length of dumped definition */
+
 	/* working state while dumping/restoring */
 	pgoff_t		dataLength;		/* item's data size; 0 if none or unknown */
 	int			reqs;			/* do we need schema and/or data of object
@@ -407,6 +411,8 @@ typedef struct _archiveOpts
 	int			nDeps;
 	DataDumperPtr dumpFn;
 	const void *dumpArg;
+	DefnDumperPtr defnFn;
+	const void *defnArg;
 } ArchiveOpts;
 #define ARCHIVE_OPTS(...) &(ArchiveOpts){__VA_ARGS__}
 /* Called to add a TOC entry */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index eef74f78271..77fce7c8738 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -10554,17 +10554,21 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
 }
 
 /*
- * dumpRelationStats --
+ * dumpRelationStats_dumper --
  *
- * Dump command to import stats into the relation on the new database.
+ * Generate command to import stats into the relation on the new database.
+ * This routine is called by the Archiver when it wants the statistics to be
+ * dumped.
  */
-static void
-dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
+static char *
+dumpRelationStats_dumper(Archive *fout, const void *userArg)
 {
+	const RelStatsInfo *rsinfo = (RelStatsInfo *) userArg;
 	const DumpableObject *dobj = &rsinfo->dobj;
 	PGresult   *res;
 	PQExpBuffer query;
-	PQExpBuffer out;
+	PQExpBufferData out_data;
+	PQExpBuffer out = &out_data;
 	int			i_attname;
 	int			i_inherited;
 	int			i_null_frac;
@@ -10581,10 +10585,6 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
 	int			i_range_empty_frac;
 	int			i_range_bounds_histogram;
 
-	/* nothing to do if we are not dumping statistics */
-	if (!fout->dopt->dumpStatistics)
-		return;
-
 	query = createPQExpBuffer();
 	if (!fout->is_prepared[PREPQUERY_GETATTRIBUTESTATS])
 	{
@@ -10620,7 +10620,7 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
 		resetPQExpBuffer(query);
 	}
 
-	out = createPQExpBuffer();
+	initPQExpBuffer(out);
 
 	/* restore relation stats */
 	appendPQExpBufferStr(out, "SELECT * FROM pg_catalog.pg_restore_relation_stats(\n");
@@ -10764,17 +10764,35 @@ dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
 
 	PQclear(res);
 
+	destroyPQExpBuffer(query);
+	return out->data;
+}
+
+/*
+ * dumpRelationStats --
+ *
+ * Make an ArchiveEntry for the relation statistics.  The Archiver will take
+ * care of gathering the statistics and generating the restore commands when
+ * they are needed.
+ */
+static void
+dumpRelationStats(Archive *fout, const RelStatsInfo *rsinfo)
+{
+	const DumpableObject *dobj = &rsinfo->dobj;
+
+	/* nothing to do if we are not dumping statistics */
+	if (!fout->dopt->dumpStatistics)
+		return;
+
 	ArchiveEntry(fout, nilCatalogId, createDumpId(),
 				 ARCHIVE_OPTS(.tag = dobj->name,
 							  .namespace = dobj->namespace->dobj.name,
 							  .description = "STATISTICS DATA",
 							  .section = rsinfo->section,
-							  .createStmt = out->data,
+							  .defnFn = dumpRelationStats_dumper,
+							  .defnArg = rsinfo,
 							  .deps = dobj->dependencies,
 							  .nDeps = dobj->nDeps));
-
-	destroyPQExpBuffer(out);
-	destroyPQExpBuffer(query);
 }
 
 /*
-- 
2.39.5 (Apple Git-154)
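For reviewers skimming the diff, the heart of 0002 is the
deferred-generation callback: the TOC entry stores a callback plus its
argument instead of the generated SQL, and the archiver invokes the
callback only at write time and frees the result immediately.  A minimal
standalone sketch of that pattern (names like make_defn/write_entry are
invented for illustration; this is not the pg_dump API):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for pg_dump's TOC entry and defnDumper. */
typedef char *(*DefnCallback) (const void *arg);

typedef struct Entry
{
	const char *tag;
	DefnCallback defn_cb;		/* generates the definition on demand */
	const void *defn_arg;
	size_t		defn_len;		/* remembered length after first write */
} Entry;

/* Example callback: builds a (potentially large) command only when asked. */
static char *
make_defn(const void *arg)
{
	const char *table = arg;
	char	   *buf = malloc(64 + strlen(table));

	sprintf(buf, "SELECT pg_restore_relation_stats(/* ... %s ... */);", table);
	return buf;
}

/* Write time: call the callback, emit the result, free it immediately. */
static void
write_entry(Entry *e, FILE *out)
{
	char	   *defn = e->defn_cb(e->defn_arg);

	e->defn_len = strlen(defn);
	fprintf(out, "%s\n", defn);
	free(defn);					/* nothing large stays resident */
}

int
main(void)
{
	Entry		e = {"stats: public.t", make_defn, "public.t", 0};

	write_entry(&e, stdout);
	return 0;
}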
>From 8e63be0265c5489818043696e3880790ffcee374 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 17:20:01 -0500
Subject: [PATCH v12n6 3/4] pg_dump: Retrieve attribute statistics in batches.

Currently, pg_dump gathers attribute statistics with a query per
relation, which can cause pg_dump to take significantly longer,
especially when there are many tables.  This commit addresses this by
teaching pg_dump to gather attribute statistics for 64 relations at a
time.  Some simple tests showed this was the optimal batch size, but
performance may vary depending on the workload.  While this change
increases pg_dump's memory usage a bit, it isn't expected to be too
egregious and seems well worth the trade-off.

Our lookahead code determines the next batch of relations by searching
the TOC sequentially for relevant entries.  This approach assumes that
we will dump all such entries in TOC order, which unfortunately isn't
true for dump formats that use RestoreArchive().  RestoreArchive() does
multiple passes through the TOC and selectively dumps certain groups of
entries each time.  This is particularly problematic for index stats and
a subset of matview stats; both are in SECTION_POST_DATA, but matview
stats that depend on matview data are dumped in RESTORE_PASS_POST_ACL,
while all other stats data entries are dumped in RESTORE_PASS_MAIN.  To
handle this, this commit moves all statistics data entries in
SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we always
dump them in TOC order.  A convenient side effect of this change is that
we can revert a decent chunk of commit a0a4601765, but that is left for
a follow-up commit.

Author: Corey Huinker <corey.huin...@gmail.com>
Co-authored-by: Nathan Bossart <nathandboss...@gmail.com>
Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
---
 src/bin/pg_dump/pg_backup.h          |   5 +-
 src/bin/pg_dump/pg_backup_archiver.c |  30 +++---
 src/bin/pg_dump/pg_dump.c            | 148 +++++++++++++++++++++++----
 3 files changed, 145 insertions(+), 38 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 781f8fa1cc9..9005b4253b4 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -285,7 +285,10 @@ typedef int DumpId;
  * Function pointer prototypes for assorted callback methods.
  */
 
-typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg);
+/* forward declaration to avoid including pg_backup_archiver.h here */
+typedef struct _tocEntry TocEntry;
+
+typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg, const TocEntry *te);
 typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
 
 typedef void (*SetupWorkerPtrType) (Archive *AH);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 3debd0892c7..84f32aef411 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2655,7 +2655,7 @@ WriteToc(ArchiveHandle *AH)
 		}
 		else if (te->defnDumper)
 		{
-			char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+			char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg, te);
 
 			te->defnLen = WriteStr(AH, defn);
 			pg_free(defn);
@@ -3284,23 +3284,17 @@ _tocEntryRestorePass(ArchiveHandle *AH, TocEntry *te)
 
 	/*
 	 * If statistics data is dependent on materialized view data, it must be
-	 * deferred to RESTORE_PASS_POST_ACL.
+	 * deferred to RESTORE_PASS_POST_ACL.  Those entries are already marked
+	 * with SECTION_POST_DATA, and some other stats entries (e.g., index
+	 * stats) will also be marked SECTION_POST_DATA.  Additionally, our
+	 * lookahead code in fetchAttributeStats() assumes that we dump all
+	 * statistics data entries in TOC order.  To ensure this assumption holds,
+	 * we move all statistics data entries in SECTION_POST_DATA to
+	 * RESTORE_PASS_POST_ACL.
 	 */
-	if (strcmp(te->desc, "STATISTICS DATA") == 0)
-	{
-		for (int i = 0; i < te->nDeps; i++)
-		{
-			DumpId		depid = te->dependencies[i];
-
-			if (depid <= AH->maxDumpId && AH->tocsByDumpId[depid] != NULL)
-			{
-				TocEntry   *otherte = AH->tocsByDumpId[depid];
-
-				if (strcmp(otherte->desc, "MATERIALIZED VIEW DATA") == 0)
-					return RESTORE_PASS_POST_ACL;
-			}
-		}
-	}
+	if (strcmp(te->desc, "STATISTICS DATA") == 0 &&
+		te->section == SECTION_POST_DATA)
+		return RESTORE_PASS_POST_ACL;
 
 	/* All else can be handled in the main pass. */
 	return RESTORE_PASS_MAIN;
@@ -3952,7 +3946,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, const char *pfx)
 	}
 	else if (te->defnDumper)
 	{
-		char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);
+		char	   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg, te);
 
 		te->defnLen = ahprintf(AH, "%s\n\n", defn);
 		pg_free(defn);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 77fce7c8738..a3bcd9c019a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -209,6 +209,9 @@ static int	nbinaryUpgradeClassOids = 0;
 static SequenceItem *sequences = NULL;
 static int	nsequences = 0;
 
+/* Maximum number of relations to fetch in a fetchAttributeStats() call. */
+#define MAX_ATTR_STATS_RELS 64
+
 /*
  * The default number of rows per INSERT when
  * --inserts is specified without --rows-per-insert
@@ -10553,6 +10556,79 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
 	appendPQExpBuffer(out, "::%s", argtype);
 }
 
+/*
+ * fetchAttributeStats --
+ *
+ * Fetch next batch of rows for getAttributeStats().
+ */
+static PGresult *
+fetchAttributeStats(Archive *fout)
+{
+	ArchiveHandle *AH = (ArchiveHandle *) fout;
+	PQExpBuffer nspnames = createPQExpBuffer();
+	PQExpBuffer relnames = createPQExpBuffer();
+	int			count = 0;
+	PGresult   *res = NULL;
+	static TocEntry *te;
+	static bool restarted;
+
+	/* If we're just starting, set our TOC pointer. */
+	if (!te)
+		te = AH->toc->next;
+
+	/*
+	 * We can't easily avoid a second TOC scan for the tar format because it
+	 * writes restore.sql separately, which means we must execute the queries
+	 * twice.  This feels risky, but there is no known reason it should
+	 * generate different output than the first pass.  Even if it does, the
+	 * worst-case scenario is that restore.sql might have different statistics
+	 * data than the archive.
+	 */
+	if (!restarted && te == AH->toc && AH->format == archTar)
+	{
+		te = AH->toc->next;
+		restarted = true;
+	}
+
+	/*
+	 * Scan the TOC for the next set of relevant stats entries.  We assume
+	 * that statistics are dumped in the order they are listed in the TOC.
+	 * This is perhaps not the sturdiest assumption, so we verify it matches
+	 * reality in dumpRelationStats_dumper().
+	 */
+	for (; te != AH->toc && count < MAX_ATTR_STATS_RELS; te = te->next)
+	{
+		if ((te->reqs & REQ_STATS) != 0 &&
+			strcmp(te->desc, "STATISTICS DATA") == 0)
+		{
+			RelStatsInfo *rsinfo = (RelStatsInfo *) te->defnDumperArg;
+
+			appendPQExpBuffer(nspnames, "%s%s", count ? "," : "",
+							  fmtId(rsinfo->dobj.namespace->dobj.name));
+			appendPQExpBuffer(relnames, "%s%s", count ? "," : "",
+							  fmtId(rsinfo->dobj.name));
+			count++;
+		}
+	}
+
+	/* Execute the query for the next batch of relations. */
+	if (count > 0)
+	{
+		PQExpBuffer query = createPQExpBuffer();
+
+		appendPQExpBuffer(query, "EXECUTE getAttributeStats("
+						  "'{%s}'::pg_catalog.name[],"
+						  "'{%s}'::pg_catalog.name[])",
+						  nspnames->data, relnames->data);
+		res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+		destroyPQExpBuffer(query);
+	}
+
+	destroyPQExpBuffer(nspnames);
+	destroyPQExpBuffer(relnames);
+	return res;
+}
+
 /*
  * dumpRelationStats_dumper --
  *
@@ -10561,14 +10637,17 @@ appendNamedArgument(PQExpBuffer out, Archive *fout, const char *argname,
  * dumped.
  */
 static char *
-dumpRelationStats_dumper(Archive *fout, const void *userArg)
+dumpRelationStats_dumper(Archive *fout, const void *userArg, const TocEntry *te)
 {
 	const RelStatsInfo *rsinfo = (RelStatsInfo *) userArg;
 	const DumpableObject *dobj = &rsinfo->dobj;
-	PGresult   *res;
+	static PGresult *res;
+	static int	rownum;
 	PQExpBuffer query;
 	PQExpBufferData out_data;
 	PQExpBuffer out = &out_data;
+	int			i_schemaname;
+	int			i_tablename;
 	int			i_attname;
 	int			i_inherited;
 	int			i_null_frac;
@@ -10584,13 +10663,31 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 	int			i_range_length_histogram;
 	int			i_range_empty_frac;
 	int			i_range_bounds_histogram;
+	static TocEntry *expected_te;
+
+	/*
+	 * fetchAttributeStats() assumes that the statistics are dumped in the
+	 * order they are listed in the TOC.  We verify that here for safety.
+	 */
+	if (!expected_te)
+		expected_te = ((ArchiveHandle *) fout)->toc;
+
+	expected_te = expected_te->next;
+	while ((expected_te->reqs & REQ_STATS) == 0 ||
+		   strcmp(expected_te->desc, "STATISTICS DATA") != 0)
+		expected_te = expected_te->next;
+
+	if (te != expected_te)
+		pg_fatal("stats dumped out of order (current: %d %s %s) (expected: %d %s %s)",
+				 te->dumpId, te->desc, te->tag,
+				 expected_te->dumpId, expected_te->desc, expected_te->tag);
 
 	query = createPQExpBuffer();
 	if (!fout->is_prepared[PREPQUERY_GETATTRIBUTESTATS])
 	{
 		appendPQExpBufferStr(query,
-							 "PREPARE getAttributeStats(pg_catalog.name, pg_catalog.name) AS\n"
-							 "SELECT s.attname, s.inherited, "
+							 "PREPARE getAttributeStats(pg_catalog.name[], pg_catalog.name[]) AS\n"
+							 "SELECT s.schemaname, s.tablename, s.attname, s.inherited, "
 							 "s.null_frac, s.avg_width, s.n_distinct, "
 							 "s.most_common_vals, s.most_common_freqs, "
 							 "s.histogram_bounds, s.correlation, "
@@ -10608,11 +10705,21 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 							 "NULL AS range_empty_frac,"
 							 "NULL AS range_bounds_histogram ");
 
+		/*
+		 * The results must be in the order of the relations supplied in the
+		 * parameters to ensure we remain in sync as we walk through the TOC.
+		 * The redundant filter clause on s.tablename = ANY(...) seems
+		 * sufficient to convince the planner to use
+		 * pg_class_relname_nsp_index, which avoids a full scan of pg_stats.
+		 * This may not work for all versions.
+		 */
 		appendPQExpBufferStr(query,
 							 "FROM pg_catalog.pg_stats s "
-							 "WHERE s.schemaname = $1 "
-							 "AND s.tablename = $2 "
-							 "ORDER BY s.attname, s.inherited");
+							 "JOIN unnest($1, $2) WITH ORDINALITY AS u (schemaname, tablename, ord) "
+							 "ON s.schemaname = u.schemaname "
+							 "AND s.tablename = u.tablename "
+							 "WHERE s.tablename = ANY($2) "
+							 "ORDER BY u.ord, s.attname, s.inherited");
 
 		ExecuteSqlStatement(fout, query->data);
 
@@ -10642,16 +10749,16 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 
 	appendPQExpBufferStr(out, "\n);\n");
 
+	/* Fetch the next batch of attribute statistics if needed. */
+	if (rownum >= PQntuples(res))
+	{
+		PQclear(res);
+		res = fetchAttributeStats(fout);
+		rownum = 0;
+	}
 
-	/* fetch attribute stats */
-	appendPQExpBufferStr(query, "EXECUTE getAttributeStats(");
-	appendStringLiteralAH(query, dobj->namespace->dobj.name, fout);
-	appendPQExpBufferStr(query, ", ");
-	appendStringLiteralAH(query, dobj->name, fout);
-	appendPQExpBufferStr(query, ");");
-
-	res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
-
+	i_schemaname = PQfnumber(res, "schemaname");
+	i_tablename = PQfnumber(res, "tablename");
 	i_attname = PQfnumber(res, "attname");
 	i_inherited = PQfnumber(res, "inherited");
 	i_null_frac = PQfnumber(res, "null_frac");
@@ -10669,10 +10776,15 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 	i_range_bounds_histogram = PQfnumber(res, "range_bounds_histogram");
 
 	/* restore attribute stats */
-	for (int rownum = 0; rownum < PQntuples(res); rownum++)
+	for (; rownum < PQntuples(res); rownum++)
 	{
 		const char *attname;
 
+		/* Stop if the next stat row in our cache isn't for this relation. */
+		if (strcmp(dobj->name, PQgetvalue(res, rownum, i_tablename)) != 0 ||
+			strcmp(dobj->namespace->dobj.name, PQgetvalue(res, rownum, i_schemaname)) != 0)
+			break;
+
 		appendPQExpBufferStr(out, "SELECT * FROM pg_catalog.pg_restore_attribute_stats(\n");
 		appendPQExpBuffer(out, "\t'version', '%u'::integer,\n",
 						  fout->remoteVersion);
@@ -10762,8 +10874,6 @@ dumpRelationStats_dumper(Archive *fout, const void *userArg)
 		appendPQExpBufferStr(out, "\n);\n");
 	}
 
-	PQclear(res);
-
 	destroyPQExpBuffer(query);
 	return out->data;
 }
-- 
2.39.5 (Apple Git-154)
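The batching in 0003 boils down to a shared cursor over a cached batch
that is refilled a fixed number of entries at a time while the consumer
stops at relation boundaries.  A simplified standalone sketch of that
pattern (all names invented; toy arrays stand in for the TOC and the
PGresult, and the refill logic is simplified relative to the patch):

#include <stdio.h>
#include <string.h>

#define BATCH_SIZE 3

static const char *all_rows[] = {
	"t1.a", "t1.b", "t2.a", "t3.a", "t3.b", "t3.c"
};
#define NROWS (sizeof(all_rows) / sizeof(all_rows[0]))

static const char *batch[BATCH_SIZE];
static int	batch_len;			/* rows currently cached */
static int	rownum;				/* next unconsumed cached row */
static int	fetched;			/* rows fetched so far overall */

/* Stand-in for one fetchAttributeStats()-style query for the next batch. */
static void
fetch_next_batch(void)
{
	batch_len = 0;
	while (batch_len < BATCH_SIZE && fetched < NROWS)
		batch[batch_len++] = all_rows[fetched++];
	rownum = 0;
}

/* Consume the cached rows belonging to one relation ("t1", "t2", ...). */
static void
dump_one_relation(const char *rel)
{
	for (;;)
	{
		if (rownum >= batch_len)	/* cache exhausted: refill */
		{
			fetch_next_batch();
			if (batch_len == 0)
				return;
		}
		/* Stop once the next cached row is for a different relation. */
		if (strncmp(batch[rownum], rel, strlen(rel)) != 0)
			return;
		printf("restore stats for %s\n", batch[rownum++]);
	}
}

int
main(void)
{
	dump_one_relation("t1");
	dump_one_relation("t2");
	dump_one_relation("t3");
	return 0;
}

The relation-boundary check is why the query's ORDER BY u.ord matters:
the consumer and the TOC walk must agree on the order of relations, which
is also what the expected_te cross-check in the patch enforces.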
>From 3860fc596ea6038c5f4edeb1ab3de85d8a297013 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 3 Apr 2025 20:13:40 -0500
Subject: [PATCH v12n6 4/4] Partially revert commit a0a4601765.

Thanks to commit XXXXXXXXXX, which simplified some code in
_tocEntryRestorePass(), we can remove the now-unused ArchiveHandle
parameter from _tocEntryRestorePass() and move_to_ready_heap().

Reviewed-by: Jeff Davis <pg...@j-davis.com>
Discussion: https://postgr.es/m/Z-3x2AnPCP331JA3%40nathan
---
 src/bin/pg_dump/pg_backup_archiver.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 84f32aef411..5ccedc383bb 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -72,7 +72,7 @@ static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
 static int	_tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH);
-static RestorePass _tocEntryRestorePass(ArchiveHandle *AH, TocEntry *te);
+static RestorePass _tocEntryRestorePass(TocEntry *te);
 static bool _tocEntryIsACL(TocEntry *te);
 static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
 static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
@@ -102,8 +102,7 @@ static void pending_list_append(TocEntry *l, TocEntry *te);
 static void pending_list_remove(TocEntry *te);
 static int	TocEntrySizeCompareQsort(const void *p1, const void *p2);
 static int	TocEntrySizeCompareBinaryheap(void *p1, void *p2, void *arg);
-static void move_to_ready_heap(ArchiveHandle *AH,
-							   TocEntry *pending_list,
+static void move_to_ready_heap(TocEntry *pending_list,
 							   binaryheap *ready_heap,
 							   RestorePass pass);
 static TocEntry *pop_next_work_item(binaryheap *ready_heap,
@@ -749,7 +748,7 @@ RestoreArchive(Archive *AHX)
 			if ((te->reqs & (REQ_SCHEMA | REQ_DATA | REQ_STATS)) == 0)
 				continue;		/* ignore if not to be dumped at all */
 
-			switch (_tocEntryRestorePass(AH, te))
+			switch (_tocEntryRestorePass(te))
 			{
 				case RESTORE_PASS_MAIN:
 					(void) restore_toc_entry(AH, te, false);
@@ -768,7 +767,7 @@ RestoreArchive(Archive *AHX)
 			for (te = AH->toc->next; te != AH->toc; te = te->next)
 			{
 				if ((te->reqs & (REQ_SCHEMA | REQ_DATA | REQ_STATS)) != 0 &&
-					_tocEntryRestorePass(AH, te) == RESTORE_PASS_ACL)
+					_tocEntryRestorePass(te) == RESTORE_PASS_ACL)
 					(void) restore_toc_entry(AH, te, false);
 			}
 		}
@@ -778,7 +777,7 @@ RestoreArchive(Archive *AHX)
 			for (te = AH->toc->next; te != AH->toc; te = te->next)
 			{
 				if ((te->reqs & (REQ_SCHEMA | REQ_DATA | REQ_STATS)) != 0 &&
-					_tocEntryRestorePass(AH, te) == RESTORE_PASS_POST_ACL)
+					_tocEntryRestorePass(te) == RESTORE_PASS_POST_ACL)
 					(void) restore_toc_entry(AH, te, false);
 			}
 		}
@@ -3261,7 +3260,7 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
  * See notes with the RestorePass typedef in pg_backup_archiver.h.
 */
 static RestorePass
-_tocEntryRestorePass(ArchiveHandle *AH, TocEntry *te)
+_tocEntryRestorePass(TocEntry *te)
 {
 	/* "ACL LANGUAGE" was a crock emitted only in PG 7.4 */
 	if (strcmp(te->desc, "ACL") == 0 ||
@@ -4344,7 +4343,7 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
 		 * not set skipped_some in this case, since by assumption no main-pass
 		 * items could depend on these.
 		 */
-		if (_tocEntryRestorePass(AH, next_work_item) != RESTORE_PASS_MAIN)
+		if (_tocEntryRestorePass(next_work_item) != RESTORE_PASS_MAIN)
 			do_now = false;
 
 		if (do_now)
@@ -4426,7 +4425,7 @@ restore_toc_entries_parallel(ArchiveHandle *AH, ParallelState *pstate,
 	 * process in the current restore pass.
 	 */
 	AH->restorePass = RESTORE_PASS_MAIN;
-	move_to_ready_heap(AH, pending_list, ready_heap, AH->restorePass);
+	move_to_ready_heap(pending_list, ready_heap, AH->restorePass);
 
 	/*
 	 * main parent loop
@@ -4475,7 +4474,7 @@ restore_toc_entries_parallel(ArchiveHandle *AH, ParallelState *pstate,
 			/* Advance to next restore pass */
 			AH->restorePass++;
 
 			/* That probably allows some stuff to be made ready */
-			move_to_ready_heap(AH, pending_list, ready_heap, AH->restorePass);
+			move_to_ready_heap(pending_list, ready_heap, AH->restorePass);
 
 			/* Loop around to see if anything's now ready */
 			continue;
 		}
@@ -4646,8 +4645,7 @@ TocEntrySizeCompareBinaryheap(void *p1, void *p2, void *arg)
  * which applies the same logic one-at-a-time.)
 */
 static void
-move_to_ready_heap(ArchiveHandle *AH,
-				   TocEntry *pending_list,
+move_to_ready_heap(TocEntry *pending_list,
 				   binaryheap *ready_heap,
 				   RestorePass pass)
 {
@@ -4660,7 +4658,7 @@ move_to_ready_heap(ArchiveHandle *AH,
 		next_te = te->pending_next;
 
 		if (te->depCount == 0 &&
-			_tocEntryRestorePass(AH, te) == pass)
+			_tocEntryRestorePass(te) == pass)
 		{
 			/* Remove it from pending_list ... */
 			pending_list_remove(te);
@@ -5054,7 +5052,7 @@ reduce_dependencies(ArchiveHandle *AH, TocEntry *te,
 		 * memberships changed.
 		 */
 		if (otherte->depCount == 0 &&
-			_tocEntryRestorePass(AH, otherte) == AH->restorePass &&
+			_tocEntryRestorePass(otherte) == AH->restorePass &&
 			otherte->pending_prev != NULL &&
 			ready_heap != NULL)
 		{
-- 
2.39.5 (Apple Git-154)