Hello list,
I implemented --clean support for --data-only, in order to avoid logging
to the WAL while populating the database. The attached patch issues a
TRUNCATE before COPY on each worker process, and provides a significant
speed advantage if the cluster is configure with wal_level=minimal.
It also provides a safer way to load the database, as avoiding WAL logging
also avoids potential and painful ENOSPACE on the WAL partition as I
experienced in [1]. In other words it makes things much better for my use
case.
[1]
https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net
But it has some rough edges. I would appreciate guidance and feedback.
* When the table-to-be-TRUNCATEd is referenced as foreign key from other
table, the whole transaction fails with:
ERROR: cannot truncate a table referenced in a foreign key constraint
1. As a first step, when TRUNCATE fails I want to try a DELETE FROM
instead, which has more chances of succeeding, and continuing with
the COPY. How to detect the failure of ahprintf("TRUNCATE") and do
the alternative without failing the whole transaction?
2. Why doesn't --disable-triggers help?
To test this, I have manually issued
ALTER TABLE x DISABLE TRIGGER ALL
to every table and issued manual TRUNCATE still fails. Shouldn't
postgres skip the referential integrity checks?
3. In my tests, all my tables start empty since I have just created the
schema. Then pg_restore --data-only --clean first populates
the /referencing/ tables, which is allowed because of disabled
triggers, and then it tries to load the /referenced/ table.
At this point the referential integrity is already broken. Getting an
error when TRUNCATing the empty /referenced/ table doesn't make
sense.
What do you think?
Thank you in advance,
Dimitris
From ee28b0b9c1d3b78f318f259f4907785db98f31e6 Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <ji...@qt.io>
Date: Sat, 12 Apr 2025 01:59:45 +0200
Subject: [PATCH v1] pg_restore --clean --data-only
In parallel restore, it issues a TRUNCATE before COPYing the data into
the tables, within a transaction.
As a result it avoid logging to the WAL, when combined with
wal_level=minimal.
---
doc/src/sgml/ref/pg_restore.sgml | 10 ++++++++++
src/bin/pg_dump/pg_backup.h | 1 +
src/bin/pg_dump/pg_backup_archiver.c | 3 ++-
src/bin/pg_dump/pg_restore.c | 6 +++---
4 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 2e3ba802581..697ca25d70a 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -112,18 +112,28 @@ PostgreSQL documentation
<listitem>
<para>
Before restoring database objects, issue commands
to <command>DROP</command> all the objects that will be restored.
This option is useful for overwriting an existing database.
If any of the objects do not exist in the destination database,
ignorable error messages will be reported,
unless <option>--if-exists</option> is also specified.
</para>
+ <para>
+ In combination with <option>--data-only</option> a TRUNCATE will be
+ attempted instead of DROP, before COPYing the data. So if you want
+ to overwrite an existing database without re-writing the schema, then
+ issue <option>--data-only --clean</option>. Together
+ with <option>--parallel</option> it is a high performance way to load
+ the tables, as it avoids logging to the WAL (if the server is
+ configured with <option>wal_level=minimal</option>). Warning:
+ foreign key constraints might cause table truncation to fail.
+ </para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-C</option></term>
<term><option>--create</option></term>
<listitem>
<para>
Create the database before restoring into it.
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fbf5f1c515e..642795f0223 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -97,18 +97,19 @@ typedef struct _restoreOptions
int noOwner; /* Don't try to match original object owner */
int noTableAm; /* Don't issue table-AM-related commands */
int noTablespace; /* Don't issue tablespace-related commands */
int disable_triggers; /* disable triggers during data-only
* restore */
int use_setsessauth; /* Use SET SESSION AUTHORIZATION commands
* instead of OWNER TO */
char *superuser; /* Username to use as superuser */
char *use_role; /* Issue SET ROLE to this */
+ int clean;
int dropSchema;
int disable_dollar_quoting;
int dump_inserts; /* 0 = COPY, otherwise rows per INSERT */
int column_inserts;
int if_exists;
int no_comments; /* Skip comments */
int no_publications; /* Skip publication entries */
int no_security_labels; /* Skip security label entries */
int no_subscriptions; /* Skip subscription entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f1ffed038b0..b11c84b2d7e 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -976,19 +976,20 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
* to that from using single_txn mode in non-parallel
* restores.
*
* We mustn't do this for load-via-partition-root cases
* because some data might get moved across partition
* boundaries, risking deadlock and/or loss of previously
* loaded data. (We assume that all partitions of a
* partitioned table will be treated the same way.)
*/
- use_truncate = is_parallel && te->created &&
+ use_truncate = is_parallel &&
+ (te->created || (ropt->dataOnly && ropt->clean)) &&
!is_load_via_partition_root(te);
if (use_truncate)
{
/*
* Parallel restore is always talking directly to a
* server, so no need to see if we should issue BEGIN.
*/
StartTransaction(&AH->public);
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index df119591cca..8aa6f3db57d 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -158,19 +158,19 @@ main(int argc, char **argv)
while ((c = getopt_long(argc, argv, "acCd:ef:F:h:I:j:lL:n:N:Op:P:RsS:t:T:U:vwWx1",
cmdopts, NULL)) != -1)
{
switch (c)
{
case 'a': /* Dump data only */
opts->dataOnly = 1;
break;
case 'c': /* clean (i.e., drop) schema prior to create */
- opts->dropSchema = 1;
+ opts->clean = 1;
break;
case 'C':
opts->createDB = 1;
break;
case 'd':
opts->cparams.dbname = pg_strdup(optarg);
break;
case 'e':
opts->exit_on_error = true;
@@ -337,20 +337,20 @@ main(int argc, char **argv)
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
exit_nicely(1);
}
opts->useDB = 1;
}
if (opts->dataOnly && opts->schemaOnly)
pg_fatal("options -s/--schema-only and -a/--data-only cannot be used together");
- if (opts->dataOnly && opts->dropSchema)
- pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
+ if (opts->clean && !opts->dataOnly)
+ opts->dropSchema = 1;
if (opts->single_txn && opts->txn_size > 0)
pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
/*
* -C is not compatible with -1, because we can't create a database inside
* a transaction block.
*/
if (opts->createDB && opts->single_txn)
--
2.49.0