Hello list,

I implemented --clean support for --data-only, in order to avoid logging to the WAL while populating the database. The attached patch issues a TRUNCATE before COPY on each worker process, and provides a significant speed advantage if the cluster is configure with wal_level=minimal.

It also provides a safer way to load the database, as avoiding WAL logging also avoids potential and painful ENOSPACE on the WAL partition as I experienced in [1]. In other words it makes things much better for my use case.

[1] 
https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net

But it has some rough edges. I would appreciate guidance and feedback.

* When the table-to-be-TRUNCATEd is referenced as foreign key from other
  table, the whole transaction fails with:

    ERROR: cannot truncate a table referenced in a foreign key constraint

  1. As a first step, when TRUNCATE fails I want to try a DELETE FROM
     instead, which has more chances of succeeding, and continuing with
     the COPY. How to detect the failure of ahprintf("TRUNCATE") and do
     the alternative without failing the whole transaction?

  2. Why doesn't --disable-triggers help?
     To test this, I have manually issued

       ALTER TABLE x DISABLE TRIGGER ALL

     to every table and issued manual TRUNCATE still fails. Shouldn't
     postgres skip the referential integrity checks?

  3. In my tests, all my tables start empty since I have just created the
     schema. Then pg_restore --data-only --clean first populates
     the /referencing/ tables, which is allowed because of disabled
     triggers, and then it tries to load the /referenced/ table.

     At this point the referential integrity is already broken. Getting an
     error when TRUNCATing the empty /referenced/ table doesn't make
     sense.


What do you think?

Thank you in advance,
Dimitris
From ee28b0b9c1d3b78f318f259f4907785db98f31e6 Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <ji...@qt.io>
Date: Sat, 12 Apr 2025 01:59:45 +0200
Subject: [PATCH v1] pg_restore --clean --data-only

In parallel restore, it issues a TRUNCATE before COPYing the data into
the tables, within a transaction.

As a result it avoid logging to the WAL, when combined with
wal_level=minimal.
---
 doc/src/sgml/ref/pg_restore.sgml     | 10 ++++++++++
 src/bin/pg_dump/pg_backup.h          |  1 +
 src/bin/pg_dump/pg_backup_archiver.c |  3 ++-
 src/bin/pg_dump/pg_restore.c         |  6 +++---
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 2e3ba802581..697ca25d70a 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -112,18 +112,28 @@ PostgreSQL documentation
       <listitem>
        <para>
         Before restoring database objects, issue commands
         to <command>DROP</command> all the objects that will be restored.
         This option is useful for overwriting an existing database.
         If any of the objects do not exist in the destination database,
         ignorable error messages will be reported,
         unless <option>--if-exists</option> is also specified.
        </para>
+       <para>
+         In combination with <option>--data-only</option> a TRUNCATE will be
+         attempted instead of DROP, before COPYing the data.  So if you want
+         to overwrite an existing database without re-writing the schema, then
+         issue <option>--data-only --clean</option>.  Together
+         with <option>--parallel</option> it is a high performance way to load
+         the tables, as it avoids logging to the WAL (if the server is
+         configured with <option>wal_level=minimal</option>).  Warning:
+         foreign key constraints might cause table truncation to fail.
+       </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
       <term><option>-C</option></term>
       <term><option>--create</option></term>
       <listitem>
        <para>
         Create the database before restoring into it.
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fbf5f1c515e..642795f0223 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -97,18 +97,19 @@ typedef struct _restoreOptions
 	int			noOwner;		/* Don't try to match original object owner */
 	int			noTableAm;		/* Don't issue table-AM-related commands */
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;	/* disable triggers during data-only
 									 * restore */
 	int			use_setsessauth;	/* Use SET SESSION AUTHORIZATION commands
 									 * instead of OWNER TO */
 	char	   *superuser;		/* Username to use as superuser */
 	char	   *use_role;		/* Issue SET ROLE to this */
+	int			clean;
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
 	int			no_publications;	/* Skip publication entries */
 	int			no_security_labels; /* Skip security label entries */
 	int			no_subscriptions;	/* Skip subscription entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f1ffed038b0..b11c84b2d7e 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -976,19 +976,20 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 					 * to that from using single_txn mode in non-parallel
 					 * restores.
 					 *
 					 * We mustn't do this for load-via-partition-root cases
 					 * because some data might get moved across partition
 					 * boundaries, risking deadlock and/or loss of previously
 					 * loaded data.  (We assume that all partitions of a
 					 * partitioned table will be treated the same way.)
 					 */
-					use_truncate = is_parallel && te->created &&
+					use_truncate = is_parallel &&
+						(te->created || (ropt->dataOnly && ropt->clean)) &&
 						!is_load_via_partition_root(te);
 
 					if (use_truncate)
 					{
 						/*
 						 * Parallel restore is always talking directly to a
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(&AH->public);
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index df119591cca..8aa6f3db57d 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -158,19 +158,19 @@ main(int argc, char **argv)
 	while ((c = getopt_long(argc, argv, "acCd:ef:F:h:I:j:lL:n:N:Op:P:RsS:t:T:U:vwWx1",
 							cmdopts, NULL)) != -1)
 	{
 		switch (c)
 		{
 			case 'a':			/* Dump data only */
 				opts->dataOnly = 1;
 				break;
 			case 'c':			/* clean (i.e., drop) schema prior to create */
-				opts->dropSchema = 1;
+				opts->clean = 1;
 				break;
 			case 'C':
 				opts->createDB = 1;
 				break;
 			case 'd':
 				opts->cparams.dbname = pg_strdup(optarg);
 				break;
 			case 'e':
 				opts->exit_on_error = true;
@@ -337,20 +337,20 @@ main(int argc, char **argv)
 			pg_log_error_hint("Try \"%s --help\" for more information.", progname);
 			exit_nicely(1);
 		}
 		opts->useDB = 1;
 	}
 
 	if (opts->dataOnly && opts->schemaOnly)
 		pg_fatal("options -s/--schema-only and -a/--data-only cannot be used together");
 
-	if (opts->dataOnly && opts->dropSchema)
-		pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
+	if (opts->clean && !opts->dataOnly)
+		opts->dropSchema = 1;
 
 	if (opts->single_txn && opts->txn_size > 0)
 		pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
 
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
 	 */
 	if (opts->createDB && opts->single_txn)
-- 
2.49.0

Reply via email to