On Fri, 11 Oct 2024 at 04:01, Daniel Gustafsson <dan...@yesql.se> wrote:
>
> > On 7 Oct 2024, at 22:04, Nathan Bossart <nathandboss...@gmail.com> wrote:
> >
> > On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
> >> On Tue, Oct  1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
> >>> Correct, sorry for being unclear.  The consistency argument would be to 
> >>> expand
> >>> pg_upgrade to report all invalid databases rather than just the first 
> >>> found;
> >>> attempting to fix problems would be a new behavior.
> >>
> >> Yes, historically pg_upgrade will fail if it finds anything unusual,
> >> mostly because what it does normally is already scary enough.  If users
> >> what pg_upgrade to do cleanups, it would be enabled by a separate flag,
> >> or even a new command-line app.
> >
> > While I suspect it's rare that someone CTRL-C's out of an accidental DROP
> > DATABASE and then runs pg_upgrade before trying to recover the data, I
> > agree with the principle of having pg_upgrade fail by default for things
> > like this.  If we did add a new flag, the new invalid database report that
> > Daniel mentions could say something like "try again with
> > --skip-invalid-databases to have pg_upgrade automatically drop invalid
> > databases."
>
> If we are teaching pg_upgrade to handle errors, either by skipping or by
> fixing, then I believe this is the right way to go about it.  A successful run
> should probably also create a report of the databases which were skipped.

In v2 I've made changes to the patch incorporating the suggestions here:

* Default behaviour is to just fail with a report of all invalid databases

* A new option --skip-invalid-databases will then skip the checks, and
would not transfer any invalid database to the new cluster. A warning
with a report file will then follow after a successful run.

Dropping invalid databases in the old cluster will make invalid
databases unrecoverable, so I opted for a skip over invalid databases
approach that would leave invalid databases in the old cluster.

Apart from a missing --skip-invalid-databases test, does this attempt look OK?
From e51f581ddc1158a9fb2840dfc04863618a390887 Mon Sep 17 00:00:00 2001
From: Thomas Krennwallner <tea...@aiven.io>
Date: Fri, 20 Sep 2024 17:33:05 -0400
Subject: [PATCH] pg_upgrade: Add check for invalid databases.

Currently, pg_upgrade fails to connect to the first invalid database
it encounters in get_loadable_libraries() and then aborts.  While we
print a fatal message with a hint what should be done, the output is
terse and does not report further invalid databases present in an
installation.

Instead of just exiting on first error, we collect all
pg_database.datconnlimit values in get_db_infos() and order invalid
databases after valid ones in cluster.dbarr. This allows for skipping
over invalid databases in all checks with --skip-invalid-databases, a
new option where we do not transfer such databases from the old
cluster.

Unless we use --skip-invalid-databases, check_for_invalid_dbs()
collects all invalid databases in a report file invalid_databases.txt.
After a successful pg_upgrade run with skipped invalid databases,
create_invalid_databases_report() will collect them in a report file
skipped_invalid_databases.txt for the completion banner output.
---
 doc/src/sgml/ref/pgupgrade.sgml        |  19 ++++
 src/bin/pg_upgrade/check.c             | 131 +++++++++++++++++++++++--
 src/bin/pg_upgrade/info.c              |  20 +++-
 src/bin/pg_upgrade/option.c            |   7 ++
 src/bin/pg_upgrade/pg_upgrade.c        |  18 +++-
 src/bin/pg_upgrade/pg_upgrade.h        |   8 +-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl |   2 +-
 7 files changed, 187 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4777381dac..a9a6b62e7a 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -253,6 +253,25 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--skip-invalid-databases</option></term>
+      <listitem>
+       <para>
+        If a cluster contains invalid databases, the default behaviour of
+        <application>pg_upgrade</application> is to fail with a report of all
+        such databases found in an installation. Invalid databases, which cannot
+        be connected to anymore, are a consequence of interrupted
+        <command>DROP DATABASE</command> and marked as corrupted databases in
+        <link linkend="catalog-pg-database"><structname>pg_database</structname></link>.<structfield>datconnlimit</structfield>.
+       </para>
+       <para>
+        When using option <option>--skip-invalid-databases</option>,
+        <application>pg_upgrade</application> instead will skip all invalid
+        databases and do not transfer them over to the new cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--sync-method=</option><replaceable>method</replaceable></term>
       <listitem>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 12735a4268..e372feff1a 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -12,6 +12,7 @@
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_class_d.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_database.h"
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
 #include "pg_upgrade.h"
@@ -19,6 +20,7 @@
 static void check_new_cluster_is_empty(void);
 static void check_is_install_user(ClusterInfo *cluster);
 static void check_proper_datallowconn(ClusterInfo *cluster);
+static void check_for_invalid_dbs(ClusterInfo *cluster);
 static void check_for_prepared_transactions(ClusterInfo *cluster);
 static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
@@ -598,6 +600,11 @@ check_and_dump_old_cluster(void)
 	 */
 	get_db_rel_and_slot_infos(&old_cluster);
 
+	/*
+	 * Verify that the old cluster does not contain invalid databases.
+	 */
+	check_for_invalid_dbs(&old_cluster);
+
 	init_tablespaces();
 
 	get_loadable_libraries();
@@ -691,6 +698,11 @@ check_new_cluster(void)
 {
 	get_db_rel_and_slot_infos(&new_cluster);
 
+	/*
+	 * Verify that the new cluster does not contain invalid databases.
+	 */
+	check_for_invalid_dbs(&new_cluster);
+
 	check_new_cluster_is_empty();
 
 	check_loadable_libraries();
@@ -763,7 +775,7 @@ issue_warnings_and_set_wal_level(void)
 
 
 void
-output_completion_banner(char *deletion_script_file_name)
+output_completion_banner(char *deletion_script_file_name, char *skipped_invalid_databases_file_name)
 {
 	PQExpBufferData user_specification;
 
@@ -775,6 +787,14 @@ output_completion_banner(char *deletion_script_file_name)
 		appendPQExpBufferChar(&user_specification, ' ');
 	}
 
+	if (skipped_invalid_databases_file_name)
+	{
+		pg_log(PG_REPORT,
+			   "The following invalid databases were not transferred by pg_upgrade:\n"
+			   "    %s",
+			   skipped_invalid_databases_file_name);
+	}
+
 	pg_log(PG_REPORT,
 		   "Optimizer statistics are not transferred by pg_upgrade.\n"
 		   "Once you start the new server, consider running:\n"
@@ -998,7 +1018,7 @@ create_script_for_old_cluster_deletion(char **deletion_script_file_name)
 
 			fprintf(script, "\n");
 
-			for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+			for (dbnum = 0; dbnum < old_cluster.dbarr.ntotaldbs; dbnum++)
 				fprintf(script, RMDIR_CMD " %c%s%c%u%c\n", PATH_QUOTE,
 						fix_path_separator(os_info.old_tablespaces[tblnum]),
 						PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid,
@@ -1087,6 +1107,96 @@ check_is_install_user(ClusterInfo *cluster)
 	check_ok();
 }
 
+/*
+ *	check_for_invalid_dbs
+ *
+ *	Ensure that all databases are valid as connections to invalid databases
+ *	are not allowed.
+ */
+static void
+check_for_invalid_dbs(ClusterInfo *cluster)
+{
+	int			dbnum;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for invalid databases");
+
+	if (user_opts.skip_invalid_dbs)
+	{
+		pg_log(PG_REPORT, "skipped");
+		return;
+	}
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "invalid_databases.txt");
+
+	/*
+	 * If invalid databases exist in cluster they are ordered after the valid
+	 * ones in dbarr.
+	 */
+	for (dbnum = cluster->dbarr.ndbs; dbnum < cluster->dbarr.ntotaldbs; dbnum++)
+	{
+		if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+			pg_fatal("could not open file \"%s\": %m", output_path);
+		fprintf(script, "%s\n", cluster->dbarr.dbs[dbnum].db_name);
+	}
+
+	if (script)
+	{
+		fclose(script);
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains invalid databases as a consequence of\n"
+				 "interrupted DROP DATABASE.  They are now marked as corrupted databases\n"
+				 "that cannot be connected to anymore.  Consider removing them using\n"
+				 "    DROP DATABASE ...;\n"
+				 "or run pg_upgrade with --skip-invalid-databases to skip transferring\n"
+				 "invalid databases to the new cluster.\n"
+				 "A list of invalid databases is in the file:\n"
+				 "    %s", output_path);
+	}
+	else
+		check_ok();
+}
+
+
+/*
+ * create_invalid_databases_report()
+ *
+ *	This is particularly useful for listing skipped invalid databases.
+ */
+void
+create_invalid_databases_report(char **skipped_invalid_databases_file_name)
+{
+	int			dbnum;
+	FILE	   *script = NULL;
+
+	if (!user_opts.skip_invalid_dbs || old_cluster.dbarr.ndbs == old_cluster.dbarr.ntotaldbs)
+		return;
+
+	*skipped_invalid_databases_file_name = psprintf("%sskipped_invalid_databases.txt",
+													SCRIPT_PREFIX);
+
+	prep_status("Creating report for skipped invalid databases");
+
+	for (dbnum = old_cluster.dbarr.ndbs; dbnum < old_cluster.dbarr.ntotaldbs; dbnum++)
+	{
+		if (script == NULL && (script = fopen_priv(*skipped_invalid_databases_file_name, "w")) == NULL)
+			pg_fatal("could not open file \"%s\": %m", *skipped_invalid_databases_file_name);
+		fprintf(script, "%s\n", old_cluster.dbarr.dbs[dbnum].db_name);
+	}
+
+	if (script != NULL)
+	{
+		report_status(PG_WARNING, "warning");
+		pg_log(PG_WARNING,
+			   "\nWARNING:  old cluster contains invalid databases which have been skipped");
+		fclose(script);
+	}
+	else
+		check_ok();
+}
 
 /*
  *	check_proper_datallowconn
@@ -1115,10 +1225,15 @@ check_proper_datallowconn(ClusterInfo *cluster)
 
 	conn_template1 = connectToServer(cluster, "template1");
 
-	/* get database names */
+	/*
+	 * get database names; skip invalid databases as they are handled in
+	 * check_for_invalid_dbs()
+	 */
 	dbres = executeQueryOrDie(conn_template1,
 							  "SELECT	datname, datallowconn "
-							  "FROM	pg_catalog.pg_database");
+							  "FROM	pg_catalog.pg_database "
+							  "WHERE datconnlimit != %d",
+							  DATCONNLIMIT_INVALID_DB);
 
 	i_datname = PQfnumber(dbres, "datname");
 	i_datallowconn = PQfnumber(dbres, "datallowconn");
@@ -1982,7 +2097,9 @@ check_old_cluster_subscription_state(void)
 
 	/*
 	 * Check that all the subscriptions have their respective replication
-	 * origin.  This check only needs to run once.
+	 * origin.  This check only needs to run once.  Skip invalid databases,
+	 * this check will only be performed when user_opts.skip_invalid_dbs is
+	 * true.
 	 */
 	conn = connectToServer(&old_cluster, old_cluster.dbarr.dbs[0].db_name);
 	res = executeQueryOrDie(conn,
@@ -1992,7 +2109,9 @@ check_old_cluster_subscription_state(void)
 							"	ON o.roname = 'pg_' || s.oid "
 							"INNER JOIN pg_catalog.pg_database d "
 							"	ON d.oid = s.subdbid "
-							"WHERE o.roname IS NULL;");
+							"WHERE o.roname IS NULL "
+							"   AND d.datconnlimit != %d",
+							DATCONNLIMIT_INVALID_DB);
 	ntup = PQntuples(res);
 	for (int i = 0; i < ntup; i++)
 	{
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index f83ded89cb..514ed2d612 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -11,6 +11,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_class_d.h"
+#include "catalog/pg_database.h"
 #include "pg_upgrade.h"
 #include "pqexpbuffer.h"
 
@@ -407,14 +408,16 @@ get_db_infos(ClusterInfo *cluster)
 	PGresult   *res;
 	int			ntups;
 	int			tupnum;
+	int			valid_tups;
 	DbInfo	   *dbinfos;
 	int			i_datname,
+				i_datconnlimit,
 				i_oid,
 				i_spclocation;
 	char		query[QUERY_ALLOC];
 
 	snprintf(query, sizeof(query),
-			 "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
+			 "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, d.datconnlimit, ");
 	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
 		snprintf(query + strlen(query), sizeof(query) - strlen(query),
 				 "datlocprovider, datlocale, ");
@@ -430,30 +433,36 @@ get_db_infos(ClusterInfo *cluster)
 			 " LEFT OUTER JOIN pg_catalog.pg_tablespace t "
 			 " ON d.dattablespace = t.oid "
 			 "WHERE d.datallowconn = true "
-			 "ORDER BY 1");
+			 "ORDER BY "
+			 " CASE WHEN d.datconnlimit != %d THEN 0 ELSE 1 END, "
+			 " d.oid", DATCONNLIMIT_INVALID_DB);
 
 	res = executeQueryOrDie(conn, "%s", query);
 
 	i_oid = PQfnumber(res, "oid");
 	i_datname = PQfnumber(res, "datname");
+	i_datconnlimit = PQfnumber(res, "datconnlimit");
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
 	dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
 
-	for (tupnum = 0; tupnum < ntups; tupnum++)
+	for (tupnum = 0, valid_tups = 0; tupnum < ntups; tupnum++)
 	{
 		dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
 		dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
 		snprintf(dbinfos[tupnum].db_tablespace, sizeof(dbinfos[tupnum].db_tablespace), "%s",
 				 PQgetvalue(res, tupnum, i_spclocation));
+		if (atoi(PQgetvalue(res, tupnum, i_datconnlimit)) != DATCONNLIMIT_INVALID_DB)
+			valid_tups++;
 	}
 	PQclear(res);
 
 	PQfinish(conn);
 
 	cluster->dbarr.dbs = dbinfos;
-	cluster->dbarr.ndbs = ntups;
+	cluster->dbarr.ndbs = valid_tups;
+	cluster->dbarr.ntotaldbs = ntups;
 }
 
 
@@ -774,7 +783,7 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 {
 	int			dbnum;
 
-	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
+	for (dbnum = 0; dbnum < db_arr->ntotaldbs; dbnum++)
 	{
 		free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
 		pg_free(db_arr->dbs[dbnum].db_name);
@@ -782,6 +791,7 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
 	pg_free(db_arr->dbs);
 	db_arr->dbs = NULL;
 	db_arr->ndbs = 0;
+	db_arr->ntotaldbs = 0;
 }
 
 
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 6f41d63eed..1937796a72 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -60,6 +60,7 @@ parseCommandLine(int argc, char *argv[])
 		{"copy", no_argument, NULL, 2},
 		{"copy-file-range", no_argument, NULL, 3},
 		{"sync-method", required_argument, NULL, 4},
+		{"skip-invalid-databases", no_argument, NULL, 5},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -69,6 +70,7 @@ parseCommandLine(int argc, char *argv[])
 	DataDirSyncMethod unused;
 
 	user_opts.do_sync = true;
+	user_opts.skip_invalid_dbs = false;
 	user_opts.transfer_mode = TRANSFER_MODE_COPY;
 
 	os_info.progname = get_progname(argv[0]);
@@ -212,6 +214,10 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.sync_method = pg_strdup(optarg);
 				break;
 
+			case 5:
+				user_opts.skip_invalid_dbs = true;
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -306,6 +312,7 @@ usage(void)
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
 	printf(_("  --copy-file-range             copy files to new cluster with copy_file_range\n"));
+	printf(_("  --skip-invalid-databases      skip transferring invalid databases to the new cluster\n"));
 	printf(_("  --sync-method=METHOD          set method for syncing files to disk\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 663235816f..b0353327b9 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -41,6 +41,7 @@
 #include <time.h>
 
 #include "catalog/pg_class_d.h"
+#include "catalog/pg_database.h"
 #include "common/file_perm.h"
 #include "common/logging.h"
 #include "common/restricted_token.h"
@@ -83,7 +84,8 @@ char	   *output_files[] = {
 int
 main(int argc, char **argv)
 {
-	char	   *deletion_script_file_name = NULL;
+	char	   *deletion_script_file_name = NULL,
+			   *skipped_invalid_databases_file_name = NULL;
 
 	/*
 	 * pg_upgrade doesn't currently use common/logging.c, but initialize it
@@ -219,6 +221,8 @@ main(int argc, char **argv)
 
 	create_script_for_old_cluster_deletion(&deletion_script_file_name);
 
+	create_invalid_databases_report(&skipped_invalid_databases_file_name);
+
 	issue_warnings_and_set_wal_level();
 
 	pg_log(PG_REPORT,
@@ -226,9 +230,10 @@ main(int argc, char **argv)
 		   "Upgrade Complete\n"
 		   "----------------");
 
-	output_completion_banner(deletion_script_file_name);
+	output_completion_banner(deletion_script_file_name, skipped_invalid_databases_file_name);
 
 	pg_free(deletion_script_file_name);
+	pg_free(skipped_invalid_databases_file_name);
 
 	cleanup_output_dirs();
 
@@ -859,10 +864,15 @@ set_frozenxids(bool minmxid_only)
 							  "SET	datminmxid = '%u'",
 							  old_cluster.controldata.chkpnt_nxtmulti));
 
-	/* get database names */
+	/*
+	 * get database names; skip invalid databases as they are handled in
+	 * check_for_invalid_dbs()
+	 */
 	dbres = executeQueryOrDie(conn_template1,
 							  "SELECT	datname, datallowconn "
-							  "FROM	pg_catalog.pg_database");
+							  "FROM	pg_catalog.pg_database "
+							  "WHERE datconnlimit != %d",
+							  DATCONNLIMIT_INVALID_DB);
 
 	i_datname = PQfnumber(dbres, "datname");
 	i_datallowconn = PQfnumber(dbres, "datallowconn");
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 53f693c2d4..8abb1688ca 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -213,7 +213,8 @@ typedef struct
 typedef struct
 {
 	DbInfo	   *dbs;			/* array of db infos */
-	int			ndbs;			/* number of db infos */
+	int			ndbs;			/* number of valid db infos */
+	int			ntotaldbs;		/* total number of db infos */
 } DbInfoArr;
 
 /*
@@ -323,6 +324,8 @@ typedef struct
 	bool		check;			/* check clusters only, don't change any data */
 	bool		live_check;		/* check clusters only, old server is running */
 	bool		do_sync;		/* flush changes to disk */
+	bool		skip_invalid_dbs;	/* skip transferring invalid databases to
+									 * the new cluster */
 	transferMode transfer_mode; /* copy files or link them? */
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
@@ -371,10 +374,11 @@ void		check_and_dump_old_cluster(void);
 void		check_new_cluster(void);
 void		report_clusters_compatible(void);
 void		issue_warnings_and_set_wal_level(void);
-void		output_completion_banner(char *deletion_script_file_name);
+void		output_completion_banner(char *deletion_script_file_name, char *skipped_invalid_databases_file_name);
 void		check_cluster_versions(void);
 void		check_cluster_compatibility(void);
 void		create_script_for_old_cluster_deletion(char **deletion_script_file_name);
+void		create_invalid_databases_report(char **skipped_invalid_databases_file_name);
 
 
 /* controldata.c */
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 17af2ce61e..c7da6d27b2 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -424,7 +424,7 @@ SKIP:
 			$mode, '--check',
 		],
 		1,
-		[qr/invalid/],    # pg_upgrade prints errors on stdout :(
+		[qr/invalid_databases\.txt/],    # pg_upgrade prints errors on stdout :(
 		[qr/^$/],
 		'invalid database causes failure');
 	rmtree($newnode->data_dir . "/pg_upgrade_output.d");
-- 
2.46.2

Reply via email to