On Fri, 11 Oct 2024 at 04:01, Daniel Gustafsson <dan...@yesql.se> wrote: > > > On 7 Oct 2024, at 22:04, Nathan Bossart <nathandboss...@gmail.com> wrote: > > > > On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote: > >> On Tue, Oct 1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote: > >>> Correct, sorry for being unclear. The consistency argument would be to > >>> expand > >>> pg_upgrade to report all invalid databases rather than just the first > >>> found; > >>> attempting to fix problems would be a new behavior. > >> > >> Yes, historically pg_upgrade will fail if it finds anything unusual, > >> mostly because what it does normally is already scary enough. If users > >> what pg_upgrade to do cleanups, it would be enabled by a separate flag, > >> or even a new command-line app. > > > > While I suspect it's rare that someone CTRL-C's out of an accidental DROP > > DATABASE and then runs pg_upgrade before trying to recover the data, I > > agree with the principle of having pg_upgrade fail by default for things > > like this. If we did add a new flag, the new invalid database report that > > Daniel mentions could say something like "try again with > > --skip-invalid-databases to have pg_upgrade automatically drop invalid > > databases." > > If we are teaching pg_upgrade to handle errors, either by skipping or by > fixing, then I believe this is the right way to go about it. A successful run > should probably also create a report of the databases which were skipped.
In v2 I've made changes to the patch incorporating the suggestions here: * Default behaviour is to just fail with a report of all invalid databases * A new option --skip-invalid-databases will then skip the checks, and would not transfer any invalid database to the new cluster. A warning with a report file will then follow after a successful run. Dropping invalid databases in the old cluster will make invalid databases unrecoverable, so I opted for a skip over invalid databases approach that would leave invalid databases in the old cluster. Apart from a missing --skip-invalid-databases test, does this attempt look OK?
From e51f581ddc1158a9fb2840dfc04863618a390887 Mon Sep 17 00:00:00 2001 From: Thomas Krennwallner <tea...@aiven.io> Date: Fri, 20 Sep 2024 17:33:05 -0400 Subject: [PATCH] pg_upgrade: Add check for invalid databases. Currently, pg_upgrade fails to connect to the first invalid database it encounters in get_loadable_libraries() and then aborts. While we print a fatal message with a hint what should be done, the output is terse and does not report further invalid databases present in an installation. Instead of just exiting on first error, we collect all pg_database.datconnlimit values in get_db_infos() and order invalid databases after valid ones in cluster.dbarr. This allows for skipping over invalid databases in all checks with --skip-invalid-databases, a new option where we do not transfer such databases from the old cluster. Unless we use --skip-invalid-databases, check_for_invalid_dbs() collects all invalid databases in a report file invalid_databases.txt. After a successful pg_upgrade run with skipped invalid databases, create_invalid_databases_report() will collect them in a report file skipped_invalid_databases.txt for the completion banner output. --- doc/src/sgml/ref/pgupgrade.sgml | 19 ++++ src/bin/pg_upgrade/check.c | 131 +++++++++++++++++++++++-- src/bin/pg_upgrade/info.c | 20 +++- src/bin/pg_upgrade/option.c | 7 ++ src/bin/pg_upgrade/pg_upgrade.c | 18 +++- src/bin/pg_upgrade/pg_upgrade.h | 8 +- src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +- 7 files changed, 187 insertions(+), 18 deletions(-) diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml index 4777381dac..a9a6b62e7a 100644 --- a/doc/src/sgml/ref/pgupgrade.sgml +++ b/doc/src/sgml/ref/pgupgrade.sgml @@ -253,6 +253,25 @@ PostgreSQL documentation </listitem> </varlistentry> + <varlistentry> + <term><option>--skip-invalid-databases</option></term> + <listitem> + <para> + If a cluster contains invalid databases, the default behaviour of + <application>pg_upgrade</application> is to fail with a report of all + such databases found in an installation. Invalid databases, which cannot + be connected to anymore, are a consequence of interrupted + <command>DROP DATABASE</command> and marked as corrupted databases in + <link linkend="catalog-pg-database"><structname>pg_database</structname></link>.<structfield>datconnlimit</structfield>. + </para> + <para> + When using option <option>--skip-invalid-databases</option>, + <application>pg_upgrade</application> instead will skip all invalid + databases and do not transfer them over to the new cluster. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--sync-method=</option><replaceable>method</replaceable></term> <listitem> diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 12735a4268..e372feff1a 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -12,6 +12,7 @@ #include "catalog/pg_authid_d.h" #include "catalog/pg_class_d.h" #include "catalog/pg_collation.h" +#include "catalog/pg_database.h" #include "fe_utils/string_utils.h" #include "mb/pg_wchar.h" #include "pg_upgrade.h" @@ -19,6 +20,7 @@ static void check_new_cluster_is_empty(void); static void check_is_install_user(ClusterInfo *cluster); static void check_proper_datallowconn(ClusterInfo *cluster); +static void check_for_invalid_dbs(ClusterInfo *cluster); static void check_for_prepared_transactions(ClusterInfo *cluster); static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster); static void check_for_user_defined_postfix_ops(ClusterInfo *cluster); @@ -598,6 +600,11 @@ check_and_dump_old_cluster(void) */ get_db_rel_and_slot_infos(&old_cluster); + /* + * Verify that the old cluster does not contain invalid databases. + */ + check_for_invalid_dbs(&old_cluster); + init_tablespaces(); get_loadable_libraries(); @@ -691,6 +698,11 @@ check_new_cluster(void) { get_db_rel_and_slot_infos(&new_cluster); + /* + * Verify that the new cluster does not contain invalid databases. + */ + check_for_invalid_dbs(&new_cluster); + check_new_cluster_is_empty(); check_loadable_libraries(); @@ -763,7 +775,7 @@ issue_warnings_and_set_wal_level(void) void -output_completion_banner(char *deletion_script_file_name) +output_completion_banner(char *deletion_script_file_name, char *skipped_invalid_databases_file_name) { PQExpBufferData user_specification; @@ -775,6 +787,14 @@ output_completion_banner(char *deletion_script_file_name) appendPQExpBufferChar(&user_specification, ' '); } + if (skipped_invalid_databases_file_name) + { + pg_log(PG_REPORT, + "The following invalid databases were not transferred by pg_upgrade:\n" + " %s", + skipped_invalid_databases_file_name); + } + pg_log(PG_REPORT, "Optimizer statistics are not transferred by pg_upgrade.\n" "Once you start the new server, consider running:\n" @@ -998,7 +1018,7 @@ create_script_for_old_cluster_deletion(char **deletion_script_file_name) fprintf(script, "\n"); - for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) + for (dbnum = 0; dbnum < old_cluster.dbarr.ntotaldbs; dbnum++) fprintf(script, RMDIR_CMD " %c%s%c%u%c\n", PATH_QUOTE, fix_path_separator(os_info.old_tablespaces[tblnum]), PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid, @@ -1087,6 +1107,96 @@ check_is_install_user(ClusterInfo *cluster) check_ok(); } +/* + * check_for_invalid_dbs + * + * Ensure that all databases are valid as connections to invalid databases + * are not allowed. + */ +static void +check_for_invalid_dbs(ClusterInfo *cluster) +{ + int dbnum; + FILE *script = NULL; + char output_path[MAXPGPATH]; + + prep_status("Checking for invalid databases"); + + if (user_opts.skip_invalid_dbs) + { + pg_log(PG_REPORT, "skipped"); + return; + } + + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, + "invalid_databases.txt"); + + /* + * If invalid databases exist in cluster they are ordered after the valid + * ones in dbarr. + */ + for (dbnum = cluster->dbarr.ndbs; dbnum < cluster->dbarr.ntotaldbs; dbnum++) + { + if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL) + pg_fatal("could not open file \"%s\": %m", output_path); + fprintf(script, "%s\n", cluster->dbarr.dbs[dbnum].db_name); + } + + if (script) + { + fclose(script); + pg_log(PG_REPORT, "fatal"); + pg_fatal("Your installation contains invalid databases as a consequence of\n" + "interrupted DROP DATABASE. They are now marked as corrupted databases\n" + "that cannot be connected to anymore. Consider removing them using\n" + " DROP DATABASE ...;\n" + "or run pg_upgrade with --skip-invalid-databases to skip transferring\n" + "invalid databases to the new cluster.\n" + "A list of invalid databases is in the file:\n" + " %s", output_path); + } + else + check_ok(); +} + + +/* + * create_invalid_databases_report() + * + * This is particularly useful for listing skipped invalid databases. + */ +void +create_invalid_databases_report(char **skipped_invalid_databases_file_name) +{ + int dbnum; + FILE *script = NULL; + + if (!user_opts.skip_invalid_dbs || old_cluster.dbarr.ndbs == old_cluster.dbarr.ntotaldbs) + return; + + *skipped_invalid_databases_file_name = psprintf("%sskipped_invalid_databases.txt", + SCRIPT_PREFIX); + + prep_status("Creating report for skipped invalid databases"); + + for (dbnum = old_cluster.dbarr.ndbs; dbnum < old_cluster.dbarr.ntotaldbs; dbnum++) + { + if (script == NULL && (script = fopen_priv(*skipped_invalid_databases_file_name, "w")) == NULL) + pg_fatal("could not open file \"%s\": %m", *skipped_invalid_databases_file_name); + fprintf(script, "%s\n", old_cluster.dbarr.dbs[dbnum].db_name); + } + + if (script != NULL) + { + report_status(PG_WARNING, "warning"); + pg_log(PG_WARNING, + "\nWARNING: old cluster contains invalid databases which have been skipped"); + fclose(script); + } + else + check_ok(); +} /* * check_proper_datallowconn @@ -1115,10 +1225,15 @@ check_proper_datallowconn(ClusterInfo *cluster) conn_template1 = connectToServer(cluster, "template1"); - /* get database names */ + /* + * get database names; skip invalid databases as they are handled in + * check_for_invalid_dbs() + */ dbres = executeQueryOrDie(conn_template1, "SELECT datname, datallowconn " - "FROM pg_catalog.pg_database"); + "FROM pg_catalog.pg_database " + "WHERE datconnlimit != %d", + DATCONNLIMIT_INVALID_DB); i_datname = PQfnumber(dbres, "datname"); i_datallowconn = PQfnumber(dbres, "datallowconn"); @@ -1982,7 +2097,9 @@ check_old_cluster_subscription_state(void) /* * Check that all the subscriptions have their respective replication - * origin. This check only needs to run once. + * origin. This check only needs to run once. Skip invalid databases, + * this check will only be performed when user_opts.skip_invalid_dbs is + * true. */ conn = connectToServer(&old_cluster, old_cluster.dbarr.dbs[0].db_name); res = executeQueryOrDie(conn, @@ -1992,7 +2109,9 @@ check_old_cluster_subscription_state(void) " ON o.roname = 'pg_' || s.oid " "INNER JOIN pg_catalog.pg_database d " " ON d.oid = s.subdbid " - "WHERE o.roname IS NULL;"); + "WHERE o.roname IS NULL " + " AND d.datconnlimit != %d", + DATCONNLIMIT_INVALID_DB); ntup = PQntuples(res); for (int i = 0; i < ntup; i++) { diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c index f83ded89cb..514ed2d612 100644 --- a/src/bin/pg_upgrade/info.c +++ b/src/bin/pg_upgrade/info.c @@ -11,6 +11,7 @@ #include "access/transam.h" #include "catalog/pg_class_d.h" +#include "catalog/pg_database.h" #include "pg_upgrade.h" #include "pqexpbuffer.h" @@ -407,14 +408,16 @@ get_db_infos(ClusterInfo *cluster) PGresult *res; int ntups; int tupnum; + int valid_tups; DbInfo *dbinfos; int i_datname, + i_datconnlimit, i_oid, i_spclocation; char query[QUERY_ALLOC]; snprintf(query, sizeof(query), - "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, "); + "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, d.datconnlimit, "); if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) snprintf(query + strlen(query), sizeof(query) - strlen(query), "datlocprovider, datlocale, "); @@ -430,30 +433,36 @@ get_db_infos(ClusterInfo *cluster) " LEFT OUTER JOIN pg_catalog.pg_tablespace t " " ON d.dattablespace = t.oid " "WHERE d.datallowconn = true " - "ORDER BY 1"); + "ORDER BY " + " CASE WHEN d.datconnlimit != %d THEN 0 ELSE 1 END, " + " d.oid", DATCONNLIMIT_INVALID_DB); res = executeQueryOrDie(conn, "%s", query); i_oid = PQfnumber(res, "oid"); i_datname = PQfnumber(res, "datname"); + i_datconnlimit = PQfnumber(res, "datconnlimit"); i_spclocation = PQfnumber(res, "spclocation"); ntups = PQntuples(res); dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups); - for (tupnum = 0; tupnum < ntups; tupnum++) + for (tupnum = 0, valid_tups = 0; tupnum < ntups; tupnum++) { dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid)); dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname)); snprintf(dbinfos[tupnum].db_tablespace, sizeof(dbinfos[tupnum].db_tablespace), "%s", PQgetvalue(res, tupnum, i_spclocation)); + if (atoi(PQgetvalue(res, tupnum, i_datconnlimit)) != DATCONNLIMIT_INVALID_DB) + valid_tups++; } PQclear(res); PQfinish(conn); cluster->dbarr.dbs = dbinfos; - cluster->dbarr.ndbs = ntups; + cluster->dbarr.ndbs = valid_tups; + cluster->dbarr.ntotaldbs = ntups; } @@ -774,7 +783,7 @@ free_db_and_rel_infos(DbInfoArr *db_arr) { int dbnum; - for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++) + for (dbnum = 0; dbnum < db_arr->ntotaldbs; dbnum++) { free_rel_infos(&db_arr->dbs[dbnum].rel_arr); pg_free(db_arr->dbs[dbnum].db_name); @@ -782,6 +791,7 @@ free_db_and_rel_infos(DbInfoArr *db_arr) pg_free(db_arr->dbs); db_arr->dbs = NULL; db_arr->ndbs = 0; + db_arr->ntotaldbs = 0; } diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c index 6f41d63eed..1937796a72 100644 --- a/src/bin/pg_upgrade/option.c +++ b/src/bin/pg_upgrade/option.c @@ -60,6 +60,7 @@ parseCommandLine(int argc, char *argv[]) {"copy", no_argument, NULL, 2}, {"copy-file-range", no_argument, NULL, 3}, {"sync-method", required_argument, NULL, 4}, + {"skip-invalid-databases", no_argument, NULL, 5}, {NULL, 0, NULL, 0} }; @@ -69,6 +70,7 @@ parseCommandLine(int argc, char *argv[]) DataDirSyncMethod unused; user_opts.do_sync = true; + user_opts.skip_invalid_dbs = false; user_opts.transfer_mode = TRANSFER_MODE_COPY; os_info.progname = get_progname(argv[0]); @@ -212,6 +214,10 @@ parseCommandLine(int argc, char *argv[]) user_opts.sync_method = pg_strdup(optarg); break; + case 5: + user_opts.skip_invalid_dbs = true; + break; + default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), os_info.progname); @@ -306,6 +312,7 @@ usage(void) printf(_(" --clone clone instead of copying files to new cluster\n")); printf(_(" --copy copy files to new cluster (default)\n")); printf(_(" --copy-file-range copy files to new cluster with copy_file_range\n")); + printf(_(" --skip-invalid-databases skip transferring invalid databases to the new cluster\n")); printf(_(" --sync-method=METHOD set method for syncing files to disk\n")); printf(_(" -?, --help show this help, then exit\n")); printf(_("\n" diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c index 663235816f..b0353327b9 100644 --- a/src/bin/pg_upgrade/pg_upgrade.c +++ b/src/bin/pg_upgrade/pg_upgrade.c @@ -41,6 +41,7 @@ #include <time.h> #include "catalog/pg_class_d.h" +#include "catalog/pg_database.h" #include "common/file_perm.h" #include "common/logging.h" #include "common/restricted_token.h" @@ -83,7 +84,8 @@ char *output_files[] = { int main(int argc, char **argv) { - char *deletion_script_file_name = NULL; + char *deletion_script_file_name = NULL, + *skipped_invalid_databases_file_name = NULL; /* * pg_upgrade doesn't currently use common/logging.c, but initialize it @@ -219,6 +221,8 @@ main(int argc, char **argv) create_script_for_old_cluster_deletion(&deletion_script_file_name); + create_invalid_databases_report(&skipped_invalid_databases_file_name); + issue_warnings_and_set_wal_level(); pg_log(PG_REPORT, @@ -226,9 +230,10 @@ main(int argc, char **argv) "Upgrade Complete\n" "----------------"); - output_completion_banner(deletion_script_file_name); + output_completion_banner(deletion_script_file_name, skipped_invalid_databases_file_name); pg_free(deletion_script_file_name); + pg_free(skipped_invalid_databases_file_name); cleanup_output_dirs(); @@ -859,10 +864,15 @@ set_frozenxids(bool minmxid_only) "SET datminmxid = '%u'", old_cluster.controldata.chkpnt_nxtmulti)); - /* get database names */ + /* + * get database names; skip invalid databases as they are handled in + * check_for_invalid_dbs() + */ dbres = executeQueryOrDie(conn_template1, "SELECT datname, datallowconn " - "FROM pg_catalog.pg_database"); + "FROM pg_catalog.pg_database " + "WHERE datconnlimit != %d", + DATCONNLIMIT_INVALID_DB); i_datname = PQfnumber(dbres, "datname"); i_datallowconn = PQfnumber(dbres, "datallowconn"); diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h index 53f693c2d4..8abb1688ca 100644 --- a/src/bin/pg_upgrade/pg_upgrade.h +++ b/src/bin/pg_upgrade/pg_upgrade.h @@ -213,7 +213,8 @@ typedef struct typedef struct { DbInfo *dbs; /* array of db infos */ - int ndbs; /* number of db infos */ + int ndbs; /* number of valid db infos */ + int ntotaldbs; /* total number of db infos */ } DbInfoArr; /* @@ -323,6 +324,8 @@ typedef struct bool check; /* check clusters only, don't change any data */ bool live_check; /* check clusters only, old server is running */ bool do_sync; /* flush changes to disk */ + bool skip_invalid_dbs; /* skip transferring invalid databases to + * the new cluster */ transferMode transfer_mode; /* copy files or link them? */ int jobs; /* number of processes/threads to use */ char *socketdir; /* directory to use for Unix sockets */ @@ -371,10 +374,11 @@ void check_and_dump_old_cluster(void); void check_new_cluster(void); void report_clusters_compatible(void); void issue_warnings_and_set_wal_level(void); -void output_completion_banner(char *deletion_script_file_name); +void output_completion_banner(char *deletion_script_file_name, char *skipped_invalid_databases_file_name); void check_cluster_versions(void); void check_cluster_compatibility(void); void create_script_for_old_cluster_deletion(char **deletion_script_file_name); +void create_invalid_databases_report(char **skipped_invalid_databases_file_name); /* controldata.c */ diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl index 17af2ce61e..c7da6d27b2 100644 --- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl +++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl @@ -424,7 +424,7 @@ SKIP: $mode, '--check', ], 1, - [qr/invalid/], # pg_upgrade prints errors on stdout :( + [qr/invalid_databases\.txt/], # pg_upgrade prints errors on stdout :( [qr/^$/], 'invalid database causes failure'); rmtree($newnode->data_dir . "/pg_upgrade_output.d"); -- 2.46.2