Hi Thomas,

Few review comments on 0003-Add-undo-log-manager.patch:
1) Upgrade may fail
+/*
+ * Compute the new redo, and move the pg_undo file to match if necessary.
+ * Rather than renaming it, we'll create a new copy, so that a failure that
+ * occurs before the controlfile is rewritten won't be fatal.
+ */
+static void
+AdjustRedoLocation(const char *DataDir)
+{
+ uint64 old_redo = ControlFile.checkPointCopy.redo;
+ char old_pg_undo_path[MAXPGPATH];
+ char new_pg_undo_path[MAXPGPATH];
+ int old_fd;
+ int new_fd;
+ ssize_t nread;
+ ssize_t nwritten;
+ char buffer[1024];
+
+ /*
+ * Adjust fields as needed to force an empty XLOG starting at
+ * newXlogSegNo.
+ */

During the upgrade we delete the undo files present in the new cluster
and copy the undo files from the old cluster to the new cluster.
Then we try to readjust the redo location using pg_resetwal.
While trying to readjust we get the current control file details
from current cluster. We try to open the current undo file
present in the cluster using the details from the current cluster.
As the undo files from the current cluster have been removed
and replaced with the old cluster contents, the file open will fail.

Attached a patch to solve this problem.

2)  drop table space failure in corner case.

+ else
+ {
+ /*
+ * There is data we need in this undo log.  We can't force it to
+ * be detached.
+ */
+ ok = false;
+ }
+ LWLockRelease(&slot->mutex);

+ /* If we failed, then give up now and report failure. */
+ if (!ok)
+ return false;

One thought, can we discard the current tablespace entries
and try not to fail.

3) There will be a problem if some files deletion is successful and some
 file deletion fails, the meta contents having end details also need to be
applied or to handle the case where the undo is created further after
rollback

+ while ((de = ReadDirExtended(dir, undo_path, LOG)) != NULL)
+ {
+ char segment_path[MAXPGPATH];
+
+ if (strcmp(de->d_name, ".") == 0 ||
+ strcmp(de->d_name, "..") == 0)
+ continue;
+ snprintf(segment_path, sizeof(segment_path), "%s/%s",
+ undo_path, de->d_name);
+ if (unlink(segment_path) < 0)
+ elog(LOG, "couldn't unlink file \"%s\": %m", segment_path);
+ }

4)  In error case unlinked undo segment message will be logged
+ while ((de = ReadDirExtended(dir, undo_path, LOG)) != NULL)
+ {
+ char segment_path[MAXPGPATH];
+
+ if (strncmp(de->d_name, segment_prefix, segment_prefix_size) != 0)
+ continue;
+ snprintf(segment_path, sizeof(segment_path), "%s/%s",
+ undo_path, de->d_name);
+ elog(DEBUG1, "unlinked undo segment \"%s\"", segment_path);
+ if (unlink(segment_path) < 0)
+ elog(LOG, "couldn't unlink file \"%s\": %m", segment_path);
+ }
+ FreeDir(dir);

In error case the success message will be logged.

5) UndoRecPtrIsValid can be used to check InvalidUndoRecPtr
+ /*
+ * 'size' is expressed in usable non-header bytes.  Figure out how far we
+ * have to move insert to create space for 'size' usable bytes, stepping
+ * over any intervening headers.
+ */
+ Assert(slot->meta.unlogged.insert % BLCKSZ >= UndoLogBlockHeaderSize);
+ if (context->try_location != InvalidUndoRecPtr)

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 25, 2019 at 9:30 AM vignesh C <vignes...@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 7:48 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2019 at 11:04 PM vignesh C <vignes...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I have done some review of undolog patch series
> > > and here are my comments:
> > > 0003-Add-undo-log-manager.patch
> > >
> > > 1) As undo log is being created in tablespace,
> > >  if the tablespace is dropped later, will it have any impact?
>
> Thanks Amit, that clarifies the problem I was thinking.
> I  have another question regarding drop table space failure, but I
> don't have a better solution for that problem.
> Let me think more about it and discuss.
> >
> > Yes, it drops the undo logs present in tablespace being dropped.  See
> > DropUndoLogsInTablespace() in the same patch.
> >
> > >
> > > 4) Should we add a readme file for undolog as it does a fair amount of 
> > > work
> > >     and is core part of the undo system?
> > >
> Thanks Amit, I could get the details of readme.
> >
> > The Readme is already present in the patch series posted by Thomas.
> > See 0019-Add-developer-documentation-for-the-undo-log-storage.patch in
> > email [1].
> >
> > [1] - 
> > https://www.postgresql.org/message-id/CA%2BhUKGKni7EEU4FT71vZCCwPeaGb2PQOeKOFjQJavKnD577UMQ%40mail.gmail.com
> >
> > --
> > With Regards,
> > Amit Kapila.
> > EnterpriseDB: http://www.enterprisedb.com
>
> --
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
From 66578336861e02b9dc5544b7d2fed46261681998 Mon Sep 17 00:00:00 2001
From: Vigneshwaran c <vignesh21@gmail.com>
Date: Mon, 8 Jul 2019 22:49:18 +0530
Subject: [PATCH]     pg_upgrade failure fix.

    For adjusting undo the old cluster redo need to be passed.

    Patch by Vigneshwaran C, reviewed by Dilip Kumar.
---
 src/bin/pg_resetwal/pg_resetwal.c | 34 ++++++++++++++++++++++++++++------
 src/bin/pg_upgrade/pg_upgrade.c   |  4 ++--
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 0610a8c..396613b 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -85,7 +85,7 @@ static bool ReadControlFile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
 static void PrintNewControlValues(void);
-static void AdjustRedoLocation(const char *DataDir);
+static void AdjustRedoLocation(const char *DataDir, uint64 prev_redo);
 static void RewriteControlFile(void);
 static void FindEndOfXLOG(void);
 static void KillExistingXLOG(void);
@@ -109,6 +109,7 @@ main(int argc, char *argv[])
 		{"next-oid", required_argument, NULL, 'o'},
 		{"multixact-offset", required_argument, NULL, 'O'},
 		{"next-transaction-id", required_argument, NULL, 'x'},
+		{"old-redo", required_argument, NULL, 'r'},
 		{"wal-segsize", required_argument, NULL, 1},
 		{NULL, 0, NULL, 0}
 	};
@@ -121,6 +122,7 @@ main(int argc, char *argv[])
 	char	   *endptr2;
 	char	   *DataDir = NULL;
 	char	   *log_fname = NULL;
+	uint64	   old_redo = 0;
 	int			fd;
 	char		latest_undo_checkpoint_file[MAXPGPATH];
 	char		new_undo_checkpoint_file[MAXPGPATH];
@@ -145,7 +147,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:", long_options, NULL)) != -1)
+	while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:r:O:x:", long_options, NULL)) != -1)
 	{
 		switch (c)
 		{
@@ -239,6 +241,21 @@ main(int argc, char *argv[])
 				}
 				break;
 
+			case 'r':
+				old_redo = strtoull(optarg, &endptr, 0);
+				if (endptr == optarg || *endptr != '\0')
+				{
+					pg_log_error("invalid argument for option %s", "-r");
+					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+					exit(1);
+				}
+				if (old_redo == 0)
+				{
+					pg_log_error("Old Redo (-r) must not be 0");
+					exit(1);
+				}
+				break;
+
 			case 'm':
 				set_mxid = strtoul(optarg, &endptr, 0);
 				if (endptr == optarg || *endptr != ',')
@@ -521,7 +538,7 @@ main(int argc, char *argv[])
 	/*
 	 * Else, do the dirty deed.
 	 */
-	AdjustRedoLocation(DataDir);
+	AdjustRedoLocation(DataDir, old_redo);
 	RewriteControlFile();
 
 	/*
@@ -800,6 +817,9 @@ PrintControlValues(bool guessed)
 		   ControlFile.catalog_version_no);
 	printf(_("Database system identifier:           %s\n"),
 		   sysident_str);
+	printf(_("Latest checkpoint's REDO location:    %X/%X\n"),
+		   (uint32) (ControlFile.checkPointCopy.redo >> 32),
+		   (uint32) ControlFile.checkPointCopy.redo);
 	printf(_("Latest checkpoint's TimeLineID:       %u\n"),
 		   ControlFile.checkPointCopy.ThisTimeLineID);
 	printf(_("Latest checkpoint's full_page_writes: %s\n"),
@@ -940,9 +960,10 @@ PrintNewControlValues(void)
  * occurs before the controlfile is rewritten won't be fatal.
  */
 static void
-AdjustRedoLocation(const char *DataDir)
+AdjustRedoLocation(const char *DataDir, uint64 prev_redo)
 {
-	uint64		old_redo = ControlFile.checkPointCopy.redo;
+	uint64		old_redo = (prev_redo != 0) ? prev_redo:
+				  ControlFile.checkPointCopy.redo;
 	char		old_pg_undo_path[MAXPGPATH];
 	char		new_pg_undo_path[MAXPGPATH];
 	int			old_fd;
@@ -985,7 +1006,7 @@ AdjustRedoLocation(const char *DataDir)
 	new_fd = open(new_pg_undo_path, O_RDWR | O_CREAT, 0644);
 	if (new_fd < 0)
 	{
-		pg_log_error("could not open \"%s\": %m", old_pg_undo_path);
+		pg_log_error("could not open \"%s\": %m", new_pg_undo_path);
 		exit(1);
 	}
 	while ((nread = read(old_fd, buffer, sizeof(buffer))) > 0)
@@ -1399,6 +1420,7 @@ usage(void)
 	printf(_("  -n, --dry-run                  no update, just show what would be done\n"));
 	printf(_("  -o, --next-oid=OID             set next OID\n"));
 	printf(_("  -O, --multixact-offset=OFFSET  set next multitransaction offset\n"));
+	printf(_("  -r, --old-redo=REDO            use old redo for adjusting the UNDO\n"));
 	printf(_("  -V, --version                  output version information, then exit\n"));
 	printf(_("  -x, --next-transaction-id=XID  set next transaction ID\n"));
 	printf(_("      --wal-segsize=SIZE         size of WAL segments, in megabytes\n"));
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index c611203..3a6ade9 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -484,9 +484,9 @@ copy_xact_xlog_xid(void)
 	/* set the next transaction id and epoch of the new cluster */
 	prep_status("Setting next transaction ID and epoch for new cluster");
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
-			  "\"%s/pg_resetwal\" -f -x %u \"%s\"",
+			  "\"%s/pg_resetwal\" -f -x %u  -r "UINT64_FORMAT" \"%s\"",
 			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
-			  new_cluster.pgdata);
+			  old_cluster.controldata.redo_location, new_cluster.pgdata);
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
 			  "\"%s/pg_resetwal\" -f -e %u \"%s\"",
 			  new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch,
-- 
1.8.3.1

Reply via email to