Hello,

While doing some work/research on the new incremental backup feature,
I noticed that some limitations are not listed in the docs, mainly the
fact that pg_combinebackup works with the plain format and not tar.
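
To illustrate the tar limitation, here is a minimal sketch of unpacking a
tar-format backup into the plain layout before combining. The helper name
untar_backup and all paths are hypothetical. It assumes uncompressed
output from pg_basebackup --format=tar (base.tar, an optional pg_wal.tar,
and backup_manifest stored next to them) and no additional tablespaces.

```shell
#!/bin/sh
# Hypothetical helper: unpack one tar-format backup directory ($1)
# into a plain-format directory ($2).
# Assumes uncompressed base.tar / pg_wal.tar and no extra tablespaces
# (those would be stored in separate <oid>.tar files).
untar_backup() {
    src=$1
    dest=$2
    mkdir -p "$dest/pg_wal" || return 1
    # Main data directory contents
    tar -xf "$src/base.tar" -C "$dest" || return 1
    # Streamed WAL, if present, belongs under pg_wal/
    if [ -f "$src/pg_wal.tar" ]; then
        tar -xf "$src/pg_wal.tar" -C "$dest/pg_wal" || return 1
    fi
    # The manifest is written beside the tar files, not inside them
    cp "$src/backup_manifest" "$dest/backup_manifest"
}

# Example usage (placeholder paths):
#   untar_backup /backups/full_tar  /backups/full
#   untar_backup /backups/incr1_tar /backups/incr1
#   pg_combinebackup -o /backups/combined /backups/full /backups/incr1
```

Each backup in the chain would need to be unpacked this way, and the
resulting directories passed to pg_combinebackup from oldest to newest.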

Around the same time, Tomas Vondra tested incremental backups with a
cluster where he enabled checksums after taking the previous full
backup. After combining the backups, the synthetic backup had some
pages with checksums and others without, which led to checksum
errors.

I've attached two patches. The first one is just nit-picking things I
found when I first read the docs. The second adds a note on the two
limitations listed above. For the limitation on incremental backups of
a cluster that had checksums enabled after the previous backup, I was
not sure whether it should go in the pg_basebackup or the
pg_combinebackup reference documentation, or maybe somewhere else.

Kind regards, Martín

-- 
Martín Marqués
It’s not that I have something to hide,
it’s that I have nothing I want you to see
From eda9f0c811ba115edf47b4f81200073a41d10cc3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?= <martin.marq...@gmail.com>
Date: Sat, 6 Apr 2024 19:30:23 +0200
Subject: [PATCH 1/2] Remove unneeded wording in pg_combinebackup documentation

---
 doc/src/sgml/ref/pg_combinebackup.sgml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml
index 658e9a759c..19b6d159ce 100644
--- a/doc/src/sgml/ref/pg_combinebackup.sgml
+++ b/doc/src/sgml/ref/pg_combinebackup.sgml
@@ -37,10 +37,10 @@ PostgreSQL documentation
   </para>
 
   <para>
-   Specify all of the required backups on the command line from oldest to newest.
+   Specify all required backups on the command line from oldest to newest.
    That is, the first backup directory should be the path to the full backup, and
    the last should be the path to the final incremental backup
-   that you wish to restore. The reconstructed backup will be written to the
+   you wish to restore. The reconstructed backup will be written to the
    output directory specified by the <option>-o</option> option.
   </para>
 
@@ -48,7 +48,7 @@ PostgreSQL documentation
    Although <application>pg_combinebackup</application> will attempt to verify
    that the backups you specify form a legal backup chain from which a correct
    full backup can be reconstructed, it is not designed to help you keep track
-   of which backups depend on which other backups. If you remove the one or
+   of which backups depend on which other backups. If you remove one or
    more of the previous backups upon which your incremental
    backup relies, you will not be able to restore it.
   </para>
-- 
2.39.3

From 0fc5ea63d7a2700ea841c56dc766a11d8f4182ff Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?= <martin.marq...@gmail.com>
Date: Tue, 9 Apr 2024 09:34:21 +0200
Subject: [PATCH 2/2] Add note of restrictions for combining incremental
 backups

When taking incremental backups the user must be warned that the
backup format has to be plain for pg_combinebackup to work properly.

Another thing to consider is that if a cluster had checksums enabled
after the previous backup, an incremental backup will yield a possibly
valid cluster but with files from the previous backup that don't have
checksums, giving checksum errors when replaying subsequent changes to
those blocks. This behavior was brought up by Tomas Vondra while
testing.
---
 doc/src/sgml/ref/pg_combinebackup.sgml | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml
index 19b6d159ce..1cafc0ab07 100644
--- a/doc/src/sgml/ref/pg_combinebackup.sgml
+++ b/doc/src/sgml/ref/pg_combinebackup.sgml
@@ -60,6 +60,27 @@ PostgreSQL documentation
    be specified on the command line in lieu of the chain of backups from which
    it was reconstructed.
   </para>
+
+  <para>
+   Note that there are limitations in combining backups:
+   <itemizedlist>
+    <listitem>
+     <para>
+      <application>pg_combinebackup</application> works with plain format only.
+      In order to combine backups in tar format, they need to be untarred first.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      If an incremental backup is taken from a cluster where checksums were enabled
+      after the reference backup finished, the resulting data may be valid, but
+      the checksums wouldn't validate for files from the reference backup.
+      In case of enabling checksums on an existing cluster, the next backup must be
+      a full backup.
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
  </refsect1>
 
  <refsect1>
-- 
2.39.3
