On Tue, Apr 9, 2019 at 5:28 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> On 2019-Apr-09, Peter Eisentraut wrote:
> > On 2019-04-09 17:48, Robert Haas wrote:
> > > 3. There should be a new tool that knows how to merge a full backup
> > > with any number of incremental backups and produce a complete data
> > > directory with no remaining partial files.
> >
> > Are there by any chance standard file formats and tools that describe
> > a binary difference between directories?  That would be really useful
> > here.
>
> VCDIFF?  https://tools.ietf.org/html/rfc3284
I don't understand VCDIFF very well, but I see some potential problems
with going in this direction.

First, suppose we take a full backup on Monday. Then, on Tuesday, we
want to take an incremental backup. In my proposal, the backup server
only needs to provide the database with one piece of information: the
start-LSN of the previous backup. The server determines which blocks
are recently modified and sends them to the client, which stores them.
The end. On the other hand, storing a maximally compact VCDIFF seems
to require that, for each block modified in the Tuesday backup, we go
read the corresponding block as it existed on Monday. Assuming that
the server is using some efficient method of locating modified blocks,
this will approximately double the amount of read I/O required to
complete the backup: either the server or the client must now read not
only the current version of each block but also the previous version.
If the previous backup is an incremental backup that does not contain
full block images but only VCDIFF content, whoever is performing the
VCDIFF calculation will need to walk the entire backup chain and
reconstruct the previous contents of each modified block so that it
can compute the newest VCDIFF. A customer who does an incremental
backup every day and maintains a synthetic full backup from 1 week
prior will see a roughly eightfold increase in read I/O compared to
the design I proposed.

The same problem exists at restore time. In my design, the total read
I/O required is equal to the size of the database, plus however much
metadata needs to be read from older delta files -- and that should be
fairly small compared to the actual data being read, at least in
normal, non-extreme cases. But if we are going to proceed by applying
a series of delta files, we're going to need to read every older
backup in its entirety. If the turnover percentage is significant, say
20%/day, and if the backup chain is, say, 7 backups long to get back
to a full backup, this is a huge difference. Instead of having to read
~100% of the database size, as in my proposal, we'll need to read
100% + (6 * 20%) = 220% of the database size.

Since VCDIFF uses an add-copy-run language to describe differences, we
could try to work around the problem that I just described by
describing each changed data block as an 8192-byte add, and each
unchanged block as an 8192-byte copy. If we did that, then I think the
problem at backup time goes away: we can write out a VCDIFF-format
file for the changed blocks based just on knowing that those are the
blocks that have changed, without needing to access the older file. Of
course, if we do it this way, the file will be larger than it would be
if we actually compared the old and new block contents and wrote out a
minimal VCDIFF, but it does make taking a backup a lot simpler.

Even with this workaround, though, I think we still have trouble at
restore time. I proposed putting the metadata about which blocks are
included in a delta file at the beginning of the file, which allows a
restore of a new incremental backup to flip through older backups
relatively efficiently to find just the blocks that it needs, without
having to read the whole file. But I think (although I am not quite
sure) that in the VCDIFF format, the payload for an ADD instruction is
stored near the instruction itself. The result would be that you'd
basically have to read the whole file at restore time to figure out
which blocks were available from that file and which ones needed to be
retrieved from an older backup.
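To make that concrete, here is a rough sketch, in C, of the kind of
delta-file layout I have in mind -- the struct and field names below
are invented purely for illustration and are not from any actual
patch:

/*
 * Illustrative only: a delta ("incremental") file that keeps all of
 * its per-block metadata at the front, so a restore can decide what
 * it needs from this file without reading the block payloads.
 */
#include "postgres.h"
#include "access/xlogdefs.h"    /* XLogRecPtr */
#include "storage/block.h"      /* BlockNumber */

typedef struct IncrementalFileHeader
{
    uint32      magic;      /* identifies an incremental file */
    XLogRecPtr  start_lsn;  /* blocks changed since this LSN are included */
    uint32      nblocks;    /* how many block numbers follow */
    /* BlockNumber blocklist[nblocks] follows the fixed header ... */
    /* ... and then nblocks full 8192-byte block images, in list order */
} IncrementalFileHeader;

With something like that, a restore only has to read the header and
the block list from each older file to know whether it needs anything
further from that file. As far as I can tell, a VCDIFF stream gives
you no equivalent shortcut, because the windows, their instructions,
and their ADD payloads all have to be walked together.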
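The inclusion criterion itself is, at heart, just an LSN comparison.
Here is a minimal sketch, again with made-up names; whether the blocks
are actually found by scanning pages like this or by some more
efficient method is a separate question:

/*
 * Minimal sketch, not from any actual patch: a block belongs in the
 * incremental backup if its page LSN shows that it may have been
 * modified since the prior backup's start LSN.
 */
#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/bufpage.h"    /* Page, PageGetLSN() */

static bool
block_belongs_in_incremental(Page page, XLogRecPtr prior_start_lsn)
{
    return PageGetLSN(page) >= prior_start_lsn;
}

Note that nothing here ties the incremental file to one particular
older backup: it only promises to contain every block whose LSN is >=
the cutoff, a point I come back to below.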
So while the VCDIFF add/copy workaround would fix the backup-time
problem, I believe that it would still require significantly more read
I/O at restore time than my proposal. Furthermore, if, at backup time,
we have to do anything that requires access to the old data, either
the client or the server needs to have access to that data.
Notwithstanding the cost of reading it, that doesn't seem very
desirable. The server is quite unlikely to have access to the backups,
because most users want to back up to a different server in order to
guard against a hardware failure. The client is more likely to be
running on a machine where it has access to the data, because many
users back up to the same machine every day, so the machine that is
taking the current backup probably has the older one. However,
accessing that old backup might not be cheap. It could be located in
an object store in the cloud someplace, or it could have been written
out to a tape drive and the tape removed from the drive. In the design
I'm proposing, that stuff doesn't matter, but if you want to run
diffs, then it does.

Even if the client has efficient access to the data, and even if it
has so much read I/O bandwidth that the cost of reading that old data
to run diffs doesn't matter, it's still pretty awkward for a
tar-format backup. The client would have to take the tar archive sent
by the server apart and form a new one.

Another advantage of storing whole blocks in the incremental backup is
that there's no tight coupling between the full backup and the
incremental backup. Suppose you take a full backup A on S1, and then
another full backup B, and then an incremental backup C based on A,
and then an incremental backup D based on B. If backup B is destroyed
beyond retrieval, you can restore the chain A-C-D and get back to the
same place that restoring B-D would have gotten you. Backup D doesn't
really know or care that it happens to be based on B. It just knows
that it can only give you those blocks that have LSN >= LSN_B; the
rest of the blocks you can get from anywhere that you like. If D
instead stored deltas between the blocks as they exist in backup B,
then those deltas would have to be applied specifically to backup B,
not some possibly-later version.

I think the way to think about this problem, or at least the way I
think about it, is that we need to decide whether we want file-level
incremental backup, block-level incremental backup, or byte-level
incremental backup. pgbackrest implements file-level incremental
backup: if the file has changed, copy the whole thing. That has an
appealing simplicity but risks copying 1GB of data for a 1-byte
change. What I'm proposing here is block-level incremental backup,
which is more complicated and still risks copying 8kB of data for a
1-byte change. Using VCDIFF would, I think, give us byte-level
incremental backup. That would probably do an excellent job of making
incremental backups as small as they can possibly be, because we would
not need to include in the backup image even a single byte of
unmodified data. It also seems to do some other compression tricks
which could shrink incremental backups further. However, my intuition
is that we won't gain enough in terms of backup size to make up for
the downsides listed above.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company