Claudio,

* Claudio Freire (klaussfre...@gmail.com) wrote:
> I'm not talking about malicious attacks, with big enough data sets,
> checksum collisions are much more likely to happen than with smaller
> ones, and incremental backups are supposed to work for the big sets.
This is an issue when you're talking about de-duplication, not when
you're talking about testing whether two files are the same for
incremental backup purposes.  The size of the overall data set is not
relevant here because you're only ever comparing against the same
(at most 1GB) specific file in the PostgreSQL data directory.  Even if
you were able to produce a file whose checksum collides with an
existing PG file, the chance that it would *also* have a valid page
layout, rather than being obviously and massively corrupted, is very
quickly approaching zero.

> You could use strong cryptographic checksums, but such strong
> checksums still aren't perfect, and even if you accept the slim chance
> of collision, they are quite expensive to compute, so it's bound to be
> a bottleneck with good I/O subsystems. Checking the LSN is much
> cheaper.

For my 2c on this- I'm actually behind the idea of using the LSN
(though I have not followed this thread in any detail), but there are
plenty of existing incremental backup solutions (PG-specific and not)
which work just fine by doing checksums.  If you truly feel that this
is a real concern, I'd suggest you review the rsync binary diff
protocol, which is used extensively around the world, and show reports
of it failing in the field.

Thanks,

Stephen
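P.S. To make the LSN comparison concrete, here is a rough sketch of
what per-page LSN filtering amounts to.  This is an illustration only,
not code from any existing tool; it assumes the standard 8kB block
size and that pd_lsn occupies the first 8 bytes of each page header,
stored in host (here: little-endian) byte order.

    import struct

    BLCKSZ = 8192  # standard PostgreSQL block size

    def changed_blocks(path, prev_backup_start_lsn):
        """Yield (block_number, page) for pages whose LSN is newer
        than the LSN at which the previous backup started."""
        with open(path, "rb") as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                # pd_lsn is stored as two 32-bit halves: xlogid (high)
                # and xrecoff (low).
                xlogid, xrecoff = struct.unpack_from("<II", page, 0)
                page_lsn = (xlogid << 32) | xrecoff
                # A never-written (all-zero) page reads as LSN 0 and is
                # skipped here; a real tool would treat such pages
                # specially.
                if page_lsn > prev_backup_start_lsn:
                    yield blkno, page
                blkno += 1

You still read every page, but there is no hash to compute and no
stored checksum to compare against, which is where the "much cheaper"
claim comes from.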