Claudio,

* Claudio Freire (klaussfre...@gmail.com) wrote:
> I'm not talking about malicious attacks, with big enough data sets,
> checksum collisions are much more likely to happen than with smaller
> ones, and incremental backups are supposed to work for the big sets.
This is an issue when you're talking about de-duplication, not when
you're talking about testing whether two files are the same for
incremental backup purposes.  The size of the overall data set is not
relevant here because you're only ever comparing against the same
(at most 1GB) specific file in the PostgreSQL data directory.  Even if
you were able to produce a file whose checksum collides with an
existing PG file, the chance that it would *also* have a valid page
layout, rather than being obviously and massively corrupted, is very
quickly approaching zero.

> You could use strong cryptographic checksums, but such strong
> checksums still aren't perfect, and even if you accept the slim chance
> of collision, they are quite expensive to compute, so it's bound to be
> a bottleneck with good I/O subsystems. Checking the LSN is much
> cheaper.

For my 2c on this- I'm actually behind the idea of using the LSN
(though I have not followed this thread in any detail), but there are
plenty of existing incremental backup solutions (PG-specific and not)
which work just fine by doing checksums.  If you truly feel that this
is a real concern, I'd suggest you review the rsync binary diff
protocol, which is used extensively around the world, and show reports
of it failing in the field.

Thanks,

Stephen
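P.S. To make the LSN comparison concrete, here is a rough sketch of
what per-page LSN filtering amounts to.  This is an illustration only,
not code from any existing tool; it assumes the standard 8kB block
size and that pd_lsn occupies the first 8 bytes of each page header,
stored in host (here: little-endian) byte order.

    import struct

    BLCKSZ = 8192  # standard PostgreSQL block size

    def changed_blocks(path, prev_backup_start_lsn):
        """Yield (block_number, page) for pages whose LSN is newer
        than the LSN at which the previous backup started."""
        with open(path, "rb") as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                # pd_lsn is stored as two 32-bit halves: xlogid (high)
                # and xrecoff (low).
                xlogid, xrecoff = struct.unpack_from("<II", page, 0)
                page_lsn = (xlogid << 32) | xrecoff
                # A never-written (all-zero) page reads as LSN 0 and is
                # skipped here; a real tool would treat such pages
                # specially.
                if page_lsn > prev_backup_start_lsn:
                    yield blkno, page
                blkno += 1

You still read every page, but there is no hash to compute and no
stored checksum to compare against, which is where the "much cheaper"
claim comes from.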