Tatsuo,

* Tatsuo Ishii (is...@postgresql.org) wrote:
> I'm thinking of implementing an incremental backup tool for
> PostgreSQL. The use case for the tool would be taking a backup of huge
> database. For that size of database, pg_dump is too slow, even WAL
> archive is too slow/ineffective as well. However even in a TB
> database, sometimes actual modified blocks are not that big, may be
> even several GB. So if we can backup those modified blocks only,
> that would be an effective incremental backup method.
I'm trying to figure out how that's actually different from WAL...?  It
sounds like you'd get what you're suggesting by simply increasing the
checkpoint timeout until the WAL stream is something you can keep up
with.  Of course, the downside there is that you'd have to replay more
WAL when recovering.

What about a tool which receives WAL but then "compresses" it across a
longer period of time than the normal checkpointing, by simply keeping
in memory the current set of modified blocks and applying each WAL
record against the corresponding in-memory block as it reads the WAL?
It would then purge each block out as a full-page WAL write at some
pre-defined point, perhaps at the end of the overall backup.

Consider this: connect the WAL-compressor to a PG backend and issue a
'start backup'.  The WAL-compressor detects that and starts keeping
track of every changed block in memory, applying the WAL stream of
full-page and non-full-page changes to the in-memory set, until 'stop
backup' is called, at which point the WAL-compressor simply dumps
everything it is tracking as full-page writes into this new WAL stream.

Or perhaps some combination of an 'always running' WAL compressor which
simply reduces the overall size of the WAL stream, with coordination
around full backups.

Thanks,

Stephen
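P.S.  To sketch the in-memory tracking loop concretely: the C below is a
minimal, self-contained illustration only, assuming full_page_writes is
on so that the first WAL record for each block after 'start backup'
carries a full-page image.  Every name in it (WalRecord, TrackedBlock,
apply_record, compress_record, flush_compressed_wal) is a hypothetical
placeholder, not a PostgreSQL API, and apply_record merely stands in for
the regular redo routines.

/*
 * Sketch of the "WAL compressor" idea above: while reading the WAL
 * stream between 'start backup' and 'stop backup', keep only the
 * latest image of each modified block in memory, then dump every
 * tracked block as a single full-page write.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ     8192
#define TABLE_SIZE 65536                /* toy fixed-size hash table */

typedef struct BlockKey { uint32_t relfile; uint32_t blkno; } BlockKey;

typedef struct TrackedBlock {
    BlockKey key;
    char     page[BLCKSZ];              /* latest known image of this block */
    struct TrackedBlock *next;          /* collision chain */
} TrackedBlock;

static TrackedBlock *blocks[TABLE_SIZE];

static unsigned hash_key(BlockKey k)
{
    return ((k.relfile * 2654435761u) ^ k.blkno) % TABLE_SIZE;
}

/* Find or create the in-memory image for a block. */
static TrackedBlock *lookup(BlockKey k)
{
    unsigned h = hash_key(k);

    for (TrackedBlock *b = blocks[h]; b; b = b->next)
        if (b->key.relfile == k.relfile && b->key.blkno == k.blkno)
            return b;

    TrackedBlock *nb = calloc(1, sizeof(TrackedBlock));
    nb->key = k;
    nb->next = blocks[h];
    blocks[h] = nb;
    return nb;
}

/* Hypothetical decoded WAL record touching one block. */
typedef struct WalRecord {
    BlockKey key;
    int      has_full_page;             /* carries a full-page image? */
    char     full_page[BLCKSZ];
    /* ... payload for non-full-page changes would go here ... */
} WalRecord;

/* Placeholder: replay one non-full-page record against a page image. */
static void apply_record(char *page, const WalRecord *rec) { (void) page; (void) rec; }

/* Fold one record into the tracked set: the newest full-page image
 * wins, anything else is replayed against the image we already hold. */
static void compress_record(const WalRecord *rec)
{
    TrackedBlock *b = lookup(rec->key);

    if (rec->has_full_page)
        memcpy(b->page, rec->full_page, BLCKSZ);
    else
        apply_record(b->page, rec);
}

/* At 'stop backup': emit each modified block exactly once. */
static void flush_compressed_wal(void)
{
    for (unsigned h = 0; h < TABLE_SIZE; h++)
        for (TrackedBlock *b = blocks[h]; b; b = b->next)
            printf("FPW rel=%u blk=%u\n",
                   (unsigned) b->key.relfile, (unsigned) b->key.blkno);
}

int main(void)
{
    /* Two records against the same block collapse to one full-page write. */
    WalRecord r1 = {0};
    r1.key.relfile = 16384;
    r1.key.blkno = 42;
    r1.has_full_page = 1;
    compress_record(&r1);

    WalRecord r2 = r1;
    r2.has_full_page = 0;
    compress_record(&r2);

    flush_compressed_wal();             /* prints: FPW rel=16384 blk=42 */
    return 0;
}

In a real tool the tracked pages would be written out as a new, much
smaller WAL stream (or a base-backup-style file) rather than printed,
but the hash table shows the essential "one image per modified block"
bookkeeping.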