Hi, I'm thinking of implementing an incremental backup tool for PostgreSQL. The use case would be backing up huge databases. At that size, pg_dump is too slow, and even WAL archiving is too slow/ineffective. However, even in a TB-scale database, the set of actually modified blocks is sometimes not that big, perhaps only several GB. So if we could back up only those modified blocks, that would be an effective incremental backup method.
For now, my idea is pretty vague:

- Record info about modified blocks. We don't need to remember the whole
  history of a block even if it was modified multiple times; we only need
  to remember that it has been modified since the last incremental backup
  was taken.

- The info could be obtained by trapping calls to mdwrite() etc. We need
  to be careful to skip blocks belonging to xlogs and temporary tables so
  that we don't waste resources tracking them.

- If many blocks in a file were modified, we may be able to condense the
  info to "the whole file was modified" to reduce its volume (see the
  rough sketch in the P.S. below).

- How to take a consistent incremental backup is an open issue. I can't
  think of a clean way other than locking the whole cluster, which is
  obviously unacceptable. Maybe we should give up on "hot backup"?

Comments and thoughts are welcome.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
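P.S. To make the "record and condense" idea a bit more concrete, here is a
rough standalone sketch in plain C. None of this is actual backend code;
FileModInfo, mark_block_modified() and the one-bitmap-per-1GB-segment
sizing are all made up for illustration. The idea is that a trap in
mdwrite() would call something like mark_block_modified() for every block
written (after filtering out xlog and temp-table writes), and the state
would be reset whenever an incremental backup completes.

/*
 * Sketch of per-file modified-block tracking.  One bitmap per relation
 * file segment; when the bitmap gets dense, collapse it into a single
 * "whole file was modified" flag to keep the bookkeeping small.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKS_PER_FILE    131072          /* 1 GB segment / 8 kB blocks */
#define CONDENSE_THRESHOLD (BLOCKS_PER_FILE / 2)

typedef struct FileModInfo
{
    bool     whole_file_modified;          /* condensed representation */
    uint32_t n_modified;                   /* number of bits set below */
    uint8_t  bitmap[BLOCKS_PER_FILE / 8];
} FileModInfo;

static void
mark_block_modified(FileModInfo *info, uint32_t blkno)
{
    uint8_t mask;

    assert(blkno < BLOCKS_PER_FILE);

    if (info->whole_file_modified)
        return;                            /* already condensed */

    mask = (uint8_t) (1 << (blkno % 8));
    if (info->bitmap[blkno / 8] & mask)
        return;                            /* modified before; no history kept */

    info->bitmap[blkno / 8] |= mask;
    info->n_modified++;

    /* Condense to "whole file was modified" once the bitmap is dense. */
    if (info->n_modified >= CONDENSE_THRESHOLD)
        info->whole_file_modified = true;
}

/* Called when an incremental backup completes: forget everything. */
static void
reset_mod_info(FileModInfo *info)
{
    memset(info, 0, sizeof(*info));
}

int
main(void)
{
    FileModInfo *info = calloc(1, sizeof(FileModInfo));

    /* Simulate a few writes trapped in mdwrite(). */
    mark_block_modified(info, 0);
    mark_block_modified(info, 42);
    mark_block_modified(info, 42);         /* repeated write adds no new info */

    printf("modified blocks since last backup: %u\n", info->n_modified);

    reset_mod_info(info);
    free(info);
    return 0;
}

The half-full-segment threshold for condensing is arbitrary; the right
point probably depends on how the backup tool ends up reading the files.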