Hi,

I'm thinking of implementing an incremental backup tool for
PostgreSQL. The use case for the tool would be taking backups of a
huge database. For a database of that size, pg_dump is too slow, and
even WAL archiving is too slow/ineffective. However, even in a
TB-scale database, the set of actually modified blocks is often not
that big, perhaps only several GB. So if we can back up only those
modified blocks, that would be an effective incremental backup
method.

For now, my idea is pretty vague.

- Record info about modified blocks. We don't need to remember the
  whole modification history of a block; even if a block was modified
  multiple times, we just remember that it has been modified since
  the last incremental backup was taken.

- The info could be obtained by trapping calls to mdwrite() etc. We
  need to be careful to skip blocks belonging to xlogs and temporary
  tables so as not to waste resources (see the sketch after this
  list).

- If many blocks in a file were modified, we may be able to condense
  the info to "the whole file was modified", to reduce the amount of
  info we keep.

- How to take a consistent incremental backup is an issue. I can't
  think of a clean way other than locking the whole cluster, which is
  obviously unacceptable. Maybe we should give up on "hot backup"?
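
To make the idea a little more concrete, here is a minimal C sketch
of what the per-file tracking might look like. Everything named here
is hypothetical: block_tracker_note_write() is a stand-in for
whatever the mdwrite() trap would call, the fixed-size table stands
in for a shared-memory hash keyed by RelFileNode, and the condense
threshold is arbitrary.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRACKED_FILES   4
#define MAX_BLOCKS_PER_FILE 4096            /* bits kept per file */
#define CONDENSE_THRESHOLD  (MAX_BLOCKS_PER_FILE / 2)

typedef struct FileBlockMap
{
    uint32_t file_id;             /* stand-in for RelFileNode */
    uint32_t dirty_count;         /* blocks marked since last backup */
    int      whole_file_dirty;    /* condensed "whole file" info */
    uint8_t  bits[MAX_BLOCKS_PER_FILE / 8];
} FileBlockMap;

static FileBlockMap trackers[MAX_TRACKED_FILES];

/* Called from the mdwrite() trap; marks one block, idempotently. */
static void
block_tracker_note_write(uint32_t file_id, uint32_t blocknum)
{
    FileBlockMap *t = &trackers[file_id % MAX_TRACKED_FILES];

    t->file_id = file_id;
    if (t->whole_file_dirty || blocknum >= MAX_BLOCKS_PER_FILE)
    {
        t->whole_file_dirty = 1;  /* fall back to coarse info */
        return;
    }
    if (!(t->bits[blocknum / 8] & (1 << (blocknum % 8))))
    {
        t->bits[blocknum / 8] |= 1 << (blocknum % 8);
        t->dirty_count++;
        /* Condense once most of the file is dirty anyway. */
        if (t->dirty_count > CONDENSE_THRESHOLD)
            t->whole_file_dirty = 1;
    }
}

/* Reset after an incremental backup has consumed the map. */
static void
block_tracker_reset(FileBlockMap *t)
{
    uint32_t id = t->file_id;

    memset(t, 0, sizeof(*t));
    t->file_id = id;
}

int
main(void)
{
    FileBlockMap *t = &trackers[42 % MAX_TRACKED_FILES];

    block_tracker_note_write(42, 7);
    block_tracker_note_write(42, 7);    /* repeated write adds nothing */
    block_tracker_note_write(42, 123);
    printf("file %u: %u dirty blocks, whole_file=%d\n",
           (unsigned) t->file_id, (unsigned) t->dirty_count,
           t->whole_file_dirty);
    block_tracker_reset(t);
    return 0;
}

The idempotent bit-set plus the whole_file_dirty fallback keeps the
map bounded no matter how often a block or file is rewritten, which
is all an incremental backup needs.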

Comments, thoughts are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

