On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:
> In attachments, you can find a prototype of incremental pg_basebackup,
> which consists of 2 features:
>
> 1) To perform incremental backup one should call pg_basebackup with a
> new argument:
>
> pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'
>
> where lsn is a start_lsn of parent backup (can be found in
> "backup_label" file)
>
> It calls BASE_BACKUP replication command with a new argument
> PREV_BACKUP_START_LSN 'lsn'.
>
> For datafiles, only pages with LSN > prev_backup_start_lsn will be
> included in the backup.
> They are saved into 'filename.partial' file, 'filename.blockmap' file
> contains an array of BlockNumbers.
> For example, if we backuped blocks 1,3,5, filename.partial will contain
> 3 blocks, and 'filename.blockmap' will contain array {1,3,5}.
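A minimal sketch in C of the scheme quoted above, assuming page LSNs have already been read from the page headers into an array, and using simplified stand-ins for XLogRecPtr and BlockNumber. The function names select_changed_blocks and apply_partial are hypothetical and come from neither patch. The first builds the block map on the backup side; the second shows how a merge would lay the .partial pages back over the full backup's copy of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for PostgreSQL's XLogRecPtr and BlockNumber. */
typedef uint64_t XLogRecPtr;
typedef uint32_t BlockNumber;

/*
 * Backup side: record the number of every block whose page LSN is newer
 * than the parent backup's start LSN. The caller would copy those pages,
 * in the same order, into the .partial data and write blockmap[] out as
 * the .blockmap data. Returns the number of changed blocks.
 */
size_t
select_changed_blocks(const XLogRecPtr *page_lsn, BlockNumber nblocks,
                      XLogRecPtr prev_backup_start_lsn, BlockNumber *blockmap)
{
    size_t nchanged = 0;

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        if (page_lsn[blkno] > prev_backup_start_lsn)
            blockmap[nchanged++] = blkno;
    }
    return nchanged;
}

/*
 * Merge side: the i-th page stored in the .partial data replaces block
 * blockmap[i] of the file from the full backup. This sketch assumes the
 * relation did not grow, so every mapped block already exists in base.
 */
void
apply_partial(char *base, size_t base_nblocks, const char *partial,
              const BlockNumber *blockmap, size_t nchanged, size_t page_size)
{
    for (size_t i = 0; i < nchanged; i++)
    {
        assert(blockmap[i] < base_nblocks);
        memcpy(base + (size_t) blockmap[i] * page_size,
               partial + i * page_size, page_size);
    }
}
```

With page LSNs {10, 50, 10, 60, 10, 70} and a parent start LSN of 20, the map comes out as {1, 3, 5}, matching the example in the quoted mail. The logic is the same whether the map is stored in its own file or alongside the page data.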
I think it's better to keep both the information about changed blocks
and the contents of the changed blocks in a single file. The list of
changed blocks is probably quite short, and I don't really want to
double the number of files in the backup if there's no real need. I
suspect it's just overall a bit simpler to keep everything together. I
don't think this is a make-or-break thing, and I welcome contrary
arguments, but that's my preference.

> 2) To merge incremental backup into a full backup call
>
> pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir'
> --merge-backups
>
> It will move all files from 'incremental_basedir' to 'basedir' handling
> '.partial' files correctly.

This, to me, looks like it's much worse than the design that I proposed
originally. It means that:

1. You can't take an incremental backup without having the full backup
available at the time you want to take the incremental backup.

2. You're always storing a full backup, which means that you need more
disk space, and potentially much more I/O while taking the backup. You
save on transfer bandwidth, but you add a lot of disk reads and writes,
costs which have to be paid even if the backup is never restored.

> 1) Whether we collect block maps using simple "read everything page by
> page" approach
> or WAL scanning or any other page tracking algorithm, we must choose a
> map format.
> I implemented the simplest one, while there are more ideas:

I think we should start simple.

I haven't had a chance to look at Jeevan's patch at all, or yours in
any detail, as yet, so these are just some very preliminary comments.
It would be good, however, if we can agree on who is going to do what
part of this as we try to drive this forward together. I'm sorry that I
didn't communicate EDB's plans to work on this more clearly; duplicated
effort serves nobody well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company