Hi Hackers,

I have been looking into adding a parallel backup feature to pg_basebackup. Currently, pg_basebackup sends a BASE_BACKUP command to take a full backup; the server scans PGDATA and streams the files back to pg_basebackup.
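For reference, here is a rough libpq sketch of how a client can drive the existing BASE_BACKUP command over a replication connection. This is only meant to illustrate the protocol flow; it is not code from pg_basebackup or from the attached patch, and the connection parameters and label are placeholders:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* BASE_BACKUP is only accepted on a replication connection. */
    PGconn     *conn = PQconnectdb("replication=true");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    if (!PQsendQuery(conn, "BASE_BACKUP LABEL 'example'"))
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /*
     * The server answers with several results: the start WAL position,
     * the tablespace header, one COPY stream of tar data per tablespace,
     * and finally the stop WAL position.
     */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_COPY_OUT)
        {
            char   *buf;
            int     len;

            while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
            {
                /* write buf[0 .. len) to the backup target here */
                PQfreemem(buf);
            }
        }
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}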
In general, the server takes the following steps on a BASE_BACKUP command:

- do pg_start_backup
- scan PGDATA, create and send a header containing information about the tablespaces
- send each tablespace to pg_basebackup
- and then do pg_stop_backup

All these steps are executed sequentially by a single process. The idea I am working on is to separate these steps into multiple commands in the replication grammar and to add worker processes to pg_basebackup so that they can copy the contents of PGDATA in parallel.

The command line interface syntax would be like:

pg_basebackup --jobs=WORKERS

Replication commands:

- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
  If the PARALLEL option is given, the server will only do pg_start_backup, scan PGDATA and send back a list of file names.

- SEND_FILES_CONTENTS (file1, file2, ...) - returns the files in the given list
  pg_basebackup will send back a list of file names in this command. This command will be sent by each worker, and that worker will receive the listed files.

- STOP_BACKUP
  When all workers have finished, pg_basebackup will send the STOP_BACKUP command.

pg_basebackup can start by sending the "BASE_BACKUP PARALLEL" command and getting a list of file names from the server in response. It should then divide this list according to the --jobs parameter (this division can be based on file sizes). Each worker process will issue a SEND_FILES_CONTENTS (file1, file2, ...) command, and in response the server will send the listed files back to the requesting worker process. Once all the files are copied, pg_basebackup will send the STOP_BACKUP command.

A similar idea has been discussed by Robert on the incremental backup thread a while ago. This is similar to that, but instead of START_BACKUP and SEND_FILE_LIST, I have combined them into BASE_BACKUP PARALLEL.

I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning the parts to the worker processes; I intend to fine-tune this by taking file sizes into consideration (a rough sketch of such a size-based division follows below).

Further, to add tar format support, I am considering that each worker process processes all files belonging to one tablespace in its list (i.e. creates and copies a tar file) before it processes the next tablespace. As a result, this will create tar files that are disjoint with respect to the tablespace data. For example: say tablespace t1 has 20 files, tablespace t2 has 10, and we have 5 worker processes. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
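To illustrate what the size-based division could look like, here is a minimal C sketch. It assumes the leader has a flat list of files with their sizes; the backup_file type, assign_files() and the other names are made up for this example and are not taken from the attached patch. It sorts the files by size, largest first, and always hands the next file to the worker with the smallest total so far:

#include <stdlib.h>

typedef struct
{
    const char *path;
    size_t      size;
    int         worker;     /* assigned worker index */
} backup_file;

static int
compare_by_size_desc(const void *a, const void *b)
{
    const backup_file *fa = a;
    const backup_file *fb = b;

    if (fa->size == fb->size)
        return 0;
    return (fa->size < fb->size) ? 1 : -1;
}

static void
assign_files(backup_file *files, int nfiles, int nworkers)
{
    size_t *load = calloc(nworkers, sizeof(size_t));
    int     i;

    qsort(files, nfiles, sizeof(backup_file), compare_by_size_desc);

    for (i = 0; i < nfiles; i++)
    {
        int     j;
        int     target = 0;

        /* pick the worker with the smallest total bytes so far */
        for (j = 1; j < nworkers; j++)
            if (load[j] < load[target])
                target = j;

        files[i].worker = target;
        load[target] += files[i].size;
    }

    free(load);
}

This greedy approach keeps the total bytes per worker roughly balanced even when a few large files dominate the list, whereas a simple equal split by file count can leave one worker with most of the data.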
Regards,
Asif

0001-Initial-POC-on-parallel-backup.patch