Hi Asif,

Getting the following error on parallel backup when the --no-manifest option is used.
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:  could not open file "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file or directory
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.reh...@gmail.com> wrote:
>
> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <kashif.zees...@enterprisedb.com> wrote:
>>
>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.reh...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>>
>>> I have added the shared state as previously described. The new grammar
>>> changes are as follows:
>>>
>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>> - This will generate a unique backupid using pg_strong_random(16),
>>>   hex-encode it, and return it in the result set.
>>> - It will also create a shared state and add it to the hashtable. The
>>>   hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable
>>>   can expand dynamically, I think that is a sufficient initial size.
>>>   max_wal_senders is not used, because it can be set to quite a large
>>>   value.
>>>
>>> JOIN_BACKUP 'backup_id'
>>> - finds 'backup_id' in the hashtable and attaches it to the server process.
>>>
>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>> - renamed SEND_FILES to SEND_FILE
>>> - removed START_WAL_LOCATION from this command because 'startptr' is now
>>>   accessible through the shared state.
>>>
>>> There is no change in the other commands:
>>> STOP_BACKUP [NOWAIT]
>>> LIST_TABLESPACES [PROGRESS]
>>> LIST_FILES [TABLESPACE]
>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>
>>> The current patches (v11) have been rebased to the latest master. The
>>> backup manifest is enabled by default, so I have disabled it for parallel
>>> backup mode and generate a warning so that the user is aware of it and
>>> does not expect it in the backup.
>>>
>> Hi Asif
>>
>> I have verified the bug fixes; one bug is fixed and now works as expected.
>>
>> While verifying the other bug fixes I ran into the following issues;
>> please have a look.
>>
>> 1) The bug fixes mentioned below are generating a segmentation fault.
>>
>> Please note that for reference I have added only a description of each
>> bug, as the steps were given in previous emails. A backtrace is also added
>> with each case, and it points to the same bug in both cases.
>>
>> a) The backup failed with the error "error: could not connect to server:
>> could not look up local user ID 1000: Too many open files" when
>> max_wal_senders was set to 2000.
>>
>> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
>> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> ....
>> ....
>> pg_basebackup: backup worker (1014) created
>> pg_basebackup: backup worker (1015) created
>> pg_basebackup: backup worker (1016) created
>> pg_basebackup: backup worker (1017) created
>> pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
>> Segmentation fault
>> [edb@localhost bin]$
>>
>> [edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>> [New LWP 13219]
>> [New LWP 13222]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> 47   if (INVALID_NOT_TERMINATED_TD_P (pd))
>> (gdb) bt
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
>> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
>> (gdb)
>>
>> b) When executing two backups at the same time, a FATAL error is raised
>> because of max_wal_senders, but instead of exiting, the backup completed.
>>
>> [edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
>> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> pg_basebackup: backup worker (4) created
>> pg_basebackup: backup worker (5) created
>> pg_basebackup: backup worker (6) created
>> pg_basebackup: error: could not connect to server: FATAL:  number of requested standby connections exceeds max_wal_senders (currently 10)
>> Segmentation fault (core dumped)
>> [edb@localhost bin]$
>>
>> [edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>> [New LWP 17041]
>> [New LWP 17067]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> 47   if (INVALID_NOT_TERMINATED_TD_P (pd))
>> (gdb) bt
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
>> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
>> (gdb)
>>
>> 2) The following bug is not fixed yet.
>>
>> A similar case is when the DB server is shut down while the parallel
>> backup is in progress: the correct error is displayed, but the backup
>> folder is not cleaned up and a corrupt backup is left behind.
>> >> [edb@localhost bin]$ >> [edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j >> 8 >> pg_basebackup: warning: backup manifest is disabled in parallel backup >> mode >> pg_basebackup: initiating base backup, waiting for checkpoint to complete >> pg_basebackup: checkpoint completed >> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1 >> pg_basebackup: starting background WAL receiver >> pg_basebackup: created temporary replication slot "pg_basebackup_16235" >> pg_basebackup: backup worker (0) created >> pg_basebackup: backup worker (1) created >> pg_basebackup: backup worker (2) created >> pg_basebackup: backup worker (3) created >> pg_basebackup: backup worker (4) created >> pg_basebackup: backup worker (5) created >> pg_basebackup: backup worker (6) created >> pg_basebackup: backup worker (7) created >> pg_basebackup: error: could not read COPY data: server closed the >> connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> pg_basebackup: error: could not read COPY data: server closed the >> connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> pg_basebackup: removing contents of data directory >> "/home/edb/Desktop/backup/" >> pg_basebackup: error: could not read COPY data: server closed the >> connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> [edb@localhost bin]$ >> [edb@localhost bin]$ >> [edb@localhost bin]$ >> >> >> >> [edb@localhost bin]$ >> [edb@localhost bin]$ ls /home/edb/Desktop/backup >> base pg_hba.conf pg_logical pg_notify pg_serial >> pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf >> pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots >> pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf >> [edb@localhost bin]$ >> [edb@localhost bin]$ >> >> >> >> >> Thanks >> Kashif Zeeshan >> >>> >>> >>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan < >>> kashif.zees...@enterprisedb.com> wrote: >>> >>>> >>>> >>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan < >>>> kashif.zees...@enterprisedb.com> wrote: >>>> >>>>> Hi Asif >>>>> >>>>> When a non-existent slot is used with tablespace then correct error is >>>>> displayed but then the backup folder is not cleaned and leaves a corrupt >>>>> backup. 
>>>>>
>>>>> Steps
>>>>> =======
>>>>>
>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>>>>>
>>>>> postgres=# create tablespace tbl1 location '/home/edb/tbl1';
>>>>> CREATE TABLESPACE
>>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>>> CREATE TABLE
>>>>> postgres=# insert into t1 values(100);
>>>>> INSERT 0 1
>>>>> postgres=# insert into t1 values(200);
>>>>> INSERT 0 1
>>>>> postgres=# insert into t1 values(300);
>>>>> INSERT 0 1
>>>>>
>>>>> [edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
>>>>> pg_basebackup: backup worker (0) created
>>>>> pg_basebackup: backup worker (1) created
>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>> pg_basebackup: error: child thread exited with error 1
>>>>> [edb@localhost bin]$
>>>>>
>>>>> The backup folder is not cleaned up:
>>>>>
>>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact  postgresql.conf
>>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
>>>>> [edb@localhost bin]$
>>>>>
>>>>> If the same case is executed without the parallel backup patch, the
>>>>> backup folder is cleaned up after the error is displayed.
>>>>>
>>>>> [edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>> pg_basebackup: error: child process exited with exit code 1
>>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>
>>>> Hi Asif
>>>>
>>>> A similar case is when the DB server is shut down while the parallel
>>>> backup is in progress: the correct error is displayed, but the backup
>>>> folder is not cleaned up and a corrupt backup is left behind. I think one
>>>> bug fix will solve all of these cases where cleanup is not done when a
>>>> parallel backup fails.
>>>>
>>>> [edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: backup worker (2) created
>>>> pg_basebackup: backup worker (3) created
>>>> pg_basebackup: backup worker (4) created
>>>> pg_basebackup: backup worker (5) created
>>>> pg_basebackup: backup worker (6) created
>>>> pg_basebackup: backup worker (7) created
>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>>         This probably means the server terminated abnormally
>>>>         before or while processing the request.
>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>>         This probably means the server terminated abnormally
>>>>         before or while processing the request.
>>>> [edb@localhost bin]$
>>>>
>>>> When the same case is executed with pg_basebackup without the parallel
>>>> backup patch, proper cleanup is done:
>>>>
>>>> [edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>>         This probably means the server terminated abnormally
>>>>         before or while processing the request.
>>>> pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
>>>> [edb@localhost bin]$
>>>>
>>>> Thanks
>>>>
>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.reh...@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmh...@gmail.com> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.reh...@gmail.com> wrote:
>>>>>>> >> Why would you need to do that? As long as the process where
>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>>> >
>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
>>>>>>> > not while the backup is in progress. So if the backup is a large
>>>>>>> > one, early error detection would be much beneficial. This is the
>>>>>>> > current behavior of non-parallel backup as well.
>>>>>>>
>>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>>>>> inconsistent in behavior with non-parallel backup will be rejected, at
>>>>>>> least if I have anything to say about it.
>>>>>>>
>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me.
>>>>>>> The current situation is not great, but promotions ought to be
>>>>>>> relatively infrequent, so I'm not sure it's a huge problem in
>>>>>>> practice. It is also worth considering whether the right fix is to
>>>>>>> figure out how to make that case actually work, rather than just
>>>>>>> making it fail quicker. I don't currently understand the reason for
>>>>>>> the prohibition so I can't express an intelligent opinion on what the
>>>>>>> right answer is here, but it seems like it ought to be investigated
>>>>>>> before somebody goes and builds a bunch of infrastructure to make the
>>>>>>> error more timely.
>>>>>>
>>>>>> Non-parallel backup already does the early error checking. I only
>>>>>> intended to make parallel backup behave the same as non-parallel here.
>>>>>> So, I agree with you that the behavior of parallel backup should be
>>>>>> consistent with the non-parallel one. Please see the code snippet below
>>>>>> from basebackup.c:sendDir():
>>>>>>
>>>>>>>     /*
>>>>>>>      * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>>      * error in that case. The error handler further up will call
>>>>>>>      * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>>      * started while still in recovery, the server wasn't promoted.
>>>>>>>      * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>>      * the backup early than continue to the end and fail there.
>>>>>>>      */
>>>>>>>     CHECK_FOR_INTERRUPTS();
>>>>>>>     if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>>         ereport(ERROR,
>>>>>>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>>                  errmsg("the standby was promoted during online backup"),
>>>>>>>                  errhint("This means that the backup being taken is corrupt "
>>>>>>>                          "and should not be used. "
>>>>>>>                          "Try taking another online backup.")));
>>>>>>>
>>>>>>> > Okay, then I will add the shared state. And since we are adding the
>>>>>>> > shared state, we can use that for throttling, progress-reporting and
>>>>>>> > standby early error checking.
>>>>>>>
>>>>>>> Please propose a grammar here for all the new replication commands you
>>>>>>> plan to add before going and implementing everything. That will make
>>>>>>> it easier to hash out the design without forcing you to keep changing
>>>>>>> the code. Your design should include a sketch of how several sets of
>>>>>>> coordinating backends taking several concurrent parallel backups will
>>>>>>> end up with one shared state per parallel backup.
>>>>>>>
>>>>>>> > There are two possible options:
>>>>>>> >
>>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
>>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>>> >
>>>>>>> > This BackupID should be given back as a response to the start backup
>>>>>>> > command. All client workers must append this ID to all parallel
>>>>>>> > backup replication commands, so that we can use this identifier to
>>>>>>> > search for that particular backup. Does that sound good?
>>>>>>>
>>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>>> location for multiple backups started at the same time?
>>>>>>> Whether that's possible now or not, it seems unwise to hard-wire that
>>>>>>> assumption into the wire protocol.
>>>>>>>
>>>>>>> I was thinking that perhaps the client should generate a unique backup
>>>>>>> ID, e.g. leader does:
>>>>>>>
>>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>>
>>>>>>> And then others do:
>>>>>>>
>>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>>
>>>>>>> My thought is that you will have a number of shared memory structures
>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>>> state for one backup. The shared state will include
>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>>> JOIN_BACKUP would search for one by name.
>>>>>>>
>>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>>>>> better. Either way, the backup ID should be something long and hard to
>>>>>>> guess, not e.g. the leader process's PID. I think we should generate
>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>>>>> result to get a string. That way there's almost no risk of two backup
>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>>> user with enough access to do that sort of thing can probably cause a
>>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>>> intentional collisions robustly here, so I think we should.
>>>>>>
>>>>>> Okay, so if we add another replication command 'JOIN_BACKUP
>>>>>> unique_backup_id' to make workers find the relevant shared state, there
>>>>>> won't be any need to change the grammar of any other command.
>>>>>> START_BACKUP can return the unique_backup_id in the result set.
>>>>>>
>>>>>> I am thinking of the following struct for the shared state:
>>>>>>
>>>>>>> typedef struct
>>>>>>> {
>>>>>>>     char        backupid[NAMEDATALEN];
>>>>>>>     XLogRecPtr  startptr;
>>>>>>>
>>>>>>>     slock_t     lock;
>>>>>>>     int64       throttling_counter;
>>>>>>>     bool        backup_started_in_recovery;
>>>>>>> } BackupSharedState;
>>>>>>
>>>>>> The shared state structure entries would be maintained by a shared hash
>>>>>> table. There will be one structure per parallel backup. Since a single
>>>>>> parallel backup can engage more than one wal sender, I think
>>>>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
>>>>>> since there will be at least 2 connections per parallel backup?
>>>>>> Alternatively, we can set a new GUC that defines the maximum number of
>>>>>> concurrent parallel backups, i.e. 'max_concurrent_backups_allowed = 10'
>>>>>> perhaps, or we can make it user-configurable.
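[Editor's note: a minimal sketch of the shared-state registration and lookup
described above, assuming PostgreSQL's dynahash API (ShmemInitHash/hash_search)
and its pg_strong_random(); the function names here are invented for
illustration and are not from the patch.]

    #include "postgres.h"

    #include "access/xlogdefs.h"
    #include "storage/shmem.h"
    #include "storage/spin.h"
    #include "utils/hsearch.h"

    #define BACKUP_HASH_SIZE 10

    typedef struct BackupSharedState
    {
        char        backupid[NAMEDATALEN];  /* hash key, hex string */
        XLogRecPtr  startptr;
        slock_t     lock;
        int64       throttling_counter;
        bool        backup_started_in_recovery;
    } BackupSharedState;

    static HTAB *BackupStateHash;

    /* Run once during shared-memory initialization. */
    static void
    InitBackupStateHash(void)
    {
        HASHCTL     info;

        MemSet(&info, 0, sizeof(info));
        info.keysize = NAMEDATALEN;
        info.entrysize = sizeof(BackupSharedState);

        BackupStateHash = ShmemInitHash("Parallel Backup State Hash",
                                        BACKUP_HASH_SIZE, BACKUP_HASH_SIZE,
                                        &info, HASH_ELEM);
    }

    /* START_BACKUP: backupid = hex(pg_strong_random(16)), then register it. */
    static BackupSharedState *
    StartBackupState(void)
    {
        uint8       raw[16];
        char        backupid[NAMEDATALEN] = {0};
        bool        found;
        BackupSharedState *state;

        if (!pg_strong_random(raw, sizeof(raw)))
            elog(ERROR, "could not generate backup id");
        for (int i = 0; i < 16; i++)
            snprintf(backupid + i * 2, 3, "%02x", raw[i]);

        state = (BackupSharedState *) hash_search(BackupStateHash, backupid,
                                                  HASH_ENTER, &found);
        if (found)
            elog(ERROR, "backup id collision");     /* astronomically unlikely */

        SpinLockInit(&state->lock);
        state->throttling_counter = 0;
        return state;
    }

    /* JOIN_BACKUP 'backup_id': attach this walsender to the shared state. */
    static BackupSharedState *
    JoinBackupState(const char *backupid)
    {
        bool        found;
        BackupSharedState *state;

        state = (BackupSharedState *) hash_search(BackupStateHash, backupid,
                                                  HASH_FIND, &found);
        if (!found)
            elog(ERROR, "backup \"%s\" does not exist", backupid);
        return state;
    }

Presumably STOP_BACKUP would remove the entry again with HASH_REMOVE once all
workers have detached.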
>>>>>> The key would be "backupid=hex_encode(pg_strong_random(16))".
>>>>>>
>>>>>> Checking for standby promotion:
>>>>>> At the START_BACKUP command, we initialize
>>>>>> BackupSharedState.backup_started_in_recovery and keep checking it
>>>>>> whenever send_file() is called to send a new file.
>>>>>>
>>>>>> Throttling:
>>>>>> BackupSharedState.throttling_counter - the throttling logic remains the
>>>>>> same as for non-parallel backup, with the exception that multiple
>>>>>> threads will now be updating it. So in parallel backup this will
>>>>>> represent the overall bytes that have been transferred, and the workers
>>>>>> will sleep if they have exceeded the limit. Hence, the shared state
>>>>>> carries a lock to safely update the throttling value atomically.
>>>>>>
>>>>>> Progress reporting:
>>>>>> I think we should add progress reporting for parallel backup as a
>>>>>> separate patch. The relevant entries for progress reporting, such as
>>>>>> 'backup_total' and 'backup_streamed', would then be added to this
>>>>>> structure as well.
>>>>>>
>>>>>> Grammar:
>>>>>> There is a change in the result set returned for the START_BACKUP
>>>>>> command; unique_backup_id is added. Additionally, the JOIN_BACKUP
>>>>>> replication command is added. SEND_FILES has been renamed to SEND_FILE.
>>>>>> There are no other changes to the grammar.
>>>>>>
>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>>> - returns startptr, tli, backup_label, unique_backup_id
>>>>>> STOP_BACKUP [NOWAIT]
>>>>>> - returns startptr, tli, backup_label
>>>>>> JOIN_BACKUP 'unique_backup_id'
>>>>>> - attaches a shared state identified by 'unique_backup_id' to a backend
>>>>>>   process
>>>>>>
>>>>>> LIST_TABLESPACES [PROGRESS]
>>>>>> LIST_FILES [TABLESPACE]
>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>
> Hi,
>
> Rebased and updated to the current master (8128b0c1). v13 is attached.
>
> - Fixes the above reported issues.
>
> - Added progress-reporting support for parallel backup:
>   For this, 'backup_streamed' is moved to a shared structure (BackupState)
>   as a pg_atomic_uint64 variable. The worker processes will keep
>   incrementing this variable. While files are being transferred from the
>   server to the client, the main process remains idle, so after each
>   increment the worker process will signal the master to update the stats
>   in the pg_stat_progress_basebackup view.
>
>   The 'tablespace_streamed' column is not updated and will remain empty,
>   because multiple workers may be copying files from different tablespaces.
>
> - Added backup manifest support:
>   The backend workers maintain their own manifest files, each containing
>   the list of files transferred by that worker. Once all backup files are
>   transferred, each worker creates a temp file
>   ('pg_tempdir/temp_file_prefix_backupid.workerid') and writes the content
>   of its manifest there from a BufFile. The workers add neither the header
>   nor the WAL information to their manifests; these two are added by the
>   main process while merging all the worker manifest files.
>
>   The main process will read these individual files and concatenate them
>   into a single file, which is then sent back to the client.
>
>   The manifest file is created when the following command is received:
>
>   BUILD_MANIFEST 'backupid'
>
>   This is a new replication command.
> It is sent when pg_basebackup has copied all the $PGDATA files, including
> WAL files.
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca

--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
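[Editor's note: two sketches follow for mechanisms described in Asif's notes
above; both are illustrative assumptions, not code from the patch. First, the
shared throttling counter and atomic progress counter. The struct layout and
the names parallel_throttle/throttling_sample are hypothetical; the real
patch's throttle logic lives in basebackup.c.]

    #include "postgres.h"

    #include "port/atomics.h"
    #include "storage/spin.h"

    typedef struct BackupState
    {
        slock_t     lock;                   /* protects throttling_counter */
        int64       throttling_counter;     /* bytes sent since last sleep check */
        pg_atomic_uint64 backup_streamed;   /* total bytes, read by the main
                                             * process for the
                                             * pg_stat_progress_basebackup view */
    } BackupState;

    static BackupState *backup_state;       /* attached via JOIN_BACKUP */
    static int64 throttling_sample;         /* bytes allowed per sample interval */

    /*
     * Account for 'nbytes' just sent by this worker. All workers share one
     * counter, so the MAX_RATE limit applies to the backup as a whole.
     */
    static void
    parallel_throttle(int64 nbytes)
    {
        bool        sleep_needed = false;

        SpinLockAcquire(&backup_state->lock);
        backup_state->throttling_counter += nbytes;
        if (backup_state->throttling_counter >= throttling_sample)
        {
            backup_state->throttling_counter %= throttling_sample;
            sleep_needed = true;
        }
        SpinLockRelease(&backup_state->lock);

        if (sleep_needed)
        {
            /*
             * Sleep out the rest of the sample interval, as throttle() does
             * for non-parallel backup (latch wait elided in this sketch).
             */
        }

        /* The progress counter is atomic, so no spinlock is needed here. */
        pg_atomic_add_fetch_u64(&backup_state->backup_streamed, nbytes);
    }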
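[Editor's note: second, a sketch of the BUILD_MANIFEST merge step, in which the
main process concatenates the per-worker temp manifest files before prepending
the header and appending the WAL range. The helper name and the exact file-name
format are assumptions based on the description above. Incidentally, the
--no-manifest failure reported at the top of this message is consistent with
this open step being attempted for worker manifest files that were never
created.]

    #include <stdio.h>

    static void
    merge_worker_manifests(FILE *out, const char *tmpdir,
                           const char *backupid, int nworkers)
    {
        char        path[1024];
        char        buf[8192];
        size_t      nread;

        for (int i = 0; i < nworkers; i++)
        {
            FILE       *in;

            /* Per-worker file name format as described above (assumed). */
            snprintf(path, sizeof(path), "%s/temp_file_prefix_%s.%d",
                     tmpdir, backupid, i);
            in = fopen(path, "r");
            if (in == NULL)
                continue;       /* real code must report an error here; a
                                 * missing file is exactly the symptom seen
                                 * with --no-manifest above */
            while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
                fwrite(buf, 1, nread, out);
            fclose(in);
        }
    }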