Hi Asif,

While testing further, I observed that parallel backup is not able to take a backup of a standby server.

mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">> /tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">> /tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf
./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c

[edb@localhost bin]$ ./psql postgres -p 5432 -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 f
(1 row)

[edb@localhost bin]$ ./psql postgres -p 5433 -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)

[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR: the standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"

# The same works fine without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSION

Thanks & Regards,
Rajkumar Raghuwanshi

On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Asif,
>
> In another scenario, the backup data is corrupted for a tablespace. Again, this is
> not reproducible every time, but when I run the same set of commands I get
> the same error.
>
> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
> waiting for server to start.... done
> server started
> [edb@localhost bin]$
> [edb@localhost bin]$ mkdir /tmp/tblsp
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
> CREATE TABLESPACE
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
> CREATE DATABASE
> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
> CREATE TABLE
> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
> INSERT 0 1
> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
> waiting for server to start.... done
> server started
> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
>   oid  |  spcname   | spcowner | spcacl | spcoptions
> -------+------------+----------+--------+------------
>   1663 | pg_default |       10 |        |
>  16384 | tblsp      |       10 |        |
> (2 rows)
>
> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
> psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
> DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
> [edb@localhost bin]$
> [edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> [edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <rajkumar.raghuwan...@enterprisedb.com> wrote:
>
>> Hi Asif,
>>
>> On testing further, I found that pg_basebackup crashed when taking a
>> backup with -R. This crash is not consistently reproducible.
>>
>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
>> CREATE TABLE
>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
>> INSERT 0 1
>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
>> Segmentation fault (core dumped)
>>
>> The stack trace looks the same as in the crash reported earlier with a
>> tablespace.
>> --stack trace
>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
>> Loaded symbols for /lib64/libnss_files.so.2
>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
>> 3175 backupinfo->curr = fetchfile->next;
>> Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>> (gdb) bt
>> #0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
>> #1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
>> #2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
>> #3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>> (gdb)
>>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
>>
>>> Hi Asif,
>>>
>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>>> patch to the latest master (b7f64c64).
>>>> (V9 of the patches is attached).
>>>
>>> I had a further review of the patches and here are a few observations:
>>>
>>> 1.
>>> +/*
>>> + * stop_backup() - ends an online backup
>>> + *
>>> + * The function is called at the end of an online backup. It sends out pg_control
>>> + * file, optionally WAL segments and ending WAL location.
>>> + */
>>>
>>> The comment seems outdated.
>>>
>>> 2. With parallel jobs, maxrate is no longer supported. Since we are now
>>> requesting data in multiple threads, throttling seems important here. Can
>>> you please explain why you have disabled it?
>>>
>>> 3. As we are always fetching a single file, and as Robert suggested, let's
>>> rename SEND_FILES to SEND_FILE.
>>>
>>> 4. Does this work on Windows? I mean, does pthread_create() work on
>>> Windows? I ask because pgbench has its own implementation of
>>> pthread_create() for WIN32 but this patch doesn't.
>>>
>>> 5. Typos:
>>> tablspace => tablespace
>>> safly => safely
>>>
>>> 6. parallel_backup_run() needs some comments explaining the PB_* states
>>> it goes through.
>>>
>>> 7.
>>> +        case PB_FETCH_REL_FILES:    /* fetch files from server */
>>> +            if (backupinfo->activeworkers == 0)
>>> +            {
>>> +                backupinfo->backupstate = PB_STOP_BACKUP;
>>> +                free_filelist(backupinfo);
>>> +            }
>>> +            break;
>>> +        case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
>>> +            if (backupinfo->activeworkers == 0)
>>> +            {
>>> +                backupinfo->backupstate = PB_BACKUP_COMPLETE;
>>> +            }
>>> +            break;
>>>
>>> Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
>>>
>>> Thanks
>>> --
>>> Jeevan Chalke
>>> Associate Database Architect & Team Lead, Product Development
>>> EnterpriseDB Corporation
>>> The Enterprise PostgreSQL Company
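
For reference on point 7 above, here is a minimal sketch of what releasing the file list in both fetch states might look like. It rests on an assumption the thread does not confirm: that the WAL phase also holds a list that needs freeing. Only the PB_* state names, `backupstate`, `activeworkers` and the call `free_filelist(backupinfo)` come from the quoted hunk; `FileNode`, the layout of `BackupInfo`, `push_file()`, `parallel_backup_step()` and the body of `free_filelist()` are hypothetical stand-ins, not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>

/* State names come from the quoted hunk; all types below are hypothetical. */
typedef enum
{
    PB_FETCH_REL_FILES,
    PB_STOP_BACKUP,
    PB_FETCH_WAL_FILES,
    PB_BACKUP_COMPLETE
} BackupState;

typedef struct FileNode
{
    struct FileNode *next;
} FileNode;

typedef struct BackupInfo
{
    BackupState backupstate;
    int         activeworkers;
    FileNode   *files;          /* head of the list the workers consume */
} BackupInfo;

/* Hypothetical helper: prepend one entry to the list. */
static void
push_file(BackupInfo *backupinfo)
{
    FileNode   *node = malloc(sizeof(FileNode));

    if (node == NULL)
        abort();
    node->next = backupinfo->files;
    backupinfo->files = node;
}

/* Stand-in body: free every node and reset the head, so a repeat call is a no-op. */
static void
free_filelist(BackupInfo *backupinfo)
{
    while (backupinfo->files != NULL)
    {
        FileNode   *next = backupinfo->files->next;

        free(backupinfo->files);
        backupinfo->files = next;
    }
}

/* One pass over the two states shown in the hunk. */
static void
parallel_backup_step(BackupInfo *backupinfo)
{
    assert(backupinfo->activeworkers >= 0);

    switch (backupinfo->backupstate)
    {
        case PB_FETCH_REL_FILES:    /* fetch files from server */
            if (backupinfo->activeworkers == 0)
            {
                backupinfo->backupstate = PB_STOP_BACKUP;
                free_filelist(backupinfo);
            }
            break;
        case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
            if (backupinfo->activeworkers == 0)
            {
                backupinfo->backupstate = PB_BACKUP_COMPLETE;
                /* Assumption: the WAL phase has its own list to release. */
                free_filelist(backupinfo);
            }
            break;
        default:
            break;
    }
}
```

A caller would invoke `parallel_backup_step()` repeatedly while workers finish. Because the stand-in `free_filelist()` resets the list head to NULL, calling it in both states is harmless even if the WAL phase turns out to hold no list.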