Hi Asif,

In another scenario, the backup data for a tablespace is corrupted. Again, this is not reproducible every time, but I get the same error when I run the same set of commands.

[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
  oid  |  spcname   | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
  1663 | pg_default |       10 |        |
 16384 | tblsp      |       10 |        |
(2 rows)

[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL:  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Asif,
>
> On testing further, I found when taking backup with -R, pg_basebackup crashed
> this crash is not consistently reproducible.
>
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
> CREATE TABLE
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
> INSERT 0 1
> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
> Segmentation fault (core dumped)
>
> stack trace looks the same as it was on earlier reported crash with
> tablespace.
> --stack trace
> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
> 3175            backupinfo->curr = fetchfile->next;
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
> #3  0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> (gdb)
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
>
>> Hi Asif,
>>
>>
>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>> patch to the latest master (b7f64c64).
>>> (V9 of the patches are attached).
>>>
>>
>> I had a further review of the patches and here are my few observations:
>>
>> 1.
>> +/*
>> + * stop_backup() - ends an online backup
>> + *
>> + * The function is called at the end of an online backup. It sends out pg_control
>> + * file, optionally WAL segments and ending WAL location.
>> + */
>>
>> Comments seem out-dated.
>>
>> 2. With parallel jobs, maxrate is now not supported. Since we are now asking
>> data in multiple threads throttling seems important here. Can you please
>> explain why have you disabled that?
>>
>> 3. As we are always fetching a single file and as Robert suggested, let rename
>> SEND_FILES to SEND_FILE instead.
>>
>> 4. Does this work on Windows? I mean does pthread_create() work on Windows?
>> I asked this as I see that pgbench has its own implementation for
>> pthread_create() for WIN32 but this patch doesn't.
>>
>> 5. Typos:
>> tablspace => tablespace
>> safly => safely
>>
>> 6. parallel_backup_run() needs some comments explaining the states it goes
>> through PB_* states.
>>
>> 7.
>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
>> +                if (backupinfo->activeworkers == 0)
>> +                {
>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
>> +                    free_filelist(backupinfo);
>> +                }
>> +                break;
>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
>> +                if (backupinfo->activeworkers == 0)
>> +                {
>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
>> +                }
>> +                break;
>>
>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
>>
>> Thanks
>> --
>> Jeevan Chalke
>> Associate Database Architect & Team Lead, Product Development
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>>
>> Phone: +91 20 66449694
>>
>> Website: www.enterprisedb.com
>> EnterpriseDB Blog: http://blogs.enterprisedb.com/
>> Follow us on Twitter: http://www.twitter.com/enterprisedb
>>
>> This e-mail message (and any attachment) is intended for the use of the
>> individual or entity to whom it is addressed. This message contains
>> information from EnterpriseDB Corporation that may be privileged,
>> confidential, or exempt from disclosure under applicable law. If you are
>> not the intended recipient or authorized to receive this for the intended
>> recipient, any use, dissemination, distribution, retention, archiving, or
>> copying of this communication is strictly prohibited. If you have received
>> this e-mail in error, please notify the sender immediately by reply e-mail
>> and delete this message.
>>
>
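
P.S. The manual ls check for PG_VERSION in the tablespace scenario at the top of this mail could be generalized to the whole tablespace directory. The following is only a rough sketch: list_files and missing_in_backup are hypothetical helper names, not part of any PostgreSQL tool, and it assumes find, sort, comm and mktemp are available. On a running server some differences (for example temporary files) are expected, so the output is a hint about where to look rather than proof of corruption.

```shell
# Hypothetical helpers (not part of PostgreSQL): compare file listings
# of a source directory and its backup copy.

# Print the relative paths of all regular files under directory $1, sorted.
# cd follows symlinks, so this also works for pg_tblspc/<oid> links.
list_files() {
    (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Print files that exist under $1 (source) but are missing under $2 (backup).
missing_in_backup() {
    src_list=$(mktemp) || return 1
    bkp_list=$(mktemp) || { rm -f "$src_list"; return 1; }
    list_files "$1" > "$src_list"
    list_files "$2" > "$bkp_list"
    # comm -23 keeps only the lines unique to the first (source) listing.
    LC_ALL=C comm -23 "$src_list" "$bkp_list"
    rm -f "$src_list" "$bkp_list"
}
```

For the scenario above it would be called as: missing_in_backup data/pg_tblspc/16384 /tmp/bkp/pg_tblspc/16384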