On Sat, Aug 24, 2013 at 1:46 PM, <haru...@sraoss.co.jp> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      8397
> Logged by:          TAKATSUKA Haruka
> Email address:      haru...@sraoss.co.jp
> PostgreSQL version: 9.2.4
> Operating system:   Linux (CentOS6)
> Description:
>
> Hi.
>
> I report a small bug.
> pg_basebackup -x from new standby server sometimes causes Segmentation
> fault.
>
> (1) create new standby server dir by pg_basebackup without -x
> (2) start new standby server
> (3) pg_basebackup from new standby server with -x
> (!) when new standby has no WAL files in pg_xlog,
>     new standby's wal sender crash
>
> new standby server's core file:
>
> Core was generated by `postgres: wal sender process postgres ::1(55210)
> sending backup "pg_basebackup'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-4.el6.x86_64
> zlib-1.2.3-27.el6.x86_64
> (gdb) bt
> #0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> #1  0x0000003b73675990 in _IO_str_init_static_internal () from
> /lib64/libc.so.6
> #2  0x0000003b73669935 in vsscanf () from /lib64/libc.so.6
> #3  0x0000003b736639a8 in sscanf () from /lib64/libc.so.6
> #4  0x0000000000622351 in perform_base_backup (opt=0x7fffc2e22300,
> tblspcdir=0xd424c0) at basebackup.c:304
> #5  0x0000000000622c50 in SendBaseBackup (cmd=<value optimized out>)
> at basebackup.c:558
> #6  0x000000000061f5b0 in HandleReplicationCommand () at walsender.c:482
> #7  WalSndHandshake () at walsender.c:257
> #8  WalSenderMain () at walsender.c:181
> #9  0x0000000000650b12 in PostgresMain (argc=1, argv=<value optimized out>,
> dbname=0xc82a90 "", username=0xc82a70 "postgres") at postgres.c:3715
> #10 0x000000000060c4f1 in BackendRun () at postmaster.c:3614
> #11 BackendStartup () at postmaster.c:3304
> #12 ServerLoop () at postmaster.c:1367
> #13 0x000000000060f031 in PostmasterMain (argc=<value optimized out>,
> argv=<value optimized out>) at postmaster.c:1127
> #14 0x00000000005ae140 in main (argc=5, argv=0xc80bb0) at main.c:199
>
> ./backend/replication/basebackup.c:304
>     XLogFromFileName(walFiles[0], &tli, &logid, &logseg);
>
> In this case, nWalFiles = 0 and walFiles[] palloced zero size.
>
> Though pg_basebackup does not have to work in this rare case,
> we should insert something like "if (nWalFiles <= 0) ereport(...);".
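For illustration, here is a minimal C sketch of the guard suggested in the report, written as a standalone function so it compiles outside the backend. The function name first_wal_segment() is hypothetical. In basebackup.c the error branch would presumably be an ereport(ERROR, ...) call rather than an fprintf() plus return code. Judging by the backtrace, XLogFromFileName() expands to an sscanf() call; the parse below assumes the 9.2-era file name layout of three 8-hex-digit fields (timeline, log id, segment).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the code around basebackup.c:304.
 * Returns 0 and fills in *tli, *logid and *logseg from the first
 * collected WAL file name, or -1 if there is nothing usable.
 * In the backend, both error branches would be ereport(ERROR, ...).
 */
static int
first_wal_segment(char **walFiles, int nWalFiles,
                  unsigned int *tli, unsigned int *logid,
                  unsigned int *logseg)
{
    /*
     * The missing sanity check: with nWalFiles == 0 the array was
     * palloc'd with zero size, so walFiles[0] must not be touched.
     */
    if (nWalFiles <= 0)
    {
        fprintf(stderr, "could not find any WAL files\n");
        return -1;
    }

    /* The guard above guarantees at least one entry exists. */
    assert(walFiles[0] != NULL);

    /* Parse timeline, log id and segment: three 8-hex-digit fields. */
    if (sscanf(walFiles[0], "%8X%8X%8X", tli, logid, logseg) != 3)
    {
        fprintf(stderr, "invalid WAL file name \"%s\"\n", walFiles[0]);
        return -1;
    }
    return 0;
}
```

With this shape, the empty pg_xlog case described above turns into a clean error instead of a dereference of a zero-length allocation.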
Yes, we definitely need better error checking there; a crash is never the right answer.

Does this happen only when you take a backup "really quickly" after setting up the new standby, or is there some scenario later in its lifetime when it can happen? In the first case, throwing a hard error seems quite reasonable, but if it's repeatable, perhaps there is something better we can do?

Also, while we definitely need a sanity check at this point, might it be worth putting a second check earlier in the process as well? AFAICT this error gets thrown only after all the data has already been sent.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs