The following bug has been logged online: Bug reference: 5452 Logged by: Chris Copeland Email address: ch...@cope360.com PostgreSQL version: 8.2.16 Operating system: Solaris 10 x86 Description: Server core dumps coming out of recovery mode Details:
I originally had this problem in 8.2.4 and have not upgraded to 8.2.16 and I am still having it. I have two servers with the same hardware, OS, and pg binaries. Log files are copied from the master to the standby and the standby is run in recovery mode. When the standby is triggered to come out of recovery mode, it fails and generates a core dump. Upon trying to start it after failure, it starts looking for WAL files that it has already recovered. Here are the log files. My annotations start with >> :::::::::::::: postgresql-2010-05-06_102646.log :::::::::::::: >> First time starting the server after the data directory was copied from the master 2010-05-06 10:26:46 CDT :LOG: database system was interrupted at 2010-05-06 09:32:20 CDT 2010-05-06 10:26:46 CDT :LOG: starting archive recovery 2010-05-06 10:26:46 CDT :LOG: restore_command = "/opt/data/restore.sh /opt/wal/archwalremote/%f %p" 2010-05-06 10:26:46 CDT :LOG: restored log file "00000001000000AF00000048.00000020.backup" from archive 2010-05-06 10:26:47 CDT :LOG: restored log file "00000001000000AF00000048" from archive 2010-05-06 10:26:47 CDT :LOG: checkpoint record is at AF/48000020 2010-05-06 10:26:47 CDT :LOG: redo record is at AF/48000020; undo record is at 0/0; shutdown FALSE 2010-05-06 10:26:47 CDT :LOG: next transaction ID: 0/326126241; next OID: 862575 2010-05-06 10:26:47 CDT :LOG: next MultiXactId: 2128; next MultiXactOffset: 4280 2010-05-06 10:26:47 CDT :LOG: automatic recovery in progress 2010-05-06 10:26:47 CDT :LOG: redo starts at AF/48000068 2010-05-06 10:26:48 CDT :LOG: restored log file "00000001000000AF00000049" from archive 2010-05-06 10:26:50 CDT :LOG: restored log file "00000001000000AF0000004A" from archive 2010-05-06 10:26:51 CDT :LOG: restored log file "00000001000000AF0000004B" from archive 2010-05-06 10:26:52 CDT :LOG: restored log file "00000001000000AF0000004C" from archive 2010-05-06 10:26:54 CDT :LOG: restored log file "00000001000000AF0000004D" from archive 2010-05-06 10:26:56 CDT :LOG: restored log file "00000001000000AF0000004E" from archive 2010-05-06 10:26:58 CDT :LOG: restored log file "00000001000000AF0000004F" from archive 2010-05-06 10:27:00 CDT :LOG: restored log file "00000001000000AF00000050" from archive 2010-05-06 10:27:02 CDT :LOG: restored log file "00000001000000AF00000051" from archive 2010-05-06 10:27:04 CDT :LOG: restored log file "00000001000000AF00000052" from archive 2010-05-06 10:27:24 CDT :LOG: restored log file "00000001000000AF00000053" from archive >> WAL files now being restored as they arrive from the master 2010-05-06 10:32:25 CDT :LOG: restored log file "00000001000000AF00000054" from archive 2010-05-06 10:37:26 CDT :LOG: restored log file "00000001000000AF00000055" from archive 2010-05-06 10:42:27 CDT :LOG: restored log file "00000001000000AF00000056" from archive 2010-05-06 10:47:28 CDT :LOG: restored log file "00000001000000AF00000057" from archive 2010-05-06 10:52:29 CDT :LOG: restored log file "00000001000000AF00000058" from archive 2010-05-06 10:57:30 CDT :LOG: restored log file "00000001000000AF00000059" from archive >> Now I trigger the restore command to return 1 to stop the recovery 2010-05-06 10:59:30 CDT :LOG: could not open file "pg_xlog/00000001000000AF0000005A" (log file 175, segment 90): No such file or directory 2010-05-06 10:59:30 CDT :LOG: redo done at AF/59000068 2010-05-06 10:59:30 CDT :PANIC: could not open file "pg_xlog/00000001000000AF00000059" (log file 175, segment 89): No such file or directory 2010-05-06 10:59:34 CDT :LOG: startup process (PID 24763) was terminated by signal 6 2010-05-06 10:59:34 CDT :LOG: aborting startup due to startup process failure 2010-05-06 10:59:34 CDT :LOG: logger shutting down :::::::::::::: postgresql-2010-05-06_105934.log :::::::::::::: >> Attempting to restart server 2010-05-06 10:59:34 CDT :LOG: database system was interrupted while in recovery at log time 2010-05-06 10:57:20 CDT 2010-05-06 10:59:34 CDT :HINT: If this has occurred more than once some data may be corrupted and you may need to choose an earlier recovery target. 2010-05-06 10:59:34 CDT :LOG: starting archive recovery 2010-05-06 10:59:34 CDT :LOG: restore_command = "/opt/data/restore.sh /opt/wal/archwalremote/%f %p" 2010-05-06 10:59:34 CDT :LOG: could not open file "pg_xlog/00000001000000AF00000059" (log file 175, segment 89): No such file or directory 2010-05-06 10:59:34 CDT :LOG: invalid primary checkpoint record 2010-05-06 10:59:34 CDT :LOG: could not open file "pg_xlog/00000001000000AF00000058" (log file 175, segment 88): No such file or directory 2010-05-06 10:59:34 CDT :LOG: invalid secondary checkpoint record 2010-05-06 10:59:34 CDT :PANIC: could not locate a valid checkpoint record 2010-05-06 10:59:38 CDT :LOG: startup process (PID 29450) was terminated by signal 6 2010-05-06 10:59:38 CDT :LOG: aborting startup due to startup process failure 2010-05-06 10:59:38 CDT :LOG: logger shutting down Here is some information from the core file. I don't know if this is useful since this is not a debug build. ---------------------- bsiausstgdb02:/opt/data root# pflags core core 'core' of 29639: /usr/local/pgsql-8.2.16/bin/postgres -D /opt/data data model = _ILP32 flags = ORPHAN|MSACCT|MSFORK /1: flags = 0 sigmask = 0xffffbefc,0x0000ffff cursig = SIGABRT ---------------------- bsiausstgdb02:/opt/data root# mdb core mdb: core file data for mapping at cbc00000 not saved: Interrupted system call mdb: core file data for mapping at ce4b5000 not saved: Interrupted system call mdb: core file data for mapping at ce4c0000 not saved: Interrupted system call mdb: core file data for mapping at ce4e0000 not saved: Interrupted system call mdb: core file data for mapping at ce500000 not saved: Interrupted system call mdb: core file data for mapping at ce520000 not saved: Interrupted system call mdb: core file data for mapping at ce530000 not saved: Interrupted system call mdb: core file data for mapping at ce5c1000 not saved: Interrupted system call mdb: core file data for mapping at ce5c6000 not saved: Interrupted system call mdb: core file data for mapping at ce5d0000 not saved: Interrupted system call mdb: core file data for mapping at ce5ee000 not saved: Interrupted system call mdb: core file data for mapping at ce600000 not saved: Interrupted system call mdb: core file data for mapping at ce617000 not saved: Interrupted system call mdb: core file data for mapping at ce618000 not saved: Interrupted system call mdb: core file data for mapping at ce620000 not saved: Interrupted system call mdb: core file data for mapping at ce630000 not saved: Interrupted system call mdb: core file data for mapping at ce6ff000 not saved: Interrupted system call mdb: core file data for mapping at ce706000 not saved: Interrupted system call mdb: core file data for mapping at ce710000 not saved: Interrupted system call mdb: core file data for mapping at ce762000 not saved: Interrupted system call mdb: core file data for mapping at ce770000 not saved: Interrupted system call mdb: core file data for mapping at ce78b000 not saved: Interrupted system call mdb: core file data for mapping at ce790000 not saved: Interrupted system call mdb: core file data for mapping at ce7a6000 not saved: Interrupted system call mdb: core file data for mapping at ce7b0000 not saved: Interrupted system call mdb: core file data for mapping at ce7c0000 not saved: Interrupted system call mdb: core file data for mapping at ce7c4000 not saved: Interrupted system call mdb: core file data for mapping at ce7fb000 not saved: Interrupted system call mdb: core file data for mapping at ce7fc000 not saved: Interrupted system call mdb: warning: librtld_db failed to initialize; shared library information will not be available Loading modules: [ ld.so.1 ] > ::stack 0xce6d5f85(1, 6, 8329ba0, ce6ff000, 8044618, ce6615e9) 0xce6811cb(6, 8329ba0, 0, 0, 0, 0) 0xce6615e9(832d4d0, 16, 8044648, 821ae3b, 0, 0) 0x821b032(0, 82a1510, 12d8, 823f077) StartupXLOG+0x1122(2, 0, 2, 8339060, 82a9411, 833c840) BootstrapMain+0x46a(4, 8047880, 82a941a, 2, 32782d, ce7c0d50) 0x818aa17(1f, 1, 0, d, 0, 0) PostmasterMain+0x633(3, 8338d90, 1, 5a, 8047dec) main+0x396(3, 8338d90, 8047e1c) _start+0x80(3, 8047ea8, 8047fd0, 8047fd0, 0, 8047eda) > ---------------------- If I can provide more information, please let me know and I will get it for you. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs