The following bug has been logged on the website: Bug reference: 8043 Logged by: Jeff Bohmer Email address: boh...@visionlink.org PostgreSQL version: 9.2.4 Operating system: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5 Description:
Hi pgsql-bugs, PG version: 9.2.4, from yum.postgresql.org OS: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5 Upgrading from 9.2.3 to 9.2.4 has broken archive recovery for me. Probably related to this 9.2.4 change: Ensure we do crash recovery before entering archive recovery, if the database was not stopped cleanly and a recovery.conf file is present (Heikki Linnakangas, Kyotaro Horiguchi, Mitsumasa Kondo) When starting the PostgreSQL 9.2.4 service on a base backup, I get this: 2013-04-05 12:49:04 MDT [10302]: [1-1] user=,db= LOG: database system was interrupted; last known up at 2013-04-05 10:02:01 MDT 2013-04-05 12:49:04 MDT [10302]: [2-1] user=,db= LOG: starting archive recovery 2013-04-05 12:49:04 MDT [10302]: [3-1] user=,db= LOG: could not open file "pg_xlog/000000010000002F0000002D" (log file 47, segment 45): No such file or directory 2013-04-05 12:49:04 MDT [10302]: [4-1] user=,db= LOG: invalid primary checkpoint record 2013-04-05 12:49:04 MDT [10302]: [5-1] user=,db= LOG: could not open file "pg_xlog/000000010000002F0000002C" (log file 47, segment 44): No such file or directory 2013-04-05 12:49:04 MDT [10302]: [6-1] user=,db= LOG: invalid secondary checkpoint record 2013-04-05 12:49:04 MDT [10302]: [7-1] user=,db= PANIC: could not locate a valid checkpoint record 2013-04-05 12:49:04 MDT [10297]: [2-1] user=,db= LOG: startup process (PID 10302) was terminated by signal 6: Aborted 2013-04-05 12:49:04 MDT [10297]: [3-1] user=,db= LOG: aborting startup due to startup process failure The WAL file 000000010000002F0000002D does exist in my WAL archive, but not in pg_xlog. I exclude pg_xlog files when taking the base backup, per the instructions from <http://www.postgresql.org/docs/9.2/static/continuous-archiving.html>: You can, however, omit from the backup dump the files within the cluster's pg_xlog/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. I use a custom base backup script to call pg_start/stop_backup() and make the backup with rsync. The restore_command in recovery.conf is never called by PG 9.2.4 during startup. I confirmed this by adding a "touch /tmp/restore_command.`date +%H:%M:%S`" line at the beginning of the shell script I use for my restore_command. No such files are created when starting PG 9.2.4. After downgrading back to 9.2.3, archive recovery works using the very same base backup, recovery.conf file, and restore_command. The log indicates that PG 9.2.3 begins recovery by pulling WAL files from the archive instead of pg_xlog: 2013-04-05 13:01:14 MDT [16981]: [1-1] user=,db= LOG: database system was interrupted; last known up at 2013-04-05 10:02:01 MDT 2013-04-05 13:01:14 MDT [16981]: [2-1] user=,db= LOG: starting archive recovery 2013-04-05 13:01:14 MDT [16981]: [3-1] user=,db= LOG: restored log file "000000010000002F0000002D" from archive 2013-04-05 13:01:14 MDT [16981]: [4-1] user=,db= LOG: consistent recovery state reached at 2F/2D000080 2013-04-05 13:01:14 MDT [16981]: [5-1] user=,db= LOG: redo starts at 2F/2D000080 2013-04-05 13:01:15 MDT [16981]: [6-1] user=,db= LOG: restored log file "000000010000002F0000002E" from archive 2013-04-05 13:01:15 MDT [16981]: [7-1] user=,db= LOG: restored log file "000000010000002F0000002F" from archive <snip> 2013-04-05 13:01:17 MDT [16981]: [25-1] user=,db= LOG: redo done at 2F/3F07B4D0 2013-04-05 13:01:17 MDT [16981]: [26-1] user=,db= LOG: last completed transaction was at log time 2013-04-05 12:53:01.069826-06 2013-04-05 13:01:17 MDT [16981]: [27-1] user=,db= LOG: restored log file "000000010000002F0000003F" from archive 2013-04-05 13:01:17 MDT [16981]: [28-1] user=,db= LOG: selected new timeline ID: 2 2013-04-05 13:01:17 MDT [16981]: [29-1] user=,db= LOG: archive recovery complete 2013-04-05 13:01:17 MDT [16991]: [1-1] user=,db= LOG: checkpoint starting: end-of-recovery immediate wait 2013-04-05 13:01:17 MDT [16991]: [2-1] user=,db= LOG: checkpoint complete: wrote 327 buffers (0.1%); 0 transaction log file(s) ad ded, 0 removed, 0 recycled; write=0.015 s, sync=0.000 s, total=0.035 s; sync files=0, longest=0.000 s, average=0.000 s 2013-04-05 13:01:17 MDT [16978]: [2-1] user=,db= LOG: database system is ready to accept connections So, the behavior has definitely changed between 9.2.3 and 9.2.4. Is this a bug in 9.2.4? Or, must I now include pg_xlog files when taking base backups with 9.2.4, contrary to the documentation? Or, is there a way to make PG 9.2.4 use restore_command for recovery, as 9.2.3 does? Thank you, - Jeff -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs