The following bug has been logged on the website:

Bug reference:      7753
Logged by:          Daniel Farina
Email address:      dan...@heroku.com
PostgreSQL version: 9.1.6
Operating system:   Ubuntu 10.04
Description:
Touching a trigger file will not cause promotion out of hot standby: an apparently normally working hot-standby database will not leave hot standby. The database emitting WAL is version 9.1.4. Everything appears normal in the log (downloads and restores of archived segments complete), and the server takes no notice of the trigger file.

To force the issue, I introduced an error into the configuration of the restoration program to cause it to exit. Normally that's no problem; postgres would just keep trying to restore a segment over and over until the error is fixed. Instead, the server crashes:

[413-1] [COPPER] LOG:  restored log file "000000010000034D00000050" from archive
wal_e.worker.s3_worker INFO     MSG: completed download and decompression
        DETAIL: Downloaded and decompressed "s3://archive-root/wal_005/000000010000034D00000051.lzo" to "pg_xlog/RECOVERYXLOG"
[414-1] [COPPER] LOG:  restored log file "000000010000034D00000051" from archive
wal_e.worker.s3_worker INFO     MSG: completed download and decompression
        DETAIL: Downloaded and decompressed "s3://archive-root/wal_005/000000010000034D00000052.lzo" to "pg_xlog/RECOVERYXLOG"
[415-1] [COPPER] LOG:  restored log file "000000010000034D00000052" from archive

# I introduce the failure here

wal_e.main ERROR    MSG: no AWS_SECRET_ACCESS_KEY defined
        HINT: Define the environment variable AWS_SECRET_ACCESS_KEY.
LOG:  trigger file found: /etc/postgresql/wal-e.d/pull-env/STANDBY_OFF
LOG:  redo done at 34D/52248590
LOG:  last completed transaction was at log time 2012-12-10
wal_e.main ERROR    MSG: no AWS_SECRET_ACCESS_KEY defined
        HINT: Define the environment variable AWS_SECRET_ACCESS_KEY.
PANIC:  could not open file "pg_xlog/000000010000034D00000052" (log file 845, segment 82): No such file or directory
LOG:  startup process (PID 7) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

I can fix the configuration and restart the server, and everything is as fine as before. Next, I try removing recovery.conf and restarting the server as an alternative way of promoting, but to no avail; it just produces a slightly different error:

# Server begins starting
LOG:  loaded library "auto_explain"
LOG:  loaded library "pg_stat_statements"
LOG:  database system was interrupted while in recovery at log time 2012-12-10 15:20:03 UTC
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG:  could not open file "pg_xlog/000000010000034E0000001A" (log file 846, segment 26): No such file or directory
LOG:  invalid primary checkpoint record
LOG:  could not open file "pg_xlog/000000010000034D000000F2" (log file 845, segment 242): No such file or directory
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 7) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
main process (24284) terminated with status 1

pg_control looks like this around the same time, for reference:

pg_control version number:            903
Catalog version number:               201105231
Database cluster state:               in archive recovery
pg_control last modified:             Wed 12 Dec 2012 09:22:30 PM UTC
Latest checkpoint location:           351/1FE194C0
Prior checkpoint location:            351/FD64A78
Latest checkpoint's REDO location:    351/131848C8
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/652342033
Latest checkpoint's NextOID:          103224
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        455900714
Latest checkpoint's oldestXID's DB:   16385
Latest checkpoint's oldestActiveXID:  652311442
Time of latest checkpoint:            Mon 10 Dec 2012 07:19:23 PM UTC
Minimum recovery ending location:     351/4BFFFE20
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      500
Current max_prepared_xacts setting:   500
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

In the course of all this messing around, it has never been a problem to go back to archive recovery.
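
For reference, the standby's recovery.conf follows the usual WAL-E pull setup. The sketch below is a reconstruction from the paths visible in the log above, not a verbatim copy of the affected cluster's file; in particular, the envdir-based restore_command is an assumption on my part:

    # Reconstructed recovery.conf for a 9.1 standby; the restore_command shown
    # here is an assumed WAL-E invocation, not the exact one in use.
    standby_mode    = 'on'
    restore_command = 'envdir /etc/postgresql/wal-e.d/pull-env wal-e wal-fetch "%f" "%p"'
    # Creating this file is what should trigger promotion out of hot standby;
    # the path matches the "trigger file found" line in the log above.
    trigger_file    = '/etc/postgresql/wal-e.d/pull-env/STANDBY_OFF'

Promotion is then attempted simply by creating that file, e.g. touch /etc/postgresql/wal-e.d/pull-env/STANDBY_OFF.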