Hi List! On an originally designated hot_standby that had been not so hot for the last 24h i tried to replay the wal_files left over from a (seemingly corrupted) master server [missing clog... ].
I get this output: "/etc/postgresql/9.1/main/postgresql.conf" 584L, 19942C written root@pgstandby7:/var/lib/postgresql/9.1# /etc/init.d/postgresql restart Restarting PostgreSQL 9.1 database server: main2013-07-30 06:18:20 UTC LOG: database system was interrupted while in recovery at log time 2013-07-29 09:39:03 UTC 2013-07-30 06:18:20 UTC HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2013-07-30 06:18:20 UTC LOG: entering standby mode 2013-07-30 06:18:20 UTC LOG: incomplete startup packet 2013-07-30 06:18:20 UTC LOG: restored log file "000000010000027A0000002B" from archive 2013-07-30 06:18:20 UTC LOG: redo starts at 27A/2B000078 2013-07-30 06:18:21 UTC FATAL: the database system is starting up 2013-07-30 06:18:21 UTC FATAL: the database system is starting up 2013-07-30 06:18:21 UTC LOG: restored log file "000000010000027A0000002C" from archive 2013-07-30 06:18:22 UTC FATAL: the database system is starting up 2013-07-30 06:18:22 UTC LOG: restored log file "000000010000027A0000002D" from archive 2013-07-30 06:18:22 UTC FATAL: the database system is starting up 2013-07-30 06:18:22 UTC LOG: restored log file "000000010000027A0000002E" from archive 2013-07-30 06:18:22 UTC LOG: consistent recovery state reached at 27A/2E3F42E8 2013-07-30 06:18:22 UTC PANIC: _bt_restore_page: cannot add item to page 2013-07-30 06:18:22 UTC CONTEXT: xlog redo split_r: rel 1663/16405/797541 left 4743, right 18008, next 9681, level 0, firstright 194 2013-07-30 06:18:22 UTC LOG: startup process (PID 11637) was terminated by signal 6: Aborted 2013-07-30 06:18:22 UTC LOG: terminating any other active server processes The PostgreSQL server failed to start. Please check the log output: 2013-07-30 06:18:20 UTC LOG: database system was interrupted while in recovery at log time 2013-07-29 09:39:03 UTC 2013-07-30 06:18:20 UTC HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2013-07-30 06:18:20 UTC LOG: entering standby mode 2013-07-30 06:18:20 UTC LOG: incomplete startup packet 2013-07-30 06:18:20 UTC LOG: restored log file "000000010000027A0000002B" from archive 2013-07-30 06:18:20 UTC LOG: redo starts at 27A/2B000078 2013-07-30 06:18:21 UTC FATAL: the database system is starting up 2013-07-30 06:18:21 UTC FATAL: the database system is starting up 2013-07-30 06:18:21 UTC LOG: restored log file "000000010000027A0000002C" from archive 2013-07-30 06:18:22 UTC FATAL: the database system is starting up 2013-07-30 06:18:22 UTC LOG: restored log file "000000010000027A0000002D" from archive 2013-07-30 06:18:22 UTC FATAL: the database system is starting up 2013-07-30 06:18:22 UTC LOG: restored log file "000000010000027A0000002E" from archive 2013-07-30 06:18:22 UTC LOG: consistent recovery state reached at 27A/2E3F42E8 2013-07-30 06:18:22 UTC PANIC: _bt_restore_page: cannot add item to page 2013-07-30 06:18:22 UTC CONTEXT: xlog redo split_r: rel 1663/16405/797541 left 4743, right 18008, next 9681, level 0, firstright 194 2013-07-30 06:18:22 UTC LOG: startup process (PID 11637) was terminated by signal 6: Aborted 2013-07-30 06:18:22 UTC LOG: terminating any other active server processes ... failed! failed! root@pgstandby7:/var/lib/postgresql/9.1# sha1sum /home/validad-pg-backups/pgmaster/wal_files/000000010000027A0000002E.gz 5e9390c165b0885f165ed2bceafdd88692994b8a /home/validad-pg-backups/pgmaster/wal_files/000000010000027A0000002E.gz The wal_file itself '000000010000027A0000002E.gz' is the same on all my 3 backup destinations so if there was a corruption, it occured on the master host. It there a way to 'skip' this wal file? Initially, I agreed with 'greg stark' on pgsql-bugs, that there must have been a filesystem / whatever problem with my master cluster, but how can that propagate to a hot/warm standby? shouldn't they write/delete their own files? mandatory basic info: #------------------------------------------------------------------------------ # CUSTOMIZED OPTIONS #------------------------------------------------------------------------------ #custom_variable_classes = '' # list of custom variable class names listen_addresses = '*' # what IP address(es) to listen on; max_connections = 25 # (change requires restart) timezone = 'Etc/UTC' shared_buffers = 250MB # min 128kB maintenance_work_mem = 20MB checkpoint_completion_target = 0.9 effective_cache_size = 200MB hot_standby = off # "on" allows queries during recovery max_standby_archive_delay = 90min # max delay before canceling queries # when reading WAL from archive; # -1 allows indefinite delay max_standby_streaming_delay = 50min # max delay before canceling queries # when reading streaming WAL; # -1 allows indefinite delay data_directory = '/var/lib/postgresql/9.1/not_main' # use data in another directory root@pgstandby7:/var/lib/postgresql/9.1# cat not_main/recovery.conf # Note that recovery.conf must be in $PGDATA directory. # Specifies whether to start the server as a standby. In streaming replication, # this parameter must to be set to on. standby_mode = 'on' # Specifies a connection string which is used for the standby server to connect # with the primary. primary_conninfo = 'host=pgmaster7.ipv6.oo.net port=5432 user=pgu_replicate password=passwordpasswordpassword' # Specifies a command to load archive segments from the WAL archive. If # wal_keep_segments is a high enough number to retain the WAL segments # required for the standby server, this may not be necessary. But # a large workload can cause segments to be recycled before the standby # is fully synchronized, requiring you to start again from a new base backup. restore_command = 'cp /home/validad-pg-backups/pgmaster/wal_files/%f.gz /tmp/%f.gz && gunzip /tmp/%f.gz && mv /tmp/%f %p' # Specifies a trigger file whose presence should cause streaming replication to # end (i.e., failover). trigger_file = '/var/lib/postgresql/9.1/main/stop_replication_trigger' cannot currently to psql 'select version()' as the beast won't startup ;(, but it's ii postgresql-9.1 9.1.9-1 amd64 object-relational SQL database, version 9.1 server thank you, klaus