Yesterday morning, one of my streaming replication slaves running 9.2.3 crashed with the following in the log file:

2013-03-28 12:49:30 GMT WARNING: page 1441792 of relation base/63229/63370 does not exist
2013-03-28 12:49:30 GMT CONTEXT: xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370;
2013-03-28 12:49:30 GMT PANIC: WAL contains references to invalid pages
2013-03-28 12:49:30 GMT CONTEXT: xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370;
2013-03-28 12:49:31 GMT LOG: startup process (PID 22941) was terminated by signal 6: Aborted
2013-03-28 12:49:31 GMT LOG: terminating any other active server processes
2013-03-28 12:49:31 GMT WARNING: terminating connection because of crash of another server process
2013-03-28 12:49:31 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-03-28 12:49:31 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-03-28 12:57:44 GMT LOG: database system was interrupted while in recovery at log time 2013-03-28 12:37:42 GMT
2013-03-28 12:57:44 GMT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2013-03-28 12:57:44 GMT LOG: entering standby mode
2013-03-28 12:57:44 GMT LOG: redo starts at 19/2367CE30
2013-03-28 12:57:44 GMT LOG: incomplete startup packet
2013-03-28 12:57:44 GMT LOG: consistent recovery state reached at 19/241835B0
2013-03-28 12:57:44 GMT LOG: database system is ready to accept read only connections
2013-03-28 12:57:44 GMT LOG: invalid record length at 19/2419EE38
2013-03-28 12:57:44 GMT LOG: streaming replication successfully connected to primary

As you can see, I was able to restart it and it picked up and synchronized right away, but this crash still concerns me. The DB has about 75GB of data in it, and it is almost entirely write traffic. It's essentially a log aggregator.
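
For anyone trying to see where the page named in the WARNING would live on disk, here is a rough sketch of the arithmetic. It assumes a stock build with 8 kB blocks and 1 GB relation segments (worth confirming with SHOW block_size and SHOW segment_size); the helper name locate_block is made up for illustration and is not part of PostgreSQL.

```python
# Sketch only: maps a block number from the log to a segment file and offset.
# ASSUMPTION: default build settings (8 kB blocks, 1 GB segments).
BLOCK_SIZE = 8192            # bytes per page (assumed default)
BLOCKS_PER_SEGMENT = 131072  # 1 GB segment / 8 kB blocks (assumed default)


def locate_block(relation_path, block_no):
    """Return (segment file path, byte offset within that file).

    relation_path is the path as printed in the log, relative to the
    data directory, e.g. "base/63229/63370". The first segment has no
    numeric suffix; later segments are named <path>.1, <path>.2, ...
    """
    segment, block_in_segment = divmod(block_no, BLOCKS_PER_SEGMENT)
    file_path = relation_path if segment == 0 else f"{relation_path}.{segment}"
    return file_path, block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    # Values taken from the WARNING line in the log above.
    print(locate_block("base/63229/63370", 1441792))
```

Under those assumptions, block 1441792 works out to offset 0 of the segment file with suffix .11, so comparing which segment files exist for that relation on the primary and on the standby might be a useful first check.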

I believe a pg_dump backup was running against the standby at the time of the crash. The standby has hot_standby_feedback on to allow that process to complete.

Any insights into this, or advice on figuring out the root cause, would be appreciated. So far everything similar I've found is either a bug that should already be fixed in this version, or the internet equivalent of a shrug.

Thanks!
QH