[GENERAL] Streaming replication slave crash

Quentin Hartman Fri, 29 Mar 2013 09:19:55 -0700

Yesterday morning, one of my streaming replication slaves running 9.2.3
crashed with the following in the log file:


2013-03-28 12:49:30 GMT WARNING:  page 1441792 of relation base/63229/63370 does not exist
2013-03-28 12:49:30 GMT CONTEXT:  xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370;
2013-03-28 12:49:30 GMT PANIC:  WAL contains references to invalid pages
2013-03-28 12:49:30 GMT CONTEXT:  xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370;
2013-03-28 12:49:31 GMT LOG:  startup process (PID 22941) was terminated by signal 6: Aborted
2013-03-28 12:49:31 GMT LOG:  terminating any other active server processes
2013-03-28 12:49:31 GMT WARNING:  terminating connection because of crash of another server process
2013-03-28 12:49:31 GMT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-03-28 12:49:31 GMT HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-03-28 12:57:44 GMT LOG:  database system was interrupted while in recovery at log time 2013-03-28 12:37:42 GMT
2013-03-28 12:57:44 GMT HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2013-03-28 12:57:44 GMT LOG:  entering standby mode
2013-03-28 12:57:44 GMT LOG:  redo starts at 19/2367CE30
2013-03-28 12:57:44 GMT LOG:  incomplete startup packet
2013-03-28 12:57:44 GMT LOG:  consistent recovery state reached at 19/241835B0
2013-03-28 12:57:44 GMT LOG:  database system is ready to accept read only connections
2013-03-28 12:57:44 GMT LOG:  invalid record length at 19/2419EE38
2013-03-28 12:57:44 GMT LOG:  streaming replication successfully connected to primary

As you can see, I was able to restart it, and it reconnected and
resynchronized right away, but the crash still concerns me.
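
For reference, here is a rough sketch of where the page named in the
WARNING would sit on disk. It assumes the default 8 kB block size and 1 GB
segment files (I have not checked whether this build changes either), and
locate_block is just a helper name made up for this example:

```python
# Sketch only: assumes PostgreSQL's default 8 kB block size and 1 GB
# relation segment files. Both are compile-time options, so a custom
# build could differ.
BLOCK_SIZE = 8192
SEGMENT_SIZE = 1024 ** 3


def locate_block(relpath, block, block_size=BLOCK_SIZE, segment_size=SEGMENT_SIZE):
    """Return (segment file, byte offset in that file, logical byte offset)."""
    blocks_per_segment = segment_size // block_size
    segment, block_in_segment = divmod(block, blocks_per_segment)
    # The first segment has no suffix; later ones are named <relfilenode>.<N>.
    filename = relpath if segment == 0 else f"{relpath}.{segment}"
    return filename, block_in_segment * block_size, block * block_size


# Values taken from the WARNING line in the log above.
print(locate_block("base/63229/63370", 1441792))
# → ('base/63229/63370.11', 0, 11811160064)
```

Under those assumptions, page 1441792 is the very first block of segment
file 63370.11, i.e. exactly 11 GiB into the heap relation.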

The DB holds about 75GB of data and sees almost entirely write traffic;
it's essentially a log aggregator. I believe it was running a pg_dump
backup at the time of the crash. hot_standby_feedback is on to allow that
backup to complete.

Any insights into this, or advice on tracking down the root cause, would be
appreciated. So far, everything similar I've found is either a bug that
should already be fixed in this version or the internet equivalent of a
shrug.

Thanks!

QH
