We have Postgres 9.0.11 running as a hot standby. The master was restarted and the standby went into a segmentation fault loop. A hard stop/start of the standby fixed it. Here are the pertinent logs, with excess and identifying information removed:
2012-12-28 03:39:14 UTC [16850]: [2-1] FATAL: replication terminated by primary server
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D5.gz: No such file or directory
2012-12-28 03:39:14 UTC [16801]: [21-1] LOG: record with zero length at 1A01/D5000078
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D5.gz: No such file or directory
2012-12-28 03:39:14 UTC [16798]: [2-1] LOG: WAL receiver process (PID 16671) was terminated by signal 11: Segmentation fault
2012-12-28 03:39:14 UTC [16798]: [3-1] LOG: terminating any other active server processes
2012-12-28 03:39:15 UTC [16798]: [4-1] LOG: all server processes terminated; reinitializing
2012-12-28 03:39:15 UTC [16673]: [1-1] LOG: database system was interrupted while in recovery at log time 2012-12-28 03:35:47 UTC
2012-12-28 03:39:15 UTC [16673]: [2-1] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
zcat: /mnt/dbmount/walarchive/00000004.history.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00000003.history.gz: No such file or directory
2012-12-28 03:39:16 UTC [16673]: [3-1] LOG: entering standby mode
zcat: /mnt/dbmount/walarchive/0000000300001A0100000092.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A010000007D.gz: No such file or directory
2012-12-28 03:39:16 UTC [16673]: [4-1] LOG: redo starts at 1A01/7D00C500
zcat: /mnt/dbmount/walarchive/0000000300001A010000007E.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A010000007F.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C0.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C1.gz: No such file or directory
2012-12-28 03:39:24 UTC [16681]: [1-1] LOG: restartpoint starting: xlog
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C2.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C3.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D3.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D4.gz: No such file or directory
2012-12-28 03:39:28 UTC [16673]: [5-1] LOG: consistent recovery state reached at 1A01/D430F1A0
2012-12-28 03:39:28 UTC [16798]: [5-1] LOG: database system is ready to accept read only connections
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D5.gz: No such file or directory
2012-12-28 03:39:28 UTC [16673]: [6-1] LOG: record with zero length at 1A01/D5000078
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D5.gz: No such file or directory
2012-12-28 03:39:28 UTC [16798]: [6-1] LOG: WAL receiver process (PID 16870) was terminated by signal 11: Segmentation fault
2012-12-28 03:39:28 UTC [16798]: [7-1] LOG: terminating any other active server processes
2012-12-28 03:39:28 UTC [16798]: [8-1] LOG: all server processes terminated; reinitializing
2012-12-28 03:39:30 UTC [16871]: [1-1] LOG: database system was interrupted while in recovery at log time 2012-12-28 03:35:47 UTC
2012-12-28 03:39:30 UTC [16871]: [2-1] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
zcat: /mnt/dbmount/walarchive/00000004.history.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00000003.history.gz: No such file or directory
2012-12-28 03:39:30 UTC [16871]: [3-1] LOG: entering standby mode
zcat: /mnt/dbmount/walarchive/0000000300001A0100000092.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A010000007D.gz: No such file or directory
2012-12-28 03:39:30 UTC [16871]: [4-1] LOG: redo starts at 1A01/7D00C500
zcat: /mnt/dbmount/walarchive/0000000300001A010000007E.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A010000007F.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C0.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C1.gz: No such file or directory
2012-12-28 03:39:38 UTC [16883]: [1-1] LOG: restartpoint starting: xlog
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C2.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A01000000C3.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D3.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D4.gz: No such file or directory
2012-12-28 03:39:41 UTC [16871]: [5-1] LOG: consistent recovery state reached at 1A01/D430F1A0
2012-12-28 03:39:41 UTC [16798]: [9-1] LOG: database system is ready to accept read only connections
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D5.gz: No such file or directory
2012-12-28 03:39:41 UTC [16871]: [6-1] LOG: record with zero length at 1A01/D5000078
zcat: /mnt/dbmount/walarchive/0000000300001A01000000D5.gz: No such file or directory
2012-12-28 03:39:41 UTC [16798]: [10-1] LOG: WAL receiver process (PID 17144) was terminated by signal 11: Segmentation fault
2012-12-28 03:39:41 UTC [16798]: [11-1] LOG: terminating any other active server processes
2012-12-28 03:39:42 UTC [16798]: [12-1] LOG: all server processes terminated; reinitializing

It basically kept doing that over and over until I stopped and started it:

2012-12-28 03:58:22 UTC [16798]: [161-1] LOG: received fast shutdown request
2012-12-28 03:58:22 UTC [983]: [1-1] LOG: shutting down
2012-12-28 03:58:22 UTC [983]: [2-1] LOG: database system is shut down
2012-12-28 03:58:48 UTC [1219]: [1-1] LOG: database system was shut down in recovery at 2012-12-28 03:58:22 UTC
zcat: /mnt/dbmount/walarchive/00000004.history.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00000003.history.gz: No such file or directory
2012-12-28 03:58:48 UTC [1219]: [2-1] LOG: entering standby mode
2012-12-28 03:58:48 UTC [1219]: [3-1] LOG: restored log file "0000000300001A01000000C1" from archive
2012-12-28 03:58:48 UTC [1219]: [4-1] LOG: restored log file "0000000300001A01000000AF" from archive
2012-12-28 03:58:48 UTC [1219]: [5-1] LOG: redo starts at 1A01/AF010A98
2012-12-28 03:58:48 UTC [1219]: [6-1] LOG: restored log file "0000000300001A01000000B0" from archive
2012-12-28 03:58:48 UTC [1219]: [7-1] LOG: restored log file "0000000300001A01000000B1" from archive
...
2012-12-28 03:59:10 UTC [1219]: [50-1] LOG: restored log file "0000000300001A01000000DC" from archive
2012-12-28 03:59:10 UTC [1219]: [51-1] LOG: restored log file "0000000300001A01000000DD" from archive
2012-12-28 03:59:10 UTC [1219]: [52-1] LOG: consistent recovery state reached at 1A01/DDED8528
2012-12-28 03:59:10 UTC [1215]: [1-1] LOG: database system is ready to accept read only connections
2012-12-28 03:59:10 UTC [1219]: [53-1] LOG: restored log file "0000000300001A01000000DE" from archive
zcat: /mnt/dbmount/walarchive/0000000300001A01000000DF.gz: No such file or directory
2012-12-28 03:59:10 UTC [1219]: [54-1] LOG: unexpected pageaddr 1A00/F4000000 in log file 6657, segment 223, offset 0
zcat: /mnt/dbmount/walarchive/0000000300001A01000000DF.gz: No such file or directory
2012-12-28 03:59:10 UTC [1700]: [1-1] LOG: streaming replication successfully connected to primary

I'll note that /mnt/dbmount is on NFS. That might be related to the problem, but I did nothing to NFS at any point to fix this. Also, during the segfault loop the standby never attempted to connect to the primary when it couldn't find the archived segment. If there is any more info I can provide, let me know. This is a production DB, so I won't be able to do any disruptive testing; based on what I have seen so far, I think it would be difficult to replicate anyway. A search turned up only this related report: http://archives.postgresql.org/pgsql-bugs/2010-04/msg00080.php
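For reference, the standby pulls archived WAL through a zcat-based restore_command on the NFS mount. The sketch below is paraphrased rather than copied (connection details are removed and the exact command line may differ slightly), but it is the general shape of the recovery.conf in use:

    # recovery.conf on the standby (sketch; actual connection details removed)
    standby_mode     = 'on'
    primary_conninfo = 'host=<primary> port=5432 user=<replication user>'
    restore_command  = 'zcat /mnt/dbmount/walarchive/%f.gz > %p'

As I understand it, the zcat "No such file or directory" lines are just that restore_command failing on segments that have not been archived yet, which should be harmless in standby mode; the segfault loop is the part that is new.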