[BUGS] WAL Receiver Segmentation Fault
Postgres 9.0.11 running as a hot standby. The master was restarted and the standby went into a segmentation fault loop. A hard stop/start fixed it. Here are pertinent logs with excess and identifying information removed:

2012-12-28 03:39:14 UTC [16850]: [2-1] FATAL: replication terminated by primary server
zcat: /mnt/dbmount/walarchive/00031A0100D5.gz: No such file or directory
2012-12-28 03:39:14 UTC [16801]: [21-1] LOG: record with zero length at 1A01/D578
zcat: /mnt/dbmount/walarchive/00031A0100D5.gz: No such file or directory
2012-12-28 03:39:14 UTC [16798]: [2-1] LOG: WAL receiver process (PID 16671) was terminated by signal 11: Segmentation fault
2012-12-28 03:39:14 UTC [16798]: [3-1] LOG: terminating any other active server processes
2012-12-28 03:39:15 UTC [16798]: [4-1] LOG: all server processes terminated; reinitializing
2012-12-28 03:39:15 UTC [16673]: [1-1] LOG: database system was interrupted while in recovery at log time 2012-12-28 03:35:47 UTC
2012-12-28 03:39:15 UTC [16673]: [2-1] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
zcat: /mnt/dbmount/walarchive/0004.history.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0003.history.gz: No such file or directory
2012-12-28 03:39:16 UTC [16673]: [3-1] LOG: entering standby mode
zcat: /mnt/dbmount/walarchive/00031A010092.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A01007D.gz: No such file or directory
2012-12-28 03:39:16 UTC [16673]: [4-1] LOG: redo starts at 1A01/7D00C500
zcat: /mnt/dbmount/walarchive/00031A01007E.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A01007F.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/00031A0100C0.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A0100C1.gz: No such file or directory
2012-12-28 03:39:24 UTC [16681]: [1-1] LOG: restartpoint starting: xlog
zcat: /mnt/dbmount/walarchive/00031A0100C2.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A0100C3.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/00031A0100D3.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A0100D4.gz: No such file or directory
2012-12-28 03:39:28 UTC [16673]: [5-1] LOG: consistent recovery state reached at 1A01/D430F1A0
2012-12-28 03:39:28 UTC [16798]: [5-1] LOG: database system is ready to accept read only connections
zcat: /mnt/dbmount/walarchive/00031A0100D5.gz: No such file or directory
2012-12-28 03:39:28 UTC [16673]: [6-1] LOG: record with zero length at 1A01/D578
zcat: /mnt/dbmount/walarchive/00031A0100D5.gz: No such file or directory
2012-12-28 03:39:28 UTC [16798]: [6-1] LOG: WAL receiver process (PID 16870) was terminated by signal 11: Segmentation fault
2012-12-28 03:39:28 UTC [16798]: [7-1] LOG: terminating any other active server processes
2012-12-28 03:39:28 UTC [16798]: [8-1] LOG: all server processes terminated; reinitializing
2012-12-28 03:39:30 UTC [16871]: [1-1] LOG: database system was interrupted while in recovery at log time 2012-12-28 03:35:47 UTC
2012-12-28 03:39:30 UTC [16871]: [2-1] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
zcat: /mnt/dbmount/walarchive/0004.history.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/0003.history.gz: No such file or directory
2012-12-28 03:39:30 UTC [16871]: [3-1] LOG: entering standby mode
zcat: /mnt/dbmount/walarchive/00031A010092.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A01007D.gz: No such file or directory
2012-12-28 03:39:30 UTC [16871]: [4-1] LOG: redo starts at 1A01/7D00C500
zcat: /mnt/dbmount/walarchive/00031A01007E.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A01007F.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/00031A0100C0.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A0100C1.gz: No such file or directory
2012-12-28 03:39:38 UTC [16883]: [1-1] LOG: restartpoint starting: xlog
zcat: /mnt/dbmount/walarchive/00031A0100C2.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A0100C3.gz: No such file or directory
...
zcat: /mnt/dbmount/walarchive/00031A0100D3.gz: No such file or directory
zcat: /mnt/dbmount/walarchive/00031A0100D4.gz: No such file or directory
2012-12-28 03:39:41 UTC [16871]: [5-1] LOG: consistent recovery state reached at 1A01/D430F1A0
2012-12-28 03:39:41 UTC [16798]: [9-1] LOG: database system is ready to accept read only connections
zcat: /mnt/dbmount/walarchive/00031A0100D5.gz: No such file or directory
2
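
For anyone wanting to confirm a crash loop like the one in the logs above, a small script can pull out the "terminated by signal" lines and show how often the WAL receiver died. This is only a sketch: it assumes the server log is available as text with one entry per line and the same line prefix as in the excerpt. The names `CRASH_RE`, `find_crashes`, and `SAMPLE` are made up for illustration and are not part of PostgreSQL.

```python
import re

# Matches postmaster lines that report a child process killed by a signal.
# The prefix layout (timestamp, "UTC", [pid], [line-session]) is an assumption
# taken from the log excerpt in this thread.
CRASH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC \[\d+\]: "
    r"\[\d+-\d+\] LOG:\s+(?P<proc>.+?) \(PID (?P<pid>\d+)\) "
    r"was terminated by signal (?P<sig>\d+)"
)


def find_crashes(log_text):
    """Return (timestamp, process name, pid, signal) for each crash line."""
    crashes = []
    for line in log_text.splitlines():
        m = CRASH_RE.match(line.strip())
        if m:
            crashes.append((m["ts"], m["proc"], int(m["pid"]), int(m["sig"])))
    return crashes


# Three lines copied from the excerpt, used here as sample input.
SAMPLE = """\
2012-12-28 03:39:14 UTC [16798]: [2-1] LOG: WAL receiver process (PID 16671) was terminated by signal 11: Segmentation fault
2012-12-28 03:39:14 UTC [16798]: [3-1] LOG: terminating any other active server processes
2012-12-28 03:39:28 UTC [16798]: [6-1] LOG: WAL receiver process (PID 16870) was terminated by signal 11: Segmentation fault
"""

if __name__ == "__main__":
    for ts, proc, pid, sig in find_crashes(SAMPLE):
        print(ts, proc, pid, sig)
```

Two or more hits for the same process name a few seconds apart, as in the sample, is the restart loop described in the report.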
Re: [BUGS] WAL Receiver Segmentation Fault
On 28.12.2012 20:55, Phil Sorber wrote:
> Postgres 9.0.11 running as a hot standby. The master was restarted and
> the standby went into a segmentation fault loop. A hard stop/start
> fixed it. Here are pertinent logs with excess and identifying
> information removed:
> ...
> If there is any more info I can provide, let me know. This is a
> production DB so I won't be able to do any disruptive testing. Based on
> what I have seen so far, I think this would be difficult to replicate
> anyway.

A stack trace would be nice. If you didn't get a core dump this time, it would be good to configure the system so that you get one next time it happens.

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
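
As a rough illustration of checking whether a system is set up to produce a core dump, the sketch below reads the core file size limit. It assumes a Unix-like host for the `resource` module and a Linux `/proc` filesystem for the per-process check; the thread does not say which OS the standby runs on. The helper names (`current_core_limit`, `parse_core_limit`, `postmaster_core_limit`) are hypothetical.

```python
import resource


def fmt(limit):
    """Render an rlimit value as text, using 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def current_core_limit():
    """Soft and hard core-size limits of the process running this script."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return fmt(soft), fmt(hard)


def parse_core_limit(limits_text):
    """Extract (soft, hard) from the text of a Linux /proc/<pid>/limits file."""
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


def postmaster_core_limit(pid):
    """Read the limits of an already running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_core_limit(f.read())


if __name__ == "__main__":
    print("this shell: soft=%s hard=%s" % current_core_limit())
```

A soft limit of 0 means no core file will be written. The limit that matters is the one the running postmaster inherited, so on Linux `postmaster_core_limit(pid)` with the postmaster's PID is the more telling check. The limit is usually raised with `ulimit -c unlimited` in the environment that starts the server, or by starting it with `pg_ctl -c` (`--core-files`), which tries to lift the soft limit on platforms where that is possible.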
Re: [BUGS] WAL Receiver Segmentation Fault
On Fri, Dec 28, 2012 at 5:30 PM, Heikki Linnakangas wrote:
> A stack trace would be nice. If you didn't get a core dump this time, it
> would be good to configure the system so that you get one next time it
> happens.
>
> - Heikki

Sorry, no core. I will get it set up in case it happens again.