Hi,
We recently encountered a serious database crash that resulted in a significant loss of data.
We took down the database server, and when we restarted the backend we got the errors 'database system shutdown was interrupted', 'invalid primary checkpoint record' and 'unable to locate a valid checkpoint record', along with a missing xlog file (I've appended the log to the end of this post).
I've been trawling the list archives for a few days and this issue has cropped up a number of times, but I've found it hard to identify a single post - or set of posts - that might help explain the cause of such a crash.
Hopefully I'll be able to bring together the results of this trawl through the archives in this post - but I'd really appreciate any help or suggestions people have - we currently have a slightly uneasy feeling because we've not quite got to the bottom of the issues, and it would be nice to set our minds at rest! :-)
So far I've identified two possible causes of the crash - I've listed them below, and wonder whether people have any comments on them:
1) We were running PostgreSQL version 7.3.6-1 (which is the version in Red Hat EL AS3, kernel-smp-2.4.21-9.0.1EL)
The following post suggests that this is a known issue in 7.3.3 and that 7.3.4 is safe. Is that right? I assume, therefore, that 7.3.6-1 is also safe...
http://archives.postgresql.org/pgsql-general/2003-09/msg01086.php
2) We are running the database in conjunction with JBoss, connecting to the database server from a different machine via JDBC. The database was taken down *without* stopping JBoss first.
Any thoughts would be much appreciated!
Below are the relevant bits of the shutdown and startup logs.
Best wishes,
Crispin
----------------------
shutdown log (/var/log/messages):
May 28 15:43:35 shutdown: shutting down for system halt
May 28 15:43:35 init: Switching to runlevel: 0
May 28 15:43:36 server rhnsd[1694]: Exiting
May 28 15:43:36 server rhnsd: rhnsd shutdown succeeded
May 28 15:43:36 server atd: atd shutdown succeeded
May 28 15:43:36 server cups: cupsd shutdown succeeded
May 28 15:43:36 server xfs[1643]: terminating
May 28 15:43:36 server xfs: xfs shutdown succeeded
May 28 15:43:36 server mysqld: Stopping MySQL: succeeded
May 28 15:43:36 server gpm: gpm shutdown succeeded
May 28 15:43:37 server rhdb: Stopping PostgreSQL - Red Hat Edition service:
May 28 15:43:37 server su(pam_unix)[12400]: session opened for user postgres by (uid=0)
May 28 15:43:40 server su(pam_unix)[12400]: session closed for user postgres
May 28 15:43:40 server rhdb: ^[[60G[
May 28 15:43:40 server rhdb:
May 28 15:43:40 server rc: Stopping rhdb: succeeded
...
May 28 15:43:44 server kernel: Kernel logging (proc) stopped.
May 28 15:43:44 server kernel: Kernel log daemon terminating.
May 28 15:43:45 server syslog: klogd shutdown succeeded
May 28 15:43:45 server exiting on signal 15
May 28 16:13:35 server syslogd 1.4.1: restart.
-----
startup log:
Jun 1 10:43:55 server postgres[5537]: [30] LOG: database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun 1 10:43:55 server postgres[5537]: [31] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:43:55 server postgres[5537]: [32] LOG: invalid primary checkpoint record
Jun 1 10:43:55 server postgres[5537]: [33] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:43:55 server postgres[5537]: [34] LOG: invalid secondary checkpoint record
Jun 1 10:43:55 server postgres[5537]: [35] PANIC: unable to locate a valid checkpoint record
Jun 1 10:43:55 server postgres[5534]: [31] LOG: startup process (pid 5537) was terminated by signal 6
Jun 1 10:43:55 server postgres[5534]: [32] LOG: aborting startup due to startup process failure
Jun 1 10:43:56 server rhdb: Starting PostgreSQL - Red Hat Edition service: failed
Jun 1 10:44:00 server su(pam_unix)[5554]: session opened for user postgres by (uid=0)
Jun 1 10:44:00 server su(pam_unix)[5554]: session closed for user postgres
Jun 1 10:44:00 server postgres[5595]: [30] LOG: database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun 1 10:44:00 server postgres[5595]: [31] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:44:00 server postgres[5595]: [32] LOG: invalid primary checkpoint record
Jun 1 10:44:00 server postgres[5595]: [33] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:44:00 server postgres[5595]: [34] LOG: invalid secondary checkpoint record
Jun 1 10:44:00 server postgres[5595]: [35] PANIC: unable to locate a valid checkpoint record
Jun 1 10:44:00 server postgres[5592]: [31] LOG: startup process (pid 5595) was terminated by signal 6
Jun 1 10:44:00 server postgres[5592]: [32] LOG: aborting startup due to startup process failure
Jun 1 10:44:01 server rhdb: Starting PostgreSQL - Red Hat Edition service: failed
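
For anyone wanting to pull the same lines out of their own syslog, here is a minimal sketch in Python. It only assumes the line layout seen in the startup log above; the function name `summarize` and the two regular expressions are my own illustration, not part of PostgreSQL or any Red Hat tool, and other syslog formats would probably need the patterns adjusted.

```python
import re

# Matches syslog lines shaped like the ones in the startup log, e.g.
# "Jun 1 10:43:55 server postgres[5537]: [35] PANIC: unable to locate ..."
# (hypothetical pattern, written only against the lines quoted in this post)
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ postgres\[(?P<pid>\d+)\]: "
    r"\[\d+\] (?P<level>[A-Z]+): (?P<msg>.*)$"
)

# Pulls the path out of "open of <path> (log file N, segment M) failed: ..."
OPEN_RE = re.compile(r"^open of (?P<path>\S+) \(log file \d+, segment \d+\) failed")


def summarize(lines):
    """Return (missing_paths, panics) found in an iterable of syslog lines.

    missing_paths: sorted list of xlog paths that could not be opened.
    panics: list of (timestamp, pid, message) tuples for PANIC lines.
    """
    missing = set()
    panics = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a postgres backend line (e.g. rhdb, su)
        if m["level"] == "PANIC":
            panics.append((m["stamp"], int(m["pid"]), m["msg"]))
        opened = OPEN_RE.match(m["msg"])
        if opened:
            missing.add(opened["path"])
    return sorted(missing), panics
```

Calling `summarize(open("/var/log/messages"))` would be one way to use it, assuming the messages end up in that file on your system.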
