Hello, [note: in pastes below, the only thing changes are database names and hostnames, for privacy reasons; otherwise plain cut'n'paste including typos...)
Very short version: I cancelled a VACUUM FULL; server crashed; won't start again because it tries to access an obsolete WAL log file that is likely to be from the point in time when VACUUM FULL started. Long version: The following refers to PostgreSQL 8.2.4 running on Linux 2.6.19-3 on an old sarge system. I realize 8.2.4 is too old a release, but the bug seems distinct enough that I felt I should definitely report it. I did not find anything that looked relevant in the point release changelog for 8.2. Background is that I have a couple of databases on the system with > 1000 relations (if that is at all relevant). Because of autovacuum issues (likely having to do with a too small max_fsm_relations) I wanted to perform a VACUUM FULL. I did, but realized I actually wanted VACUUM VERBOSE FULL. So after a few seconds I hit ctrl-c in the psql client (run locally on the machine as the postgres user; thus ident authed and connected as the postgres PostgreSQL user). Output was: contentsharing=# \timing Timing is on. contentsharing=# vacuum full contentsharing-# ; Cancel request sent ERROR: canceling statement due to user request PANIC: cannot abort transaction 830706491, it was already committed server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> !> \q After this I tried re-starting the database which failed (i.e., it was not up after the start attempt). Checking the log file I had: Aug 26 17:14:52 host postgres[5094]: [3116-1] user=,db= LOG: archived transaction log file "0000000100000018000000ED" Aug 26 17:16:06 host postgres[28467]: [2-1] user=postgres,db=databasename ERROR: canceling statement due to user request Aug 26 17:16:06 host postgres[28467]: [2-2] user=postgres,db=databasename STATEMENT: vacuum full Aug 26 17:16:06 host postgres[28467]: [2-3] ^I; Aug 26 17:16:06 host postgres[28467]: [3-1] user=postgres,db=databasename PANIC: cannot abort transaction 830706491, it was already committed Aug 26 17:16:06 host postgres[5091]: [2-1] user=,db= LOG: server process (PID 28467) was terminated by signal 6 Aug 26 17:16:06 host postgres[5091]: [3-1] user=,db= LOG: terminating any other active server processes Aug 26 17:16:06 host postgres[16307]: [2792-1] user=databasename,db=databasename WARNING: terminating connection because of crash of another server process Aug 26 17:16:06 host postgres[16307]: [2792-2] user=databasename,db=databasename DETAIL: The postmaster has commanded this server process to roll back the current Aug 26 17:16:06 host postgres[16307]: [2792-3] transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. Aug 26 17:16:06 host postgres[16307]: [2792-4] user=databasename,db=databasename HINT: In a moment you should be able to reconnect to the database and repeat your Aug 26 17:16:06 host postgres[16307]: [2792-5] command. Aug 26 17:16:06 host postgres[5091]: [4-1] user=,db= LOG: all server processes terminated; reinitializing Aug 26 17:16:07 host postgres[28483]: [5-1] user=postgres,db=databasename FATAL: the database system is starting up Aug 26 17:16:07 host postgres[28482]: [5-1] user=,db= LOG: database system was interrupted at 2008-08-26 17:15:52 CEST Aug 26 17:16:07 host postgres[28482]: [6-1] user=,db= LOG: could not open file "pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or directory Aug 26 17:16:07 host postgres[28482]: [7-1] user=,db= LOG: invalid primary checkpoint record Aug 26 17:16:07 host postgres[28482]: [8-1] user=,db= LOG: could not open file "pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or directory Aug 26 17:16:07 host postgres[28482]: [9-1] user=,db= LOG: invalid secondary checkpoint record Aug 26 17:16:07 host postgres[28482]: [10-1] user=,db= PANIC: could not locate a valid checkpoint record Aug 26 17:16:07 host postgres[5091]: [5-1] user=,db= LOG: startup process (PID 28482) was terminated by signal 6 Aug 26 17:16:07 host postgres[5091]: [6-1] user=,db= LOG: aborting startup due to startup process failure And the following is presumably from my manual start-up attempt (based on timestamps): Aug 26 17:17:11 host postgres[28511]: [1-1] user=,db= LOG: could not bind socket for statistics collector: Cannot assign requested address Aug 26 17:17:11 host postgres[28511]: [2-1] user=,db= LOG: disabling statistics collector for lack of working socket Aug 26 17:17:11 host postgres[28511]: [3-1] user=,db= WARNING: autovacuum not started because of misconfiguration Aug 26 17:17:11 host postgres[28511]: [3-2] user=,db= HINT: Enable options "stats_start_collector" and "stats_row_level". Aug 26 17:17:11 host postgres[28512]: [4-1] user=,db= LOG: database system was interrupted at 2008-08-26 17:15:52 CEST Aug 26 17:17:11 host postgres[28512]: [5-1] user=,db= LOG: could not open file "pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or directory Aug 26 17:17:11 host postgres[28512]: [6-1] user=,db= LOG: invalid primary checkpoint record Aug 26 17:17:11 host postgres[28512]: [7-1] user=,db= LOG: could not open file "pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or directory Aug 26 17:17:11 host postgres[28512]: [8-1] user=,db= LOG: invalid secondary checkpoint record Aug 26 17:17:11 host postgres[28512]: [9-1] user=,db= PANIC: could not locate a valid checkpoint record Aug 26 17:17:11 host postgres[28511]: [4-1] user=,db= LOG: startup process (PID 28512) was terminated by signal 6 Aug 26 17:17:11 host postgres[28511]: [5-1] user=,db= LOG: aborting startup due to startup process failure The pg_xlog directory contains: -rw------- 1 postgres postgres 16777216 Aug 26 17:16 0000000100000018000000EE drwx------ 2 postgres postgres 305232 Aug 26 17:14 archive_status Note that the archival of the ....ED xlog file started at 17:14:52, and I cancelled the VACUUM FULL at 17:16:06. It is entirely likely, though I cannot say for sure, that I started the VACUUM FULL prior to the archival of ...ED. I still have the entire pgsql database directory untouched except for the manual start-up attempt. I am afraid I cannot dump the thing somewhere for inspection (it contains confidential information), but I can assist in inspecting the contents. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>' Key retrieval: Send an E-Mail to [EMAIL PROTECTED] E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org
pgpooalCAihTU.pgp
Description: PGP signature