Hello, [note: in pastes below, the only thing changes are database names and hostnames, for privacy reasons; otherwise plain cut'n'paste including typos...)
Very short version: I cancelled a VACUUM FULL; server crashed; won't
start again because it tries to access an obsolete WAL log file that
is likely to be from the point in time when VACUUM FULL started.
Long version:
The following refers to PostgreSQL 8.2.4 running on Linux 2.6.19-3 on
an old sarge system. I realize 8.2.4 is too old a release, but the bug
seems distinct enough that I felt I should definitely report it. I did
not find anything that looked relevant in the point release changelog
for 8.2.
Background is that I have a couple of databases on the system with >
1000 relations (if that is at all relevant). Because of autovacuum
issues (likely having to do with a too small max_fsm_relations) I
wanted to perform a VACUUM FULL.
I did, but realized I actually wanted VACUUM VERBOSE FULL. So after a
few seconds I hit ctrl-c in the psql client (run locally on the
machine as the postgres user; thus ident authed and connected as the
postgres PostgreSQL user). Output was:
contentsharing=# \timing
Timing is on.
contentsharing=# vacuum full
contentsharing-# ;
Cancel request sent
ERROR: canceling statement due to user request
PANIC: cannot abort transaction 830706491, it was already committed
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
!> \q
After this I tried re-starting the database which failed (i.e., it was
not up after the start attempt). Checking the log file I had:
Aug 26 17:14:52 host postgres[5094]: [3116-1] user=,db= LOG: archived
transaction log file "0000000100000018000000ED"
Aug 26 17:16:06 host postgres[28467]: [2-1] user=postgres,db=databasename
ERROR: canceling statement due to user request
Aug 26 17:16:06 host postgres[28467]: [2-2] user=postgres,db=databasename
STATEMENT: vacuum full
Aug 26 17:16:06 host postgres[28467]: [2-3] ^I;
Aug 26 17:16:06 host postgres[28467]: [3-1] user=postgres,db=databasename
PANIC: cannot abort transaction 830706491, it was already committed
Aug 26 17:16:06 host postgres[5091]: [2-1] user=,db= LOG: server process (PID
28467) was terminated by signal 6
Aug 26 17:16:06 host postgres[5091]: [3-1] user=,db= LOG: terminating any
other active server processes
Aug 26 17:16:06 host postgres[16307]: [2792-1]
user=databasename,db=databasename WARNING: terminating connection because of
crash of another server process
Aug 26 17:16:06 host postgres[16307]: [2792-2]
user=databasename,db=databasename DETAIL: The postmaster has commanded this
server process to roll back the current
Aug 26 17:16:06 host postgres[16307]: [2792-3] transaction and exit, because
another server process exited abnormally and possibly corrupted shared memory.
Aug 26 17:16:06 host postgres[16307]: [2792-4]
user=databasename,db=databasename HINT: In a moment you should be able to
reconnect to the database and repeat your
Aug 26 17:16:06 host postgres[16307]: [2792-5] command.
Aug 26 17:16:06 host postgres[5091]: [4-1] user=,db= LOG: all server processes
terminated; reinitializing
Aug 26 17:16:07 host postgres[28483]: [5-1] user=postgres,db=databasename
FATAL: the database system is starting up
Aug 26 17:16:07 host postgres[28482]: [5-1] user=,db= LOG: database system was
interrupted at 2008-08-26 17:15:52 CEST
Aug 26 17:16:07 host postgres[28482]: [6-1] user=,db= LOG: could not open file
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or
directory
Aug 26 17:16:07 host postgres[28482]: [7-1] user=,db= LOG: invalid primary
checkpoint record
Aug 26 17:16:07 host postgres[28482]: [8-1] user=,db= LOG: could not open file
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or
directory
Aug 26 17:16:07 host postgres[28482]: [9-1] user=,db= LOG: invalid secondary
checkpoint record
Aug 26 17:16:07 host postgres[28482]: [10-1] user=,db= PANIC: could not locate
a valid checkpoint record
Aug 26 17:16:07 host postgres[5091]: [5-1] user=,db= LOG: startup process (PID
28482) was terminated by signal 6
Aug 26 17:16:07 host postgres[5091]: [6-1] user=,db= LOG: aborting startup due
to startup process failure
And the following is presumably from my manual start-up attempt (based on
timestamps):
Aug 26 17:17:11 host postgres[28511]: [1-1] user=,db= LOG: could not bind
socket for statistics collector: Cannot assign requested address
Aug 26 17:17:11 host postgres[28511]: [2-1] user=,db= LOG: disabling
statistics collector for lack of working socket
Aug 26 17:17:11 host postgres[28511]: [3-1] user=,db= WARNING: autovacuum not
started because of misconfiguration
Aug 26 17:17:11 host postgres[28511]: [3-2] user=,db= HINT: Enable options
"stats_start_collector" and "stats_row_level".
Aug 26 17:17:11 host postgres[28512]: [4-1] user=,db= LOG: database system was
interrupted at 2008-08-26 17:15:52 CEST
Aug 26 17:17:11 host postgres[28512]: [5-1] user=,db= LOG: could not open file
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or
directory
Aug 26 17:17:11 host postgres[28512]: [6-1] user=,db= LOG: invalid primary
checkpoint record
Aug 26 17:17:11 host postgres[28512]: [7-1] user=,db= LOG: could not open file
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or
directory
Aug 26 17:17:11 host postgres[28512]: [8-1] user=,db= LOG: invalid secondary
checkpoint record
Aug 26 17:17:11 host postgres[28512]: [9-1] user=,db= PANIC: could not locate
a valid checkpoint record
Aug 26 17:17:11 host postgres[28511]: [4-1] user=,db= LOG: startup process
(PID 28512) was terminated by signal 6
Aug 26 17:17:11 host postgres[28511]: [5-1] user=,db= LOG: aborting startup
due to startup process failure
The pg_xlog directory contains:
-rw------- 1 postgres postgres 16777216 Aug 26 17:16 0000000100000018000000EE
drwx------ 2 postgres postgres 305232 Aug 26 17:14 archive_status
Note that the archival of the ....ED xlog file started at 17:14:52,
and I cancelled the VACUUM FULL at 17:16:06.
It is entirely likely, though I cannot say for sure, that I started
the VACUUM FULL prior to the archival of ...ED.
I still have the entire pgsql database directory untouched except for
the manual start-up attempt. I am afraid I cannot dump the thing
somewhere for inspection (it contains confidential information), but I
can assist in inspecting the contents.
--
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org
pgpooalCAihTU.pgp
Description: PGP signature
