Hello,

[note: in pastes below, the only thing changes are database names and
hostnames, for privacy reasons; otherwise plain cut'n'paste including
typos...)

Very short version: I cancelled a VACUUM FULL; server crashed; won't
start again because it tries to access an obsolete WAL log file that
is likely to be from the point in time when VACUUM FULL started.

Long version:

The following refers to PostgreSQL 8.2.4 running on Linux 2.6.19-3 on
an old sarge system. I realize 8.2.4 is too old a release, but the bug
seems distinct enough that I felt I should definitely report it. I did
not find anything that looked relevant in the point release changelog
for 8.2.

Background is that I have a couple of databases on the system with >
1000 relations (if that is at all relevant). Because of autovacuum
issues (likely having to do with a too small max_fsm_relations) I
wanted to perform a VACUUM FULL.

I did, but realized I actually wanted VACUUM VERBOSE FULL. So after a
few seconds I hit ctrl-c in the psql client (run locally on the
machine as the postgres user; thus ident authed and connected as the
postgres PostgreSQL user). Output was:

contentsharing=# \timing
Timing is on.
contentsharing=# vacuum full
contentsharing-# ;
Cancel request sent
ERROR:  canceling statement due to user request
PANIC:  cannot abort transaction 830706491, it was already committed
server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> 
!> \q

After this I tried re-starting the database which failed (i.e., it was
not up after the start attempt). Checking the log file I had:

Aug 26 17:14:52 host postgres[5094]: [3116-1] user=,db= LOG:  archived 
transaction log file "0000000100000018000000ED"
Aug 26 17:16:06 host postgres[28467]: [2-1] user=postgres,db=databasename 
ERROR:  canceling statement due to user request
Aug 26 17:16:06 host postgres[28467]: [2-2] user=postgres,db=databasename 
STATEMENT:  vacuum full
Aug 26 17:16:06 host postgres[28467]: [2-3] ^I;
Aug 26 17:16:06 host postgres[28467]: [3-1] user=postgres,db=databasename 
PANIC:  cannot abort transaction 830706491, it was already committed
Aug 26 17:16:06 host postgres[5091]: [2-1] user=,db= LOG:  server process (PID 
28467) was terminated by signal 6
Aug 26 17:16:06 host postgres[5091]: [3-1] user=,db= LOG:  terminating any 
other active server processes
Aug 26 17:16:06 host postgres[16307]: [2792-1] 
user=databasename,db=databasename WARNING:  terminating connection because of 
crash of another server process
Aug 26 17:16:06 host postgres[16307]: [2792-2] 
user=databasename,db=databasename DETAIL:  The postmaster has commanded this 
server process to roll back the current
Aug 26 17:16:06 host postgres[16307]: [2792-3]  transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.
Aug 26 17:16:06 host postgres[16307]: [2792-4] 
user=databasename,db=databasename HINT:  In a moment you should be able to 
reconnect to the database and repeat your
Aug 26 17:16:06 host postgres[16307]: [2792-5]  command.
Aug 26 17:16:06 host postgres[5091]: [4-1] user=,db= LOG:  all server processes 
terminated; reinitializing
Aug 26 17:16:07 host postgres[28483]: [5-1] user=postgres,db=databasename 
FATAL:  the database system is starting up
Aug 26 17:16:07 host postgres[28482]: [5-1] user=,db= LOG:  database system was 
interrupted at 2008-08-26 17:15:52 CEST
Aug 26 17:16:07 host postgres[28482]: [6-1] user=,db= LOG:  could not open file 
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or 
directory
Aug 26 17:16:07 host postgres[28482]: [7-1] user=,db= LOG:  invalid primary 
checkpoint record
Aug 26 17:16:07 host postgres[28482]: [8-1] user=,db= LOG:  could not open file 
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or 
directory
Aug 26 17:16:07 host postgres[28482]: [9-1] user=,db= LOG:  invalid secondary 
checkpoint record
Aug 26 17:16:07 host postgres[28482]: [10-1] user=,db= PANIC:  could not locate 
a valid checkpoint record
Aug 26 17:16:07 host postgres[5091]: [5-1] user=,db= LOG:  startup process (PID 
28482) was terminated by signal 6
Aug 26 17:16:07 host postgres[5091]: [6-1] user=,db= LOG:  aborting startup due 
to startup process failure

And the following is presumably from my manual start-up attempt (based on 
timestamps):

Aug 26 17:17:11 host postgres[28511]: [1-1] user=,db= LOG:  could not bind 
socket for statistics collector: Cannot assign requested address
Aug 26 17:17:11 host postgres[28511]: [2-1] user=,db= LOG:  disabling 
statistics collector for lack of working socket
Aug 26 17:17:11 host postgres[28511]: [3-1] user=,db= WARNING:  autovacuum not 
started because of misconfiguration
Aug 26 17:17:11 host postgres[28511]: [3-2] user=,db= HINT:  Enable options 
"stats_start_collector" and "stats_row_level".
Aug 26 17:17:11 host postgres[28512]: [4-1] user=,db= LOG:  database system was 
interrupted at 2008-08-26 17:15:52 CEST
Aug 26 17:17:11 host postgres[28512]: [5-1] user=,db= LOG:  could not open file 
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or 
directory
Aug 26 17:17:11 host postgres[28512]: [6-1] user=,db= LOG:  invalid primary 
checkpoint record
Aug 26 17:17:11 host postgres[28512]: [7-1] user=,db= LOG:  could not open file 
"pg_xlog/0000000100000018000000ED" (log file 24, segment 237): No such file or 
directory
Aug 26 17:17:11 host postgres[28512]: [8-1] user=,db= LOG:  invalid secondary 
checkpoint record
Aug 26 17:17:11 host postgres[28512]: [9-1] user=,db= PANIC:  could not locate 
a valid checkpoint record
Aug 26 17:17:11 host postgres[28511]: [4-1] user=,db= LOG:  startup process 
(PID 28512) was terminated by signal 6
Aug 26 17:17:11 host postgres[28511]: [5-1] user=,db= LOG:  aborting startup 
due to startup process failure

The pg_xlog directory contains:

-rw-------  1 postgres postgres 16777216 Aug 26 17:16 0000000100000018000000EE
drwx------  2 postgres postgres   305232 Aug 26 17:14 archive_status

Note that the archival of the ....ED xlog file started at 17:14:52,
and I cancelled the VACUUM FULL at 17:16:06.

It is entirely likely, though I cannot say for sure, that I started
the VACUUM FULL prior to the archival of ...ED.

I still have the entire pgsql database directory untouched except for
the manual start-up attempt. I am afraid I cannot dump the thing
somewhere for inspection (it contains confidential information), but I
can assist in inspecting the contents.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

Attachment: pgpooalCAihTU.pgp
Description: PGP signature

Reply via email to