The logic in the lock manager to track the number of held AccessExclusiveLocks (with ProcArrayIncrementNumHeldLocks and ProcArrayDecrementNumHeldLocks) seems to be broken. I added an assertion to ProcArrayDecrementNumHeldLocks:

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1401,6 +1401,7 @@ ProcArrayIncrementNumHeldLocks(PGPROC *proc)
 void
 ProcArrayDecrementNumHeldLocks(PGPROC *proc)
 {
+	Assert(proc->numHeldLocks > 0);
 	proc->numHeldLocks--;
 }

This tripped the assertion:

postgres=# CREATE TABLE foo (id int4 primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Making matters worse, the primary server refuses to start up after that, tripping the assertion again in crash recovery:

$ bin/postmaster -D data
LOG: database system was interrupted while in recovery at 2009-09-23 11:56:15 EEST
HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/32000070
LOG: REDO @ 0/32000070; LSN 0/320000AC: prev 0/32000020; xid 0; len 32: Heap2 - clean: rel 1663/11562/1249; blk 32 remxid 4352
LOG: consistent recovery state reached
LOG: REDO @ 0/320000AC; LSN 0/320000CC: prev 0/32000070; xid 0; len 4: XLOG - nextOid: 24600
LOG: REDO @ 0/320000CC; LSN 0/320000F4: prev 0/320000AC; xid 0; len 12: Storage - file create: base/11562/16408
LOG: REDO @ 0/320000F4; LSN 0/3200011C: prev 0/320000CC; xid 4364; len 12: Relation - exclusive relation lock: xid 4364 db 11562 rel 16408
LOG: REDO @ 0/3200011C; LSN 0/320001D8: prev 0/320000F4; xid 4364; len 159: Heap - insert: rel 1663/11562/1259; tid 5/4
...
LOG: REDO @ 0/32004754; LSN 0/32004878: prev 0/320046A8; xid 4364; len 264: Transaction - commit: 2009-09-23 11:55:51.888398+03; 15 inval msgs:catcache id38 catcache id37 catcache id38 catcache id37 catcache id38 catcache id37 catcache id7 catcache id6 catcache id26 smgr relcache smgr relcache smgr relcache
TRAP: FailedAssertion("!(proc->numHeldLocks > 0)", File: "procarray.c", Line: 1404)
LOG: startup process (PID 27430) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure

I'm sure that's just a simple bug somewhere, but it highlights that we need to be careful to avoid putting any extra work into the normal recovery path. Otherwise bugs in hot standby related code can cause crash recovery to fail.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com