Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Ok, I think I know what's happening. In btbulkdelete we have a
>> PG_TRY-CATCH block. In the try-block, we call _bt_start_vacuum which
>> acquires and releases the BtreeVacuumLock. Under certain error
>> conditions, _bt_start_vacuum calls elog(ERROR) while holding the
>> BtreeVacuumLock. The PG_CATCH block calls _bt_end_vacuum which also
>> tries to acquire BtreeVacuumLock.
>
> This is definitely a bug (I unfortunately didn't see your message until
> after I'd replicated your reasoning...) but the word from Shuttleworth
> is that he doesn't see either of those messages in his postmaster log.
> So it seems we need another theory. I haven't a clue at the moment though.

The error message never makes it to the log. The deadlock occurs in the
PG_CATCH-block, before rethrowing and printing the error. I added an
unconditional elog(ERROR) in _bt_start_vacuum to test it, and I'm
getting the same hang with no message in the log.
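
The ordering described above can be modelled in a few lines of standalone C. This is a toy sketch, not PostgreSQL code: setjmp/longjmp stand in for PG_TRY/PG_CATCH and elog(ERROR), a boolean stands in for the LWLock, and every toy_* name is hypothetical. Where a real non-reentrant lock would block forever, the toy acquire returns false so the program can finish.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these names exist in PostgreSQL. */
static jmp_buf toy_handler;          /* where a raised error lands */
static bool    toy_lock_held = false;/* toy non-reentrant lock */
static int     messages_logged = 0;  /* counts errors that got reported */

/* Returns false where a real non-reentrant lock would wait forever. */
static bool toy_lock_acquire(void)
{
    if (toy_lock_held)
        return false;
    toy_lock_held = true;
    return true;
}

static void toy_lock_release(void)
{
    assert(toy_lock_held);
    toy_lock_held = false;
}

/* Mirrors the test case: raise unconditionally while the lock is held. */
static void toy_start_vacuum(void)
{
    bool ok = toy_lock_acquire();

    assert(ok);
    (void) ok;                       /* silence unused warning under NDEBUG */
    longjmp(toy_handler, 1);         /* error raised, lock never released */
}

/* Cleanup needs the same lock that the failed call still holds. */
static bool toy_end_vacuum(void)
{
    if (!toy_lock_acquire())
        return false;                /* the real backend would hang here */
    toy_lock_release();
    return true;
}

/* Returns true only if the handler got as far as reporting the error. */
static bool toy_bulkdelete(void)
{
    if (setjmp(toy_handler) == 0)
    {
        toy_start_vacuum();
        return true;                 /* not reached in this sketch */
    }

    /* "catch" block: cleanup first, report afterwards */
    if (!toy_end_vacuum())
        return false;                /* stuck before anything is logged */
    messages_logged++;
    return true;
}
```

Calling toy_bulkdelete() returns false and leaves messages_logged at 0: the handler gets stuck in cleanup before it reaches the point where the error would be reported.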
The unsafe pattern of calling elog(ERROR) while holding an LWLock in
_bt_start_vacuum needs to be fixed; patch attached. We still need to
figure out what's causing the error in the first place. With the patch,
we should at least get a proper error message and not hang when the
error occurs.
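
The shape of the fix, releasing the lock on every error exit before raising, can be sketched in standalone C. Again this is a toy under stated assumptions, not the patched PostgreSQL function: longjmp stands in for elog(ERROR), a boolean for the lock, and demo_*, DEMO_MAX_SLOTS and cleanup_runs are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

#define DEMO_MAX_SLOTS 2             /* hypothetical capacity */

static jmp_buf demo_on_error;        /* where a raised error lands */
static bool    demo_lock_held = false;
static int     demo_num_slots = 0;
static int     cleanup_runs = 0;     /* counts completed cleanups */

static void demo_lock_acquire(void)
{
    assert(!demo_lock_held);         /* re-acquiring would self-deadlock */
    demo_lock_held = true;
}

static void demo_lock_release(void)
{
    assert(demo_lock_held);
    demo_lock_held = false;
}

/* Stand-in for raising an error; control jumps to the handler. */
static void demo_raise_error(void)
{
    longjmp(demo_on_error, 1);
}

/* Patched shape: the lock is dropped before the error is raised. */
static void demo_start_vacuum(void)
{
    demo_lock_acquire();
    if (demo_num_slots >= DEMO_MAX_SLOTS)
    {
        demo_lock_release();
        demo_raise_error();
    }
    demo_num_slots++;
    demo_lock_release();
}

/* Cleanup takes the same lock; safe now that nobody still holds it. */
static void demo_end_vacuum(void)
{
    demo_lock_acquire();
    cleanup_runs++;
    demo_lock_release();
}

/* Returns true if an error was caught and the cleanup completed. */
static bool demo_run_once(void)
{
    if (setjmp(demo_on_error) == 0)
    {
        demo_start_vacuum();
        return false;                /* no error on this call */
    }
    demo_end_vacuum();               /* "catch" block */
    return true;
}
```

With two slots, the first two calls to demo_run_once() succeed and the third raises the "out of slots" error; because the lock was released first, the handler's cleanup completes instead of blocking.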
Martin: Would it be possible for you to reproduce the problem with a
patched version?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Index: src/backend/access/nbtree/nbtutils.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtutils.c,v
retrieving revision 1.79
diff -c -r1.79 nbtutils.c
*** src/backend/access/nbtree/nbtutils.c 4 Oct 2006 00:29:49 -0000 1.79
--- src/backend/access/nbtree/nbtutils.c 30 Mar 2007 07:55:36 -0000
***************
*** 998,1016 ****
--- 998,1023 ----
          vac = &btvacinfo->vacuums[i];
          if (vac->relid.relId == rel->rd_lockInfo.lockRelId.relId &&
              vac->relid.dbId == rel->rd_lockInfo.lockRelId.dbId)
+         {
+             LWLockRelease(BtreeVacuumLock);
              elog(ERROR, "multiple active vacuums for index \"%s\"",
                   RelationGetRelationName(rel));
+         }
      }
      /* OK, add an entry */
      if (btvacinfo->num_vacuums >= btvacinfo->max_vacuums)
+     {
+         LWLockRelease(BtreeVacuumLock);
          elog(ERROR, "out of btvacinfo slots");
+     }
      vac = &btvacinfo->vacuums[btvacinfo->num_vacuums];
      vac->relid = rel->rd_lockInfo.lockRelId;
      vac->cycleid = result;
      btvacinfo->num_vacuums++;
      LWLockRelease(BtreeVacuumLock);
+
      return result;
  }