Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Ok, I think I know what's happening. In btbulkdelete we have a PG_TRY-CATCH block. In the try-block, we call _bt_start_vacuum which acquires and releases the BtreeVacuumLock. Under certain error conditions, _bt_start_vacuum calls elog(ERROR) while holding the BtreeVacuumLock. The PG_CATCH block calls _bt_end_vacuum which also tries to acquire BtreeVacuumLock.
>
> This is definitely a bug (I unfortunately didn't see your message until
> after I'd replicated your reasoning...) but the word from Shuttleworth
> is that he doesn't see either of those messages in his postmaster log.
> So it seems we need another theory.  I haven't a clue at the moment though.

The error message never makes it to the log. The deadlock occurs in the PG_CATCH-block, before rethrowing and printing the error. I added an unconditional elog(ERROR) in _bt_start_vacuum to test it, and I'm getting the same hang with no message in the log.

The unsafe pattern in _bt_start_vacuum, calling elog(ERROR) while holding an LWLock, needs to be fixed; patch attached. We still need to figure out what's causing the error in the first place, but with the patch we should at least get a proper error message instead of a hang when the error occurs.
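
For anyone who wants to see the failure mode in isolation, here is a toy model of it in plain C. This is only a sketch and not PostgreSQL code: every toy_* name is made up for illustration, setjmp/longjmp stands in for PG_TRY/PG_CATCH and elog(ERROR), and a boolean flag stands in for the non-reentrant BtreeVacuumLock. Where a real backend would block forever on its own lock, the toy acquire simply returns false.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; none of this is the PostgreSQL API. */
static bool lock_held = false;  /* models a non-reentrant lock such as BtreeVacuumLock */
static jmp_buf catch_point;     /* models the jump target set up by PG_TRY */

/* Returns false where a real backend would wait forever on its own lock. */
static bool toy_lock_acquire(void)
{
    if (lock_held)
        return false;
    lock_held = true;
    return true;
}

static void toy_lock_release(void)
{
    assert(lock_held);
    lock_held = false;
}

/* Models elog(ERROR): never returns, transfers control to the catch block. */
static void toy_raise_error(void)
{
    longjmp(catch_point, 1);
}

/* Models _bt_start_vacuum hitting an error condition after taking the lock. */
static void toy_start_vacuum(bool release_before_error)
{
    toy_lock_acquire();
    if (release_before_error)
        toy_lock_release();     /* what the attached patch adds */
    toy_raise_error();
}

/* Models _bt_end_vacuum, which needs the same lock to do its cleanup. */
static bool toy_end_vacuum(void)
{
    if (!toy_lock_acquire())
        return false;           /* the real code would hang here */
    toy_lock_release();
    return true;
}

/* Models btbulkdelete: returns true if the catch block could clean up. */
bool toy_bulkdelete(bool release_before_error)
{
    lock_held = false;
    if (setjmp(catch_point) == 0)
    {
        /* "try" block */
        toy_start_vacuum(release_before_error);
        return true;            /* not reached: the call above always errors out */
    }
    /* "catch" block: runs before the error would be rethrown and logged */
    return toy_end_vacuum();
}
```

Calling toy_bulkdelete(false) models the unpatched code: the cleanup in the catch block cannot take the lock, which corresponds to the hang with nothing in the log. toy_bulkdelete(true) models the patched code, where the cleanup succeeds and the error can then be rethrown and reported.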

Martin: Would it be possible for you to reproduce the problem with a patched version?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/nbtree/nbtutils.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtutils.c,v
retrieving revision 1.79
diff -c -r1.79 nbtutils.c
*** src/backend/access/nbtree/nbtutils.c	4 Oct 2006 00:29:49 -0000	1.79
--- src/backend/access/nbtree/nbtutils.c	30 Mar 2007 07:55:36 -0000
***************
*** 998,1016 ****
--- 998,1023 ----
  		vac = &btvacinfo->vacuums[i];
  		if (vac->relid.relId == rel->rd_lockInfo.lockRelId.relId &&
  			vac->relid.dbId == rel->rd_lockInfo.lockRelId.dbId)
+ 		{
+ 			LWLockRelease(BtreeVacuumLock);
  			elog(ERROR, "multiple active vacuums for index \"%s\"",
  				 RelationGetRelationName(rel));
+ 		}
  	}
  
  	/* OK, add an entry */
  	if (btvacinfo->num_vacuums >= btvacinfo->max_vacuums)
+ 	{
+ 		LWLockRelease(BtreeVacuumLock);
  		elog(ERROR, "out of btvacinfo slots");
+ 	}
  	vac = &btvacinfo->vacuums[btvacinfo->num_vacuums];
  	vac->relid = rel->rd_lockInfo.lockRelId;
  	vac->cycleid = result;
  	btvacinfo->num_vacuums++;
  
  	LWLockRelease(BtreeVacuumLock);
+ 
  	return result;
  }
  