Tom Lane wrote:
I wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Dave, would you please create a new binary with the attached patch, built with LOCK_DEBUG, assertions, and debug enabled?

Also, it would be worth adding "lockmode" to the set of things printed
by the panic message in the patch I sent earlier.

Also: as long as we are building a custom-hacked executable to probe
into this, let's hack it to not remove the 2PC state file, so we can
double check what's really in there.  I believe what you'd need to
remove is the RemoveTwoPhaseFile calls at twophase.c line 1583 (where
it thinks it's "stale") and xact.c line 4223 (where it's replaying an
XLOG_XACT_COMMIT_PREPARED WAL record).

Yeah, sounds like a good idea.

Patch attached that incorporates all the ideas so far:

1. More verbose PANIC message, including lockmode
2. More debug info in AtPrepare_Locks. I even put a DumpLocks call in it, that should give us a good picture of what's in the lock structures at the time of commit.
3. Instead of removing twophase-file in recovery, rename it to *.removed. (It will be ignored by PostgreSQL after that, because it doesn't follow the normal naming rules of 2PC state files.)
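
For anyone who wants to try the idea in item 3 outside the backend, here is a minimal standalone sketch of "rename instead of unlink". It is not the patch itself: the function name preserve_state_file and the SKETCH_MAXPATH limit are made up for illustration (the backend uses MAXPGPATH and reports failures through ereport), and it uses snprintf with a length check where the patch uses sprintf into an oversized buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit for this sketch; the backend uses MAXPGPATH. */
#define SKETCH_MAXPATH 1024

/*
 * Rename "path" to "path.removed" instead of deleting it, so the
 * contents stay available for later inspection.
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
preserve_state_file(const char *path)
{
	char		newpath[SKETCH_MAXPATH];
	int			len;

	assert(path != NULL);

	/* Build the new name and refuse to continue if it was truncated. */
	len = snprintf(newpath, sizeof(newpath), "%s.removed", path);
	if (len < 0 || (size_t) len >= sizeof(newpath))
	{
		errno = ENAMETOOLONG;
		return -1;
	}

	/* rename() returns 0 on success and -1 with errno set on failure. */
	return rename(path, newpath) == 0 ? 0 : -1;
}
```

A caller would check the return value and, as the patch does, probably stay quiet when errno is ENOENT unless a warning was explicitly requested.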

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.25.2.1
diff -c -r1.25.2.1 twophase.c
*** src/backend/access/transam/twophase.c	13 Feb 2007 19:39:48 -0000	1.25.2.1
--- src/backend/access/transam/twophase.c	23 Apr 2007 21:58:29 -0000
***************
*** 1258,1263 ****
--- 1258,1276 ----
  	char		path[MAXPGPATH];
  
  	TwoPhaseFilePath(path, xid);
+ 
+ 	if (InRecovery)
+ 	{
+ 		char newpath[MAXPGPATH+10];
+ 		sprintf(newpath, "%s.removed", path);
+ 		if(rename(path, newpath))
+ 			if (errno != ENOENT || giveWarning)
+ 				ereport(WARNING,
+ 						(errcode_for_file_access(),
+ 						 errmsg("could not remove two-phase state file \"%s\": %m",
+ 								path)));
+ 	}
+ 	else
  	if (unlink(path))
  		if (errno != ENOENT || giveWarning)
  			ereport(WARNING,
Index: src/backend/storage/lmgr/lock.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/lmgr/lock.c,v
retrieving revision 1.174
diff -c -r1.174 lock.c
*** src/backend/storage/lmgr/lock.c	4 Oct 2006 00:29:57 -0000	1.174
--- src/backend/storage/lmgr/lock.c	23 Apr 2007 21:52:23 -0000
***************
*** 1796,1801 ****
--- 1796,1817 ----
  	HASH_SEQ_STATUS status;
  	LOCALLOCK  *locallock;
  
+ #ifdef LOCK_DEBUG
+  {
+ 	int i;
+ 	/*
+ 	 * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
+ 	 */
+ 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+ 		LWLockAcquire(FirstLockMgrLock + i, LW_SHARED);
+ 
+ 	DumpLocks(MyProc);
+ 
+ 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+ 		LWLockRelease(FirstLockMgrLock + i);
+  }
+ #endif
+ 
  	/*
  	 * We don't need to touch shared memory for this --- all the necessary
  	 * state information is in the locallock table.
***************
*** 1830,1835 ****
--- 1846,1854 ----
  					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  					 errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
  
+ 		PROCLOCK_PRINT("AtPrepare_Locks", locallock->proclock);
+ 		LOCK_PRINT("AtPrepare_Locks", locallock->lock, locallock->tag.mode);
+ 
  		/*
  		 * Create a 2PC record.
  		 */
***************
*** 2430,2436 ****
  												HASH_FIND,
  												NULL);
  	if (!lock)
! 		elog(PANIC, "failed to re-find shared lock object");
  
  	/*
  	 * Re-find the proclock object (ditto).
--- 2449,2462 ----
  												HASH_FIND,
  												NULL);
  	if (!lock)
!  		elog(PANIC, "failed to re-find shared lock object: %u %u %u %u %u %u, mode %s",
! 			 locktag->locktag_field1,
! 			 locktag->locktag_field2,
! 			 locktag->locktag_field3,
! 			 locktag->locktag_field4,
! 			 locktag->locktag_type,
! 			 locktag->locktag_lockmethodid,
! 			 LockMethods[lockmethodid]->lockModeNames[lockmode]);
  
  	/*
  	 * Re-find the proclock object (ditto).