Tom Lane wrote:
I wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Dave, would you please create a new binary with the attached patch? And
LOCK_DEBUG and assertions and debug enabled.
Also, it would be worth adding "lockmode" to the set of things printed
by the panic message in the patch I sent earlier.
Also: as long as we are building a custom-hacked executable to probe
into this, let's hack it to not remove the 2PC state file, so we can
double check what's really in there. I believe what you'd need to
remove is the RemoveTwoPhaseFile calls at twophase.c line 1583 (where
it thinks it's "stale") and xact.c line 4223 (where it's replaying a
XLOG_XACT_COMMIT_PREPARED WAL record).
Yeah, sounds like a good idea.
Patch attached that incorporates all the ideas so far:
1. More verbose PANIC message, including lockmode
2. More debug info in AtPrepare_Locks. I even put a DumpLocks call in
it, which should give us a good picture of what's in the lock structures
at the time of commit.
3. Instead of removing the twophase file in recovery, rename it to
*.removed. (It will be ignored by PostgreSQL after that, because it
doesn't follow the normal naming rules of 2PC state files.)
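
To illustrate item 3, here is a minimal standalone sketch of the rename-instead-of-unlink idea, outside the backend. It assumes that 2PC state file names are exactly eight uppercase hex digits, which is why a name ending in ".removed" would be skipped when the directory is scanned. The helper names looks_like_twophase_file and set_aside_state_file, and the 1024-byte buffer, are made up for this example and are not part of the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if "name" looks like a 2PC state file
 * name. Assumption: such names are exactly 8 uppercase hex digits.
 */
static int
looks_like_twophase_file(const char *name)
{
    return strlen(name) == 8 &&
           strspn(name, "0123456789ABCDEF") == 8;
}

/*
 * Hypothetical helper: instead of unlinking "path", rename it to
 * "path.removed" so it can be inspected later.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
set_aside_state_file(const char *path)
{
    char newpath[1024];     /* illustrative size, not MAXPGPATH */
    int  n;

    n = snprintf(newpath, sizeof(newpath), "%s.removed", path);
    if (n < 0 || (size_t) n >= sizeof(newpath))
    {
        errno = ENAMETOOLONG;
        return -1;
    }
    return rename(path, newpath);
}
```

With these helpers, a file named 0000ABCD passes the name check, while 0000ABCD.removed does not, so the renamed file stays on disk for inspection without being picked up as a state file again.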
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.25.2.1
diff -c -r1.25.2.1 twophase.c
*** src/backend/access/transam/twophase.c 13 Feb 2007 19:39:48 -0000 1.25.2.1
--- src/backend/access/transam/twophase.c 23 Apr 2007 21:58:29 -0000
***************
*** 1258,1263 ****
--- 1258,1276 ----
char path[MAXPGPATH];
TwoPhaseFilePath(path, xid);
+
+ if (InRecovery)
+ {
+ char newpath[MAXPGPATH+10];
+ sprintf(newpath, "%s.removed", path);
+ if(rename(path, newpath))
+ if (errno != ENOENT || giveWarning)
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not rename two-phase state file \"%s\": %m",
+ path)));
+ }
+ else
if (unlink(path))
if (errno != ENOENT || giveWarning)
ereport(WARNING,
Index: src/backend/storage/lmgr/lock.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/lmgr/lock.c,v
retrieving revision 1.174
diff -c -r1.174 lock.c
*** src/backend/storage/lmgr/lock.c 4 Oct 2006 00:29:57 -0000 1.174
--- src/backend/storage/lmgr/lock.c 23 Apr 2007 21:52:23 -0000
***************
*** 1796,1801 ****
--- 1796,1817 ----
HASH_SEQ_STATUS status;
LOCALLOCK *locallock;
+ #ifdef LOCK_DEBUG
+ {
+ int i;
+ /*
+ * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
+ */
+ for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+ LWLockAcquire(FirstLockMgrLock + i, LW_SHARED);
+
+ DumpLocks(MyProc);
+
+ for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+ LWLockRelease(FirstLockMgrLock + i);
+ }
+ #endif
+
/*
* We don't need to touch shared memory for this --- all the necessary
* state information is in the locallock table.
***************
*** 1830,1835 ****
--- 1846,1854 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
+ PROCLOCK_PRINT("AtPrepare_Locks", locallock->proclock);
+ LOCK_PRINT("AtPrepare_Locks", locallock->lock, locallock->tag.mode);
+
/*
* Create a 2PC record.
*/
***************
*** 2430,2436 ****
HASH_FIND,
NULL);
if (!lock)
! elog(PANIC, "failed to re-find shared lock object");
/*
* Re-find the proclock object (ditto).
--- 2449,2462 ----
HASH_FIND,
NULL);
if (!lock)
! elog(PANIC, "failed to re-find shared lock object: %u %u %u %u %u %u, mode %s",
! locktag->locktag_field1,
! locktag->locktag_field2,
! locktag->locktag_field3,
! locktag->locktag_field4,
! locktag->locktag_type,
! locktag->locktag_lockmethodid,
! LockMethods[lockmethodid]->lockModeNames[lockmode]);
/*
* Re-find the proclock object (ditto).