On Wed, Nov 24, 2010 at 1:27 PM, Fujii Masao <masao.fu...@gmail.com> wrote: > Hi, > > http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php > > When I did unusual operations (e.g., suspend bgwriter by gdb, > pgbench -i and issue txid_current many times) on the master > in order to try to reproduce the above HS error, I encountered > the following assertion error. > > Since I compiled the standby postgres with WAL_DEBUG and > ran it with wal_debug = on, all the replayed WAL records were > logged. > > ------------------------ > sby LOG: REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid > 23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09 > sby LOG: REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid > 23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09 > sby LOG: consistent recovery state reached at 0/134C0510 > sby LOG: REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid > 23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09 > sby LOG: REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0; > len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409 > sby LOG: REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0; > len 20: Standby - running xacts: nextXid 23256 latestCompletedXid > 23255 oldestRunningXid 23256 > TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: > "twophase.c", Line: 1209) > sby LOG: database system is ready to accept read only connections > sby LOG: startup process (PID 32666) was terminated by signal 6: Aborted > sby LOG: terminating any other active server processes > ------------------------ > > Does anyone know what the cause of the problem is?
I was able to reproduce this problem. This happens because CHECKPOINT can write the WAL record indicating that the transaction with XID = 0 has taken the AccessExclusive lock. This WAL record causes that assertion failure in the standby. Here is the procedure to reproduce the problem: ------------------- 1. Execute "DROP TABLE" and suspend the execution before calling RemoveRelations -> LockRelationOid -> LockAcquire -> LockAcquireExtended -> LogAccessExclusiveLock by, for example, using gdb. 2. While "DROP TABLE" is being suspended, execute CHECKPOINT. This CHECKPOINT will generate the above-mentioned WAL record. ------------------- To solve the problem, ISTM that XID should be assigned before the information about AccessExclusive lock becomes visible to another process. Or CHECKPOINT (i.e., GetRunningTransactionLocks) should ignore the locks with XID = 0. Thought? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers