Hi, On 2014-02-24 17:55:14 -0300, Alvaro Herrera wrote: > Greg Stark wrote: > > I have a database where a a couple rows don't appear in index scans > > but do appear in sequential scans. It looks like the same problem as > > Peter reported but this is a different database. I've extracted all > > the xlogdump records and below are the ones I think are relevant. You > > can see that lp 2 gets a few HOT updates and concurrently has someone > > create a MultiXact NO KEY UPDATE lock while one of those HOT updates > > is pending but not committed.
Per se the sequence of records doesn't look bad (even though I am not happy that we log intermediate and final rows first, and only then the start of the chain). > > The net result seems to be that the ctid > > update chain got broken. The index of course points to the head of the > > HOT chain so it doesn't find the live tail whereas the sequential scan > > picks it up. Yea, that's the problem. > > rmgr: Heap len (rec/tot): 291/ 323, tx: 5943849, lsn: > > FD/2F0ADFC0, prev FD/2F0ADF90, bkp: 0000, desc: hot_update: rel > > 1663/16385/212653; tid 13065/2 xmax 5943849 ; new tid 13065/3 xmax 0 > > rmgr: Heap2 len (rec/tot): 25/ 57, tx: 5943851, lsn: > > FD/2F0AE450, prev FD/2F0AE408, bkp: 0000, desc: lock updated: xmax 5943851 > > msk 000a; rel 1663/16385/212653; tid 13065/3 > > rmgr: MultiXact len (rec/tot): 28/ 60, tx: 5943851, lsn: > > FD/2F0AE490, prev FD/2F0AE450, bkp: 0000, desc: create mxid 728896 offset > > 1632045 nmembers 2: 5943849 (nokeyupd) 5943851 (keysh) > > rmgr: Heap len (rec/tot): 25/ 57, tx: 5943851, lsn: > > FD/2F0AE4D0, prev FD/2F0AE490, bkp: 0000, desc: lock 728896: rel > > 1663/16385/212653; tid 13065/2 IS_MULTI EXCL_LOCK > > lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid > > | t_infomask2 | t_infomask | t_hoff | > > ----+--------+----------+--------+---------+---------+----------+------------+-------------+------------+--------+- > > 2 | 3424 | 1 | 232 | 5943845 | 728896 | 0 | (13065,2) > > | 32 | 4419 | 32 | > > 3 | 3152 | 1 | 272 | 5943849 | 5943879 | 0 | (13065,4) > > | 49184 | 9475 | 32 | > > 4 | 2864 | 1 | 287 | 5943879 | 5943880 | 0 | (13065,7) > > | 49184 | 9475 | 32 | > > 7 | 2576 | 1 | 287 | 5943880 | 0 | 0 | (13065,7) > > | 32800 | 10499 | 32 | Those together explain the story. Note this bit: static void heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record) { ... HeapTupleHeaderClearHotUpdated(htup); HeapTupleHeaderSetXmax(htup, xlrec->locking_xid); HeapTupleHeaderSetCmax(htup, FirstCommandId, false); /* Make sure there is no forward chain link in t_ctid */ htup->t_ctid = xlrec->target.tid; ... } So, the replay of FD/2F0AE4D0 breaks the ctid chain *and* unsets the HOT_UPDATED flag. Which means fkey locks have never properly worked across SR/crash recovery. Haven't thought about how to fix it yet, I hope won't have to (hint hint). We somehow need to have a policy of testing changes to the WAL format without full_page_writes. They hide bugs in replay far, far too often. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers