On Friday, September 21, 2012 03:30:31 PM Marko Tiikkaja wrote:
> On 9/20/12 11:55 PM, Andres Freund wrote:
> > On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> >> OK, that explains why we've not seen a blizzard of trouble reports.
> >> Still seems like a good idea to fix it ASAP, though.
> >
> > Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user
> > in the IRC Channel that had symptoms matching this bug.
>
> Another such user reporting in. :-(
>
> Our slave started accumulating WAL files and ran out of disk space
> yesterday. After investigation from Andres and Andrew, it turns out
> that we were most likely hit by this very same bug.
>
> Here's what they have to say:
> "If the db crashes between logging the split and the parent-node insert,
> then in recovery, since relpersistence is not initialized correctly,
> when the recovery process tries to complete the operation, no xlog
> record is written for the insert. If there's a slave server, then the
> missing xlog record for the insert means that the slave's
> incomplete_actions queue never becomes empty, therefore the slave can no
> longer do recovery restartpoints."
>
> Some relevant information:
>
> [cur:92/314BC870, xid:76872047, rmid:10(Heap), ... insert: ...
> [cur:92/314BC8F0, xid:76872047, rmid:11(Btree), ... split_r: ...
> [cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0,
> prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; ... shutdown
> ... "redo done at 92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""

Which means that an insert into the heap triggered a btree split. At that
point the database crashed. During recovery the split was supposed to be
finished by the btree cleanup code.
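
To make the quoted explanation concrete, here is a rough standalone sketch of
the failure mode. It is a toy model, not PostgreSQL source: ToyRelation,
toy_make_fake_relation, toy_relation_needs_wal and toy_finish_split are all
made-up names, and the assumption that the recovery-time relation entry starts
out zeroed is mine. The only point is that a persistence field left at zero
makes a RelationNeedsWAL()-style check return false, so the parent-node insert
produces no WAL record.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Modeled on the 'p' (permanent) value of pg_class.relpersistence. */
#define TOY_RELPERSISTENCE_PERMANENT 'p'

/* Toy stand-in for a relation entry; NOT the real RelationData layout. */
typedef struct ToyRelation
{
    char relpersistence;
} ToyRelation;

/* Toy counterpart of RelationNeedsWAL(): only permanent relations log. */
bool
toy_relation_needs_wal(const ToyRelation *rel)
{
    assert(rel != NULL);
    return rel->relpersistence == TOY_RELPERSISTENCE_PERMANENT;
}

/*
 * Hypothetical model of the relation entry built during recovery.
 * Assumption: the entry starts zeroed; set_persistence = false mimics
 * the "relpersistence is not initialized correctly" case from the quote.
 */
void
toy_make_fake_relation(ToyRelation *rel, bool set_persistence)
{
    memset(rel, 0, sizeof(*rel));
    if (set_persistence)
        rel->relpersistence = TOY_RELPERSISTENCE_PERMANENT;
}

/*
 * Complete the pending split. The parent-node insert always happens in
 * this model; the return value is the number of WAL records emitted.
 */
int
toy_finish_split(const ToyRelation *rel)
{
    return toy_relation_needs_wal(rel) ? 1 : 0;
}
```

With set_persistence = false, toy_finish_split() returns 0: the split is
completed locally but nothing is logged, which is the situation a standby
would never learn about. With set_persistence = true it returns 1.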

> And apparently the relpersistence check in RelationNeedsWAL() call in
> _bt_insertonpg had a role in this as well.

When detecting an incomplete split, the nbtree cleanup code calls
_bt_insert_parent, which calls _bt_insertonpg, which finishes the split. BUT:
it doesn't log that it finished, because RelationNeedsWAL() says it doesn't
need to.

That means:
* indexes on standbys will *definitely* be corrupted
* a standby won't perform any restartpoints anymore until restarted
* if the primary crashes, corruption is likely

Hrm. I retract my earlier statement about the low likelihood of corruption due
to this.

Greetings,

Andres
-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services