Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

Andres Freund Fri, 21 Sep 2012 07:41:41 -0700

On Friday, September 21, 2012 03:30:31 PM Marko Tiikkaja wrote:
> On 9/20/12 11:55 PM, Andres Freund wrote:
> > On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> >> OK, that explains why we've not seen a blizzard of trouble reports.
> >> Still seems like a good idea to fix it ASAP, though.
> > 
> > Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user
> > in the IRC Channel that had symptoms matching this bug.
> 
> Another such user reporting in. :-(
> 
> Our slave started accumulating WAL files and ran out of disk space
> yesterday.  After investigation from Andres and Andrew, it turns out
> that we were most likely hit by this very same bug.
> 
> Here's what they have to say:
> "If the db crashes between logging the split and the parent-node insert,
> then in recovery, since relpersistence is not initialized correctly,
> when the recovery process tries to complete the operation, no xlog
> record is written for the insert.  If there's a slave server, then the
> missing xlog record for the insert means that the slave's
> incomplete_actions queue never becomes empty, therefore the slave can no
> longer do recovery restartpoints."
> 
> Some relevant information:
> 
> [cur:92/314BC870, xid:76872047, rmid:10(Heap), ... insert: ...
> [cur:92/314BC8F0, xid:76872047, rmid:11(Btree), ... split_r: ...
> [cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0,
> prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; ... shutdown
> ... "redo done at 92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""
Which means that an insert into the heap, triggered a btree split. At that 
point the database crashed. During recovery the split was supposed to be 
finished by the btree cleanup code.


> And apparently the relpersistence check in RelationNeedsWAL() call in
> _bt_insertonpg had a role in this as well.
When detecting an incomplete split the nbtree cleanup code calls 
_bt_insert_parent, which calls _bt_insertonpg. Which finishes the split. BUT: 
it doesn't log that it finished because RelationNeedsWal() says it doesn't need 
to.

That means:
* indexes on stanbys will *definitely* be corrupted
* a standby won't perform any restartpoints anymore till restarted
* if the primary crashes corruption is likely.

Hrm. I retract my earlier statement about the low likelihood of corruption due 
to this.

Greetings,

Andres
-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

Reply via email to