On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > Btw, I played with this some more on Saturday and I think, while
> > definitely a bad bug, the actual consequences aren't as bad as at least
> > I initially feared.
> >
> > Fake relcache entries are currently set in 3 scenarios during recovery:
> > 1. removal of ALL_VISIBLE in heapam.c
> > 2. incomplete splits and incomplete deletions in nbtxlog.c
> > 3. incomplete splits in ginxlog.c
>
> [ #1 doesn't really hurt in 9.1, and the others are low probability ]
>
> OK, that explains why we've not seen a blizzard of trouble reports.
> Still seems like a good idea to fix it ASAP, though.

Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the IRC Channel that had symptoms matching this bug.
The situation was that he started to get very high IO and xid wraparound shutdown warnings because autovacuums never finished and could not be canceled. After some investigation it turned out that btree indexes were being processed at that time. We found they had cyclic btpo_next pointers, leading to an endless loop in _bt_pagedel. We solved the issue by forcing leftsib = P_NONE inside the while (P_ISDELETED(opaque) || opaque->btpo_next != target) loop, which let a queued DROP INDEX get the necessary locks.

Unfortunately this was on a busy production system with a shutdown approaching, so not much was kept for further diagnosis.

After this bug was discovered I asked the user, and indeed they had previously shut down the database twice in quick succession during heavy activity with -m immediate, which could lead to exactly such a problem due to incompletely processed page splits.

Greetings,

Andres
--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services