During normal running, operations such as btree page splits are extremely careful about the order in which they acquire and release buffer locks when they're doing something that modifies multiple pages concurrently.
During WAL replay, that all goes out the window. Even if an individual WAL-record replay function does things in the right order for "standard" cases, RestoreBkpBlocks has no idea what it's doing. So if one or more of the referenced pages gets treated as a full-page image, we are left with no guarantee whatsoever about the order in which the pages are restored. That never mattered when the code was originally designed, but it sure matters during Hot Standby, when other queries might be able to see the intermediate states.

I can't prove that this is the cause of bug #7648, but it's fairly easy to see that it could explain the symptom. You only need to assume that the page being split had been handled as a full-page image, and that the new right-hand page had been allocated by extending the relation. Then there will be an interval just after RestoreBkpBlocks does its thing where the updated left-hand sibling is in the index and is not locked in any way, but its right-link points off the end of the index. If a few indexscans come along before the replay process gets to continue, you'd get exactly the reported errors.

I'm inclined to think that we need to fix this by getting rid of RestoreBkpBlocks per se, and instead having the per-WAL-record restore routines dictate when each full-page image is restored (and whether or not to release the buffer lock immediately). That's not going to be a small change, unfortunately :-(

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers