On 20/05/2023 05:17, Peter Geoghegan wrote:
It has come to my attention that there is a remaining issue of the same general nature in nbtree VACUUM's page deletion code. Though this remaining issue seems significantly less likely to come up in practice, there is no reason to take any chances here.
Yeah, let's be consistent. Any idea what might cause this corruption?
Attached patch fixes it. + /* + * Validate target's right sibling page's left link points back to target. + * + * This is known to fail in the field in the presence of index corruption, + * so we go to the trouble of avoiding a hard ERROR. This is fairly close + * to what we did earlier on when we located the target's left sibling + * (iff target has a left sibling). + */
This comment notes that this is similar to what we did with the left sibling, but there isn't really any mention at the left sibling code about avoiding hard ERRORs. Feels a bit backwards. Maybe move the comment about avoiding the hard ERROR to where the left sibling is handled. Or explain it in the function comment and just have short "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment here.
Also attached is a bugfix for a minor issue in amcheck's bt_index_parent_check() function, which I noticed in passing, while I tested the first patch. We assumed that we'd always land on the leftmost page on each level first (the leftmost according to internal pages one level up). That assumption is faulty because page deletion of the leftmost page is quite possible. Page deletion can be interrupted, leaving a half-dead leaf page (possibly the leftmost leaf page) without any downlink one level up, while still leaving a left sibling link on the leaf level (in the leaf page that isn't about to become the leftmost, but won't until the interrupted page deletion can be completed).
You could check that the left sibling is indeed a half-dead page.
ereport(DEBUG1, (errcode(ERRCODE_NO_DATA), errmsg("block %u is not leftmost in index \"%s\"", current, RelationGetRelationName(state->rel))));
ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode. -- Heikki Linnakangas Neon (https://neon.tech)