On Tue, Apr 16, 2019 at 10:00 PM Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Mon, Apr 15, 2019 at 7:30 PM Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > Currently we amcheck supports lossy checking for missing parent
> > downlinks. It collects bitmap of downlink hashes and use it to check
> > subsequent tree level. We've experienced some large corrupted indexes
> > which pass this check due to its looseness.
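Just to make sure we're talking about the same thing, the lossy check works
roughly like the toy model below: while scanning one level we fingerprint each
downlink's child block number into a Bloom filter, and on the next level down
we probe the filter for every page we visit. This is only a simplified,
self-contained sketch for illustration; bloom_add_block(), bloom_probe_block()
and the deliberately tiny filter are made up here and are not the actual
amcheck code, which among other things also has to allow for interrupted page
splits. The point is that a Bloom filter false positive can silently mask a
page that really has no parent downlink.

/*
 * Toy model of the lossy "missing downlink" check (not the real amcheck code).
 * Parent level: fingerprint every downlink's child block number.
 * Child level: every page must probe as present in the filter; a false
 * positive here would silently hide a missing downlink, hence "lossy".
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FILTER_BITS 1024            /* deliberately tiny, to show lossiness */

typedef uint32_t BlockNumber;

static unsigned char filter[FILTER_BITS / 8];

/* two cheap hash functions; the real implementation uses better ones */
static uint32_t hash1(BlockNumber b) { return (b * 2654435761u) % FILTER_BITS; }
static uint32_t hash2(BlockNumber b) { return (b * 40503u + 7) % FILTER_BITS; }

static void bloom_add_block(BlockNumber b)
{
    filter[hash1(b) / 8] |= 1 << (hash1(b) % 8);
    filter[hash2(b) / 8] |= 1 << (hash2(b) % 8);
}

static bool bloom_probe_block(BlockNumber b)
{
    return (filter[hash1(b) / 8] & (1 << (hash1(b) % 8))) != 0 &&
           (filter[hash2(b) / 8] & (1 << (hash2(b) % 8))) != 0;
}

int main(void)
{
    /* downlinks found while scanning the parent level */
    BlockNumber downlinks[] = {10, 11, 12, 14};
    /* pages actually present on the child level (13 has no downlink) */
    BlockNumber children[] = {10, 11, 12, 13, 14};

    for (size_t i = 0; i < sizeof(downlinks) / sizeof(downlinks[0]); i++)
        bloom_add_block(downlinks[i]);

    for (size_t i = 0; i < sizeof(children) / sizeof(children[0]); i++)
        if (!bloom_probe_block(children[i]))
            printf("block %u has no downlink in parent level\n",
                   (unsigned) children[i]);

    return 0;
}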
> Can you be more specific? What was the cause of the corruption? I'm
> always very interested in hearing about cases that amcheck could have
> detected, but didn't.

AFAIR, the cause of corruption in this case was our (Postgres Pro)
modification, not something actually present in PostgreSQL core.

> Was the issue that the Bloom filter was simply undersized/ineffective?

Yes, increasing the Bloom filter size also helps. But my intention here was
to make the check non-lossy.

> > However, it seems to me we can implement this check in non-lossy
> > manner without making it significantly slower. We anyway traverse
> > downlinks from parent to children in order to verify that hikeys are
> > corresponding to downlink keys.
>
> Actually, we don't check the high keys in children against the parent
> (all other items are checked, though). We probably *should* do
> something with the high key when verifying consistency across levels,
> but currently we don't. (We only use the high key for the same-page
> high key check -- more on this below.)

Nice idea.

> > We can also traverse from one
> > downlink to subsequent using rightlinks. So, if there are some
> > intermediate pages between them, they are candidates to have missing
> > parent downlinks. The patch is attached.
> >
> > With this patch amcheck could successfully detect corruption for our
> > customer, which unpatched amcheck couldn't find.
>
> Maybe we can be a lot less conservative about sizing the Bloom filter
> instead? That would be an easier fix IMV -- we can probably change
> that logic to be a lot more aggressive without anybody noticing, since
> the Bloom filter is already usually tiny. We are already not very
> careful about saving work within bt_index_parent_check(), but with
> this patch we follow each downlink twice instead of once. That seems
> wasteful.

It wouldn't really be wasteful, because the same children were accessed
recently, so they are unlikely to have been evicted from shared_buffers yet.
Besides, I think we can fit both checks into one loop over the downlinks
instead of two.

> The reason I used a Bloom filter here is because I would eventually
> like the missing downlink check to fingerprint entire tuples, not just
> block numbers. In L&Y terms, the check could in the future fingerprint
> the separator key and the downlink at the same time, not just the
> downlink. That way, we could probe the Bloom filter on the next level
> down for its high key (with the right sibling pointer set to be
> consistent with the parent) iff we don't see that the page split was
> interrupted (i.e. iff P_INCOMPLETE_SPLIT() bit is not set). Obviously
> this would be a more effective form of verification, since we would
> notice high key values that don't agree with the parent's values for
> the same sibling/cousin/child block.
>
> I didn't do it that way for v11 because of "minus infinity" items on
> internal pages, which don't store the original key (the key remains
> the high key of the left sibling page, but we truncate the original to
> 0 attributes in _bt_pgaddtup()). I think that we should eventually
> stop truncating minus infinity items, and actually store a "low key"
> at P_FIRSTDATAKEY() within internal pages instead. That would be
> useful for other things anyway (e.g. prefix compression).

Yes, we can do more checks with the Bloom filter. But given that we iterate
over the children of each non-leaf page anyway, can't we do all of the
following in a single pass:
1) all the checks that bt_downlink_check() currently does,
2) check that there are no missing downlinks,
3) check the high keys?
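To make the shape of that single pass concrete, below is a heavily simplified,
self-contained sketch of the traversal I have in mind. It models pages as
plain structs and keys as integers; the names (ToyPage, check_level(), etc.)
and the layout are invented for the illustration only, this is not the
attached patch, and the real code would of course work with buffers,
BTPageOpaque flags, _bt_compare() and so on, and would also have to allow for
interrupted page splits.

/*
 * Toy model of a single pass over the downlinks of one internal page:
 * for each downlink we (1) run the per-child checks, (2) compare the child's
 * high key with the next pivot key in the parent, and (3) walk rightlinks
 * from this child to the next downlinked child -- any page we step over on
 * the way has no parent downlink.
 */
#include <stdio.h>

#define MAXCHILDREN   8
#define INVALID_BLOCK (-1)

typedef struct ToyPage
{
    int     blkno;      /* block number of this child page */
    int     highkey;    /* high key (an integer stands in for a real key) */
    int     rightlink;  /* block number of right sibling, or INVALID_BLOCK */
} ToyPage;

/* the child level, indexed by block number for simplicity */
static ToyPage children[MAXCHILDREN];

static void
check_level(const int *downlinks, const int *pivots, int ndownlinks)
{
    for (int i = 0; i < ndownlinks; i++)
    {
        ToyPage *child = &children[downlinks[i]];

        /* (1) per-downlink checks would go here (bt_downlink_check() today) */

        /*
         * (2) the child's high key should agree with the pivot key that
         * separates this downlink from the next one in the parent
         */
        if (i + 1 < ndownlinks && child->highkey != pivots[i])
            printf("block %d: high key %d does not match parent pivot %d\n",
                   child->blkno, child->highkey, pivots[i]);

        /*
         * (3) follow rightlinks until we reach the next downlinked child;
         * every intermediate page is a candidate for a missing parent
         * downlink (modulo incomplete splits, ignored in this toy model)
         */
        if (i + 1 < ndownlinks)
        {
            int blkno = child->rightlink;

            while (blkno != INVALID_BLOCK && blkno != downlinks[i + 1])
            {
                printf("block %d has no downlink in parent\n", blkno);
                blkno = children[blkno].rightlink;
            }
        }
    }
}

int
main(void)
{
    /* children 2, 3 and 5 have downlinks; 4 sits between 3 and 5 without one */
    children[2] = (ToyPage) {2, 10, 3};
    children[3] = (ToyPage) {3, 20, 4};
    children[4] = (ToyPage) {4, 25, 5};
    children[5] = (ToyPage) {5, 30, INVALID_BLOCK};

    int downlinks[] = {2, 3, 5};
    int pivots[] = {10, 20};    /* separator keys stored in the parent */

    check_level(downlinks, pivots, 3);
    return 0;
}

The property I care about is that each child page is visited only once, so the
missing-downlink and high-key checks come almost for free on top of what
bt_downlink_check() already pays for.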