On Fri, Nov 5, 2021 at 8:18 PM Mark Dilger <mark.dil...@enterprisedb.com> wrote: > Thanks, Peter, for this analysis.
It's strategically important work. What you've done so far has every chance of catching corruption involving storage issues, which is great. But we ought to do more to catch certain known general patterns of corruption. The so-called "freeze the dead" bug (see commit 9c2f0a6c for the final fix) is the single weirdest and scariest bug that I can recall (there were a couple of incorrect bug fixes that had to be reverted), and it is very much the kind of thing that I'd like to see verify_heapam.c handle exhaustively. That would de-risk a lot of things. Note that the same variety of HOT chain corruption bugs besides that one, so it's really a general class of corruption, not one specific bug. The heapallindexed verification option actually detected the "freeze the dead" bug [1], although that was a total accident -- that checking option happens to use CREATE INDEX infrastructure, which happens to sanitize HOT chains by calling heap_get_root_tuples() and detecting contradictions -- heap_get_root_tuples() is code that is right next to the pruning code I mentioned. That level of detection is certainly better than nothing, but it's still not good enough -- we need this in verify_heapam, too. Here's why I think that we need very thorough HOT chain verification that lives directly in verify_heapam: First of all, the heapallindexed option is rather expensive, whereas the additional overhead for verify_heapam would be minimal -- you probably wouldn't even need to make it optional. Second of all, why should you need to use bt_index_check() to detect what is after all heap corruption? And third of all, the approach of detecting corruption by outsourcing detection to heapam_index_build_range_scan() (which calls heap_get_root_tuples() itself) probably risks missing corrupt HOT chains where the corruption doesn't look a certain way -- it was not designed with amcheck in mind. heapam_index_build_range_scan() + heap_get_root_tuples() will notice a heap-only tuple without a parent, which is a good start. But what about an LP_REDIRECT item whose lp_off (i.e. page offset number redirect) somehow points to any item on the page other than a valid heap-only tuple? Also doesn't catch contradictory commit states for heap-only tuples that form a hot chain through their t_ctid links. I'm sure that there are other things that I haven't thought of, too -- there is a lot going on here. This HOT chain verification business is a little like the checking that we do in verify_nbtree.c, actually. The individual tuples (in this case heap tuples) may well not be corrupt (or demonstrably corrupt) in isolation. But taken together, in a particular page offset number sequence order, they can nevertheless *imply* corruption -- it's implied if the tuples tell a contradictory story about what's going on at the level of the whole page. A HOT chain LP_REDIRECT is after all a little like a "mini index" stored on a heap page. Sequential scans don't care about LP_REDIRECTs at all. But index scans expect a great deal from LP_REDIRECTs. [1] https://www.postgresql.org/message-id/cah2-wznm4rcrhfaiwkpwtpew2bxdtgrozk7jwwgucxeh3d1...@mail.gmail.com -- Peter Geoghegan