> On Mar 13, 2021, at 11:06 AM, Peter Geoghegan <p...@bowt.ie> wrote:
> 
> On Sat, Mar 13, 2021 at 10:35 AM Mark Dilger
> <mark.dil...@enterprisedb.com> wrote:
>> The testing strategy I'm using is to corrupt heap and btree pages in schema 
>> "public" of the "regression" database created by `make installcheck`, by 
>> overwriting random bytes in randomly selected locations on pages after the 
>> page header.  Then I run `pg_amcheck regression` and see if anything 
>> segfaults.  Doing this repeatedly, with random bytes and locations within 
>> files not the same from one run to the next, I can find the locations of 
>> segfaults that are particularly common.
> 
> That seems like a reasonable starting point to me.
> 
>> The first common problem [1] happens at verify_nbtree.c:1422 when a 
>> corruption report is being generated.  The generation does not seem entirely 
>> safe, and the problematic bit can be avoided
> 
>> The temPointerGetBlockNumberNoCheck(tid) seems to be unsafe here.
> 
> I think that the best way to further harden verify_nbtree.c at this
> point is to do the most basic validation of IndexTuples in our own new
> variant of a core bufpage.h macro: PageGetItemCareful(). In other
> words, we should invent the IndexTuple equivalent of the existing
> PageGetItemIdCareful() function (which already does the same thing for
> item pointers), and then replace all current PageGetItem() calls with
> PageGetItemCareful() calls -- we ban raw PageGetItem() calls from
> verify_nbtree.c forever.
> 
> Here is some of the stuff that could go in PageGetItemCareful(), just
> off the top of my head:
> 
> * The existing "if (tupsize != ItemIdGetLength(itemid))" check at the
> top of bt_target_page_check() -- make sure the length from the
> caller's line pointer agrees with IndexTupleSize().
> 
> * Basic validation against the index's tuple descriptor -- in
> particular, that varlena headers are basically sane, and that the
> apparent range of datums is safely within the space on the page for
> the tuple.
> 
> * Similarly, BTreeTupleGetHeapTID() should not be able to return a
> pointer that doesn't actually point somewhere inside the space that
> the target page has for the IndexTuple.
> 
> * No external TOAST pointers, since this is an index AM, and so
> doesn't allow that.
> 
> In general this kind of very basic validation should be pushed down to
> the lowest level code, so that it detects the problem as early as
> possible, before slightly higher level code has the opportunity to
> run. Higher level code is always going to be at risk of making
> assumptions about the data not being corrupt, because there is so much
> more of it, and also because it tends to roughly look like idiomatic
> AM code.

Excellent write-up!  Thanks!

I'll work on this and get a patch set going if time allows.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply via email to