> On Mar 16, 2021, at 12:52 PM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger
> <mark.dil...@enterprisedb.com> wrote:
>> It is unfortunate that the failing test only runs pg_amcheck after creating 
>> numerous corruptions, as we can't know if pg_amcheck would have complained 
>> about pg_statistic before the corruptions were created in other tables, or 
>> if it only does so after.  The attached patch v7-0003 adds a call to 
>> pg_amcheck after all tables are created and populated, but before any 
>> corruptions are caused.  This should help narrow down what is happening, and 
>> doesn't hurt to leave in place long-term.
>> 
>> I don't immediately see anything wrong with how pg_statistic uses a 
>> pseudo-type, but it leads me to want to poke a bit more at pg_statistic on 
>> hornet and tern, though I don't have any regression tests specifically for 
>> doing so.
>> 
>> Tests v7-0001 and v7-0002 are just repeats of the tests posted previously.
> 
> Since we now know that shutting autovacuum off makes the problem go
> away, I don't see a reason to commit 0001. We should fix pg_amcheck
> instead, if, as presently seems to be the case, that's where the
> problem is.

If you get unlucky, autovacuum will process one of the tables that the test 
intentionally corrupted, with bad consequences, ultimately causing build farm 
intermittent test failures.  We could wait to see if this ever happens before 
fixing it, if you prefer.  I'm not sure what that buys us, though.

The right approach, I think, is to extend the contrib/amcheck tests to include 
regressing this particular case to see if it fails.  I've written that test and 
verified that it fails against the old code and passes against the new.

> I just committed 0002.

Thanks!

> I think 0003 is probably a good idea, but I haven't committed it yet.

It won't do anything for us in this particular case, but it might make 
debugging failures quicker in the future.

> As for 0004, it seems to me that we might want to do a little more
> rewording of these messages and perhaps we should try to do it all at
> once. Like, for example, your other change to print out the toast
> value ID seems like a good idea, and could apply to any new messages
> as well as some existing ones. Maybe there are also more fields in the
> TOAST pointer for which we could add checks.

Of the toast pointer fields:

    int32       va_rawsize;     /* Original data size (includes header) */
    int32       va_extsize;     /* External saved size (doesn't) */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */

all seem worth getting as part of any toast error message, even if these fields 
themselves are not corrupt.  It just makes it easier to understand the context 
of the error you're looking at.  At first I tried putting these into each 
message, but it is very wordy to say things like "toast pointer with rawsize %u 
and extsize %u pointing at relation with oid %u" and such.  It made more sense 
to just add these four fields to the verify_heapam tuple format.  That saves 
putting them in the message text itself, and has the benefit that you could 
filter the rows coming from verify_heapam() for ones where valueid is or is not 
null, for example.  This changes the external interface of verify_heapam, but I 
didn't bother with a amcheck--1.3--1.4.sql because amcheck--1.2--1.3. sql was 
added as part of the v14 development work and has not yet been released.  My 
assumption is that I can just change it, rather than making a new upgrade file.

These patches fix the visibility rules and add extra toast checking.  Some of 
the previous patch material is not included, since it is not clear to me if you 
wanted to commit any of it.


Attachment: v9-0001-Fixing-amcheck-tuple-visibility-rules.patch
Description: Binary data

Attachment: v9-0002-pg_amcheck-provide-additional-toast-corruption-in.patch
Description: Binary data

 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Reply via email to