> On Apr 14, 2020, at 6:17 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> 
> On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dil...@enterprisedb.com> wrote:
>> Recently, as part of testing something else, I had need of a tool to create
>> surgically precise corruption within heap pages.  I wanted to make the
>> corruption from within TAP tests, so I wrote the tool as a set of perl 
>> modules.
> 
> There is also pg_hexedit:
> 
> https://github.com/petergeoghegan/pg_hexedit

I steered away from software released under the GPL, such as pg_hexedit, owing 
to difficulties in getting anything I develop accepted.  (That's a hard enough 
problem without licensing issues.)  I'm not taking a political stand for or 
against the GPL here, just a pragmatic position that I wouldn't be able to 
integrate pg_hexedit into a postgres submission.

(Thanks for writing pg_hexedit, BTW.  I'm not criticizing it.)

The purpose of these perl modules is not the viewing of files, but the 
intentional and targeted corruption of files from within TAP tests.  There are 
limited examples of tests in the postgres source tree that intentionally 
corrupt files, and as I read them, they employ a blunt force trauma approach:

In src/bin/pg_basebackup/t/010_pg_basebackup.pl:

> # induce corruption
> system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
> open $file, '+<', "$pgdata/$file_corrupt1";
> seek($file, $pageheader_size, 0);
> syswrite($file, "\0\0\0\0\0\0\0\0\0");
> close $file;
> system_or_bail 'pg_ctl', '-D', $pgdata, 'start';

In src/bin/pg_checksums/t/002_actions.pl:

>     # Time to create some corruption
>     open my $file, '+<', "$pgdata/$file_corrupted";
>     seek($file, $pageheader_size, 0);
>     syswrite($file, "\0\0\0\0\0\0\0\0\0");
>     close $file;
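
For readers who want to try the blunt approach outside the source tree, here is a minimal standalone sketch of the same idea.  It is not the code from either test: the helper name zero_after_page_header is made up for illustration, it adds the error checks the excerpts above omit, and it uses sysseek rather than seek so that the positioning and the syswrite both bypass perl's buffering.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper (not in the postgres tree): overwrite $len bytes
# with zeros, starting immediately after the page header of the first
# page in the file at $path.
sub zero_after_page_header
{
    my ($path, $pageheader_size, $len) = @_;

    open(my $fh, '+<', $path) or die "could not open $path: $!";
    binmode($fh);

    # Position just past the page header of page 0.
    sysseek($fh, $pageheader_size, SEEK_SET)
      or die "could not seek in $path: $!";

    # Clobber whatever lives there with zero bytes.
    my $written = syswrite($fh, "\0" x $len);
    die "short write to $path"
      unless defined $written && $written == $len;

    close($fh) or die "could not close $path: $!";
    return;
}
```

Calling zero_after_page_header($path, 24, 9) on a stopped cluster's relation file would mimic the nine zero bytes written above, assuming a 24 byte page header.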

These blunt force trauma tests are fine, as far as they go.  But I wanted to be 
able to do things like

        # Corrupt the tuple to look like it has lots of attributes, some of
        # them null.  This falsely creates the impression that the t_bits
        # array is longer than just one byte, but t_hoff still says otherwise.
        $tup->{HEAP_HASNULL} = 1;
        $tup->{HEAP_NATTS_MASK} = 0x3FF;
        $tup->{t_bits} = 0xAA;

or

        # Same as above, but this time t_hoff plays along
        $tup->{HEAP_HASNULL} = 1;
        $tup->{HEAP_NATTS_MASK} = 0x3FF;
        $tup->{t_bits} = 0xAA;
        $tup->{t_hoff} = 32;

That's hard to do from a TAP test without modules like this, as you have to 
calculate by hand the offsets where you're going to write the corruption, and 
the bit pattern you are going to write to that location.  Even if you do all 
that, nobody else is likely going to be able to read and maintain your tests.
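
To make the hand calculation concrete, here is a rough sketch of the first corruption above done with raw offsets on an in-memory copy of a page.  It is not taken from the submitted modules.  The helper names tuple_offset and corrupt_natts are invented for illustration, and the layout constants are assumptions: a little-endian build, a 24 byte page header, 4 byte line pointers with lp_off in the low 15 bits, and the usual HeapTupleHeaderData field offsets.

```perl
use strict;
use warnings;

# Assumed layout (little-endian build, standard heap page format).
use constant {
    PAGE_HEADER_SIZE => 24,        # bytes before the line pointer array
    ITEMID_SIZE      => 4,         # one packed line pointer
    OFF_INFOMASK2    => 18,        # t_infomask2 within the tuple header
    OFF_INFOMASK     => 20,        # t_infomask
    OFF_HOFF         => 22,        # t_hoff
    OFF_BITS         => 23,        # first byte of t_bits
    HEAP_HASNULL     => 0x0001,    # flag bit in t_infomask
    HEAP_NATTS_MASK  => 0x07FF,    # attribute count bits in t_infomask2
};

# Hypothetical helper: byte offset within $page of the tuple that
# line pointer $lp (1-based) refers to.
sub tuple_offset
{
    my ($page, $lp) = @_;
    my $pos = PAGE_HEADER_SIZE + ITEMID_SIZE * ($lp - 1);
    my $itemid = unpack('V', substr($page, $pos, ITEMID_SIZE));
    return $itemid & 0x7FFF;       # lp_off is the low 15 bits
}

# Hypothetical helper: claim 0x3FF attributes, set HEAP_HASNULL, and
# write 0xAA into the first t_bits byte.  If $new_hoff is defined,
# t_hoff is overwritten as well.  Returns the modified page copy.
sub corrupt_natts
{
    my ($page, $lp, $new_hoff) = @_;
    my $off = tuple_offset($page, $lp);

    # Replace the attribute count, preserving the other infomask2 bits.
    my $mask2 = unpack('v', substr($page, $off + OFF_INFOMASK2, 2));
    $mask2 = ($mask2 & (0xFFFF ^ HEAP_NATTS_MASK)) | 0x03FF;
    substr($page, $off + OFF_INFOMASK2, 2, pack('v', $mask2));

    # Pretend the tuple has a null bitmap.
    my $mask = unpack('v', substr($page, $off + OFF_INFOMASK, 2));
    $mask |= HEAP_HASNULL;
    substr($page, $off + OFF_INFOMASK, 2, pack('v', $mask));

    # Alternating bit pattern in the first null bitmap byte.
    substr($page, $off + OFF_BITS, 1, pack('C', 0xAA));

    # Optionally make t_hoff "play along".
    substr($page, $off + OFF_HOFF, 1, pack('C', $new_hoff))
      if defined $new_hoff;

    return $page;
}
```

Even in this small form, every magic number has to be checked against the headers for the target build, which is the maintenance burden the named-field interface is meant to remove.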

I'd like an easy way from within TAP tests to selectively corrupt files, to 
test whether various parts of the system fail gracefully in the presence of 
corruption.  What happens when a child partition is corrupted?  Does that 
impact queries that only access other partitions?  What kinds of corruption 
cause pg_upgrade to fail? ...to expand the scope of the corruption?  What 
happens to logical replication when there is corruption on the primary? ...on 
the standby?  What kinds of corruption cause a query to return data from 
neighboring tuples that the querying role has no permission to view?  What 
happens when a NAS is only intermittently corrupt?

The modules I've submitted thus far are incomplete for this purpose.  They 
don't yet handle toast tables, btree, hash, gist, gin, fsm, or vm, and I might 
be forgetting a few other things in the list.  Before I go and implement all of 
that, I thought perhaps others would express preferences about how this should 
all work, even stuff like, "Don't bother implementing that in perl, as I'm 
reimplementing the entire testing structure in COBOL", or similarly unexpected 
feedback.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




