> On Apr 14, 2020, at 6:17 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dil...@enterprisedb.com>
> wrote:
>> Recently, as part of testing something else, I had need of a tool to create
>> surgically precise corruption within heap pages. I wanted to make the
>> corruption from within TAP tests, so I wrote the tool as a set of perl
>> modules.
>
> There is also pg_hexedit:
>
> https://github.com/petergeoghegan/pg_hexedit

I steered away from GPL-licensed software such as pg_hexedit because licensing
would add yet another obstacle to getting anything I develop accepted. (That's
a hard enough problem without licensing issues.) I'm not taking a political
stand for or against the GPL here, just the pragmatic position that I wouldn't
be able to integrate pg_hexedit into a PostgreSQL submission.

(Thanks for writing pg_hexedit, BTW. I'm not criticizing it.)

The purpose of these Perl modules is not viewing files, but intentionally and
precisely corrupting them from within TAP tests. The PostgreSQL source tree
has only a few tests that intentionally corrupt files, and as I read them,
they take a blunt-force-trauma approach.

In src/bin/pg_basebackup/t/010_pg_basebackup.pl:
> # induce corruption
> system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
> open $file, '+<', "$pgdata/$file_corrupt1";
> seek($file, $pageheader_size, 0);
> syswrite($file, "\0\0\0\0\0\0\0\0\0");
> close $file;
> system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
In src/bin/pg_checksums/t/002_actions.pl:
> # Time to create some corruption
> open my $file, '+<', "$pgdata/$file_corrupted";
> seek($file, $pageheader_size, 0);
> syswrite($file, "\0\0\0\0\0\0\0\0\0");
> close $file;

These blunt-force-trauma tests are fine as far as they go, but I want to be
able to do things like this:
# Corrupt the tuple to look like it has lots of attributes, some of
# them null. This falsely creates the impression that the t_bits
# array is longer than just one byte, but t_hoff still says otherwise.
$tup->{HEAP_HASNULL} = 1;
$tup->{HEAP_NATTS_MASK} = 0x3FF;
$tup->{t_bits} = 0xAA;
or
# Same as above, but this time t_hoff plays along
$tup->{HEAP_HASNULL} = 1;
$tup->{HEAP_NATTS_MASK} = 0x3FF;
$tup->{t_bits} = 0xAA;
$tup->{t_hoff} = 32;

That's hard to do from a TAP test without modules like this, because you have
to calculate by hand both the offsets at which to write the corruption and the
bit patterns to write there. Even if you do all that, nobody else is likely to
be able to read and maintain your tests.
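
To make that concrete, here is a rough sketch of the hand calculation involved,
using plain pack/unpack on an in-memory page. The helpers tuple_offset,
update_u16 and corrupt_tuple are hypothetical illustrations, not the API of the
submitted modules. The sketch assumes an 8kB BLCKSZ, a little-endian build
whose compiler puts ItemIdData's first bitfield (lp_off) in the low-order bits,
and it does not recompute pd_checksum.

```perl
use strict;
use warnings;

# Hand-calculated offsets. Assumes 8kB pages, little-endian, and lp_off in
# the low 15 bits of each 4-byte ItemIdData.
use constant {
    SIZE_OF_PAGE_HEADER => 24,     # SizeOfPageHeaderData
    SIZE_OF_ITEM_ID     => 4,      # sizeof(ItemIdData)
    T_INFOMASK2_OFF     => 18,     # after t_choice (12 bytes) + t_ctid (6)
    T_INFOMASK_OFF      => 20,
    T_HOFF_OFF          => 22,
    T_BITS_OFF          => 23,     # first byte of the t_bits array
    HEAP_HASNULL        => 0x0001, # flag bit in t_infomask
    HEAP_NATTS_MASK     => 0x07FF, # attribute-count bits in t_infomask2
};

# Hypothetical helper: byte offset of the tuple that line pointer $lp
# (1-based) points to, taken from the lp_off bits of its ItemIdData.
sub tuple_offset {
    my ($page, $lp) = @_;
    my $pos = SIZE_OF_PAGE_HEADER + SIZE_OF_ITEM_ID * ($lp - 1);
    return unpack('V', substr($$page, $pos, 4)) & 0x7FFF;
}

# Hypothetical helper: read-modify-write a little-endian uint16, clearing
# the bits in $clear and then setting the bits in $set.
sub update_u16 {
    my ($page, $pos, $clear, $set) = @_;
    my $val = unpack('v', substr($$page, $pos, 2));
    substr($$page, $pos, 2) = pack('v', ($val & ~$clear & 0xFFFF) | $set);
}

# Hypothetical helper: apply field-level corruption to one tuple header.
# Note: pd_checksum is left untouched.
sub corrupt_tuple {
    my ($page, $lp, %f) = @_;
    my $off = tuple_offset($page, $lp);

    update_u16($page, $off + T_INFOMASK_OFF, HEAP_HASNULL,
               $f{HEAP_HASNULL} ? HEAP_HASNULL : 0)
        if exists $f{HEAP_HASNULL};
    update_u16($page, $off + T_INFOMASK2_OFF, HEAP_NATTS_MASK,
               $f{HEAP_NATTS_MASK} & HEAP_NATTS_MASK)
        if exists $f{HEAP_NATTS_MASK};
    substr($$page, $off + T_HOFF_OFF, 1) = pack('C', $f{t_hoff})
        if exists $f{t_hoff};
    substr($$page, $off + T_BITS_OFF, 1) = pack('C', $f{t_bits})
        if exists $f{t_bits};
}

# Build a zeroed page with one normal line pointer (lp_flags = 1) pointing
# at a 32-byte tuple near the end of the page, then corrupt that tuple the
# same way as the second example above.
my $page   = "\0" x 8192;
my $lp_off = 8192 - 32;
substr($page, SIZE_OF_PAGE_HEADER, 4) =
    pack('V', $lp_off | (1 << 15) | (32 << 17));
substr($page, $lp_off + T_HOFF_OFF, 1) = pack('C', 24);

corrupt_tuple(\$page, 1,
    HEAP_HASNULL    => 1,
    HEAP_NATTS_MASK => 0x3FF,
    t_bits          => 0xAA,
    t_hoff          => 32);
```

To use something like this against a real relation file, you would read the
block at byte offset blkno * 8192, patch it, and write it back with the server
stopped, as the existing tests do. On a cluster with data checksums enabled,
the stale pd_checksum would be reported before the tuple-level corruption is
ever examined.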

I'd like an easy way, from within TAP tests, to selectively corrupt files and
test whether various parts of the system fail gracefully in the presence of
corruption. What happens when a child partition is corrupted? Does that
affect queries that only access other partitions? What kinds of corruption
cause pg_upgrade to fail? ...or to expand the scope of the corruption? What
happens to logical replication when there is corruption on the primary? ...on
the standby? What kinds of corruption cause a query to return data from
neighboring tuples that the querying role does not have permission to view?
What happens when storage on a NAS is only intermittently corrupt?

The modules I've submitted so far are incomplete for this purpose. They don't
yet handle TOAST tables, btree, hash, GiST, GIN, or the fsm and vm forks, and
I may be forgetting a few other things. Before I go and implement all of that,
I'd like to hear others' preferences about how this should all work, even
feedback like "Don't bother implementing that in Perl, as I'm reimplementing
the entire testing infrastructure in COBOL," or something similarly unexpected.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company