On Thu, Nov 10, 2022 at 1:31 AM David Christensen <david.christen...@crunchydata.com> wrote: >
Thanks for providing the v7 patch, please see my comments and responses below. > > 2. I'm unable to understand the use-case for --fixup-fpi option. > > pg_waldump is supposed to be just WAL reader, and must not return any > > modified information, with --fixup-fpi option, the patch violates this > > principle i.e. it sets page LSN and returns. Without actually > > replaying the WAL record on the page, how is it correct to just set > > the LSN? How will it be useful? ISTM, we must ignore this option > > unless there's a strong use-case. > > How I was envisioning this was for cases like extreme surgery for > corrupted pages, where you extract the page from WAL but it has lsn > and checksum set so you could do something like `dd if=fixup-block > of=relation ...`, so it *simulates* the replay of said fullpage blocks > in cases where for some reason you can't play the intermediate > records; since this is always a fullpage block, it's capturing what > would be the snapshot so you could manually insert somewhere as needed > without needing to replay (say if dealing with an incomplete or > corrupted WAL stream). Recovery sets the page LSN after it replayed the WAL record on the page right? Recovery does this - base_page/FPI + apply_WAL_record_and_then_set_applied_WAL_record's_LSN = new_version_of_page. Essentially, in your patch, you are just setting the WAL record LSN with the page contents being the base page's. I'm still not sure what's the real use-case here. We don't have an independent function in postgres, given a base page and a WAL record that just replays the WAL record and output's the new version of the page, so I think what you do in the patch with fixup option seems wrong to me. > > 5. > > + if (!RestoreBlockImage(record, block_id, page)) > > + continue; > > + > > + /* we have our extracted FPI, let's save it now */ > > After extracting the page from the WAL record, do we need to perform a > > checksum on it? I think you just need to do the following, this will ensure the authenticity of the page that pg_waldump returns. if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk)) { pg_fatal("page checksum failed"); } > > case 'W': > > config.save_fpw_path = pg_strdup(optarg); > > case 'X': > > config.fixup_fpw = true; > > config.save_fpw_path = pg_strdup(optarg); > > Like separate opt processing with their own `break` statement? > Probably a bit more readable/conventional. Yes. Some more comments: 1. + PGAlignedBlock zerobuf; Essentially, it's not a zero buffer, please rename the variable to something like 'buf' or 'page_buf' or someother? 2. + if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ) Replace pg_pwrite with fwrite() and avoid fileno() system calls that should suffice here, AFICS, we don't need pg_pwrite. 3. + if (config.save_fpw_path != NULL) + { + /* Create the dir if it doesn't exist */ + if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0) I think you still need pg_check_dir() here, how about something like below? if (pg_check_dir(config.save_fpw_path) == 0) { if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0) { /* error */ } } 4. + /* Create the dir if it doesn't exist */ + if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0) + { + pg_log_error("could not create output directory \"%s\": %m", + config.save_fpw_path); + goto bad_argument; Why is the directory creation error a bad_argument? I think you need just pg_fatal() here. 5. + fsync(fileno(OPF)); + fclose(OPF); I think just the fsync() isn't enough, you still need fsync_fname() and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++) loop. 6. Speaking of which, do we need to do fsync()'s optionally? If we were to write many such FPI files, aren't there going to be more fsync() calls and imagine this feature being used on a production server and a lot of fsync() will definitely make running server fsync() ops slower. I think we need a new option whether pg_waldump ever do fsync() or not, something similar to --no-sync of pg_receivewal/pg_upgrade/pg_dump/pg_initdb/pg_checksums etc. I would like it if the pg_waldump's --no-sync is coded as 0001 and 0002 can make use of it. 7. + pg_fatal("couldn't write out complete fullpage image to file: %s", filename); We typically use "full page image" in the output strings, please correct. 8. + + if (((PageHeader) page)->pd_checksum) + ((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blk); Why do you need to set the page's checksum by yourself? I don't think this is the right way, pg_waldump should just return what it sees in the WAL record, of course, after verifying a few checks (like checksum is correct or not), but it mustn't set or compute anything new in the returned page. Few comments on the tests: 1. +$primary->init('-k'); +$primary->append_conf('postgresql.conf', "max_wal_size='100MB'"); +$primary->append_conf('postgresql.conf', "wal_level='replica'"); +$primary->append_conf('postgresql.conf', 'archive_mode=on'); +$primary->append_conf('postgresql.conf', "archive_command='/bin/false'"); +$primary->start; I don't think we need these many append_conf calls here, see how others are doing it with just one single call: $node->append_conf( 'postgresql.conf', qq{ listen_addresses = '$hostaddr' krb_server_keyfile = '$keytab' log_connections = on lc_messages = 'C' }); 2. +$primary->append_conf('postgresql.conf', "max_wal_size='100MB'"); Do you really need 100MB max_wal_size? Why can't you just initialize your cluster with 1MB wal files instead of 16MB and set max_wal_size to 4MB or a bit more, something like 019_replslot_limit.pl does? 3. +# generate data/wal to examine +$primary->safe_psql('postgres', q(CREATE DATABASE db1)); +$primary->safe_psql('db1', <<EOF); +CREATE TABLE test_table AS SELECT generate_series(1,100) a; +CHECKPOINT; +SELECT pg_switch_wal(); +UPDATE test_table SET a = a + 1; +CHECKPOINT; +SELECT pg_switch_wal(); +UPDATE test_table SET a = a + 1; +CHECKPOINT; +SELECT pg_switch_wal(); +EOF I don't think you need these many complex things to generate WAL, multiple CHECKPOINT;s can make tests slower. To keep it simple, you can just create a table, insert a single row, checkpoint, update the row, switch the wal - no need to test if your feature generates multiple WAL files, it's enough to test if it generates just one. Please simplfiy the tests. 4. +$primary->append_conf('postgresql.conf', "wal_level='replica'"); +$primary->append_conf('postgresql.conf', 'archive_mode=on'); +$primary->append_conf('postgresql.conf', "archive_command='/bin/false'"); Why do you need to set wal_level to replica, out of the box your cluster comes with replica only no? And why do you need archive_mode on and set the command to do nothing? Why archiving is needed for testing your feature firstly? 5. +my $primary = PostgreSQL::Test::Cluster->new('primary'); Can you rename your node to other than primary? Because this isn't a test of replication where primary and standby nodes get created. How about just 'node'? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com