On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote: > > Dear Peter, > > Thank you for reviewing! PSA new version. > Note that 0001 and 0002 are combined into one patch. > > > Here are some review comments for v51-0001 > > > > ====== > > src/bin/pg_upgrade/check.c > > > > 0. > > +check_old_cluster_for_valid_slots(bool live_check) > > +{ > > + char output_path[MAXPGPATH]; > > + FILE *script = NULL; > > + > > + prep_status("Checking for valid logical replication slots"); > > + > > + snprintf(output_path, sizeof(output_path), "%s/%s", > > + log_opts.basedir, > > + "invalid_logical_relication_slots.txt"); > > > > 0a > > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/ > > Fixed. > > > 0b. > > Since the non-upgradable slots are not strictly "invalid", is this an > > appropriate filename for the bad ones? > > > > But I don't have very good alternatives. Maybe: > > - non_upgradable_logical_replication_slots.txt > > - problem_logical_replication_slots.txt > > Per discussion [1], I kept current style. > > > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl > > > > 1. > > +# ------------------------------ > > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster > > +# > > +# There are two requirements for GUCs - wal_level and > > max_replication_slots, > > +# but only max_replication_slots will be tested here. This is because to > > +# reduce the execution time of the test. > > > > > > SUGGESTION > > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values. > > # > > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to > > # reduce the test execution time, only 'max_replication_slots' is tested > > here. > > First part was fixed. Second part was removed per [1]. > > > 2. > > +# Preparations for the subsequent test: > > +# 1. Create two slots on the old cluster > > +$old_publisher->start; > > +$old_publisher->safe_psql('postgres', > > + "SELECT pg_create_logical_replication_slot('test_slot1', > > 'test_decoding', false, true);" > > +); > > +$old_publisher->safe_psql('postgres', > > + "SELECT pg_create_logical_replication_slot('test_slot2', > > 'test_decoding', false, true);" > > +); > > > > > > Can't you combine those SQL in the same $old_publisher->safe_psql. > > Combined. > > > 3. > > +# Clean up > > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d"); > > +# Set max_replication_slots to the same value as the number of slots. Both > > of > > +# slots will be used for subsequent tests. > > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = > > 1"); > > > > The code doesn't seem to match the comment - is this correct? The > > old_publisher created 2 slots, so why are you setting new_publisher > > "max_replication_slots = 1" again? > > Fixed to "max_replication_slots = 2" Note that previous test worked well > because > GUC checking on new cluster is done after checking the status of slots. > > > 4. > > +# Preparations for the subsequent test: > > +# 1. Generate extra WAL records. Because these WAL records do not get > > consumed > > +# it will cause the upcoming pg_upgrade test to fail. > > +$old_publisher->start; > > +$old_publisher->safe_psql('postgres', > > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"); > > + > > +# 2. Advance the slot test_slot2 up to the current WAL location > > +$old_publisher->safe_psql('postgres', > > + "SELECT pg_replication_slot_advance('test_slot2', NULL);"); > > + > > +# 3. Emit a non-transactional message. test_slot2 detects the message so > > that > > +# this slot will be also reported by upcoming pg_upgrade. > > +$old_publisher->safe_psql('postgres', > > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', > > 'This is a non-transactional message');" > > +); > > > > > > I felt this test would be clearer if you emphasised the state of the > > test_slot1 also. e.g. > > > > 4a. > > BEFORE > > +# 1. Generate extra WAL records. Because these WAL records do not get > > consumed > > +# it will cause the upcoming pg_upgrade test to fail. > > > > SUGGESTION > > # 1. Generate extra WAL records. At this point neither test_slot1 nor > > test_slot2 > > # has consumed them. > > Fixed. > > > 4b. > > BEFORE > > +# 2. Advance the slot test_slot2 up to the current WAL location > > > > SUGGESTION > > # 2. Advance the slot test_slot2 up to the current WAL location, but > > test_slot2 > > # still has unconsumed WAL records. > > IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I > think > "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed. > > > 5. > > +# pg_upgrade will fail because the slot still has unconsumed WAL records > > +command_checks_all( > > > > /because the slot still has/because there are slots still having/ > > Fixed. > > > 6. > > + [qr//], > > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records' > > +); > > > > /slot/slots/ > > Fixed. > > > 7. > > +# And check the content. Both of slots must be reported that they have > > +# unconsumed WALs after confirmed_flush_lsn. > > > > SUGGESTION > > # Check the file content. Both slots should be reporting that they have > > # unconsumed WAL records. > > Fixed. > > > > > 8. > > +# Preparations for the subsequent test: > > +# 1. Setup logical replication > > +my $old_connstr = $old_publisher->connstr . ' dbname=postgres'; > > + > > +$old_publisher->start; > > + > > +$old_publisher->safe_psql('postgres', > > + "SELECT * FROM pg_drop_replication_slot('test_slot1');"); > > +$old_publisher->safe_psql('postgres', > > + "SELECT * FROM pg_drop_replication_slot('test_slot2');"); > > + > > +$old_publisher->safe_psql('postgres', > > + "CREATE PUBLICATION regress_pub FOR ALL TABLES;"); > > > > > > 8a. > > /Setup logical replication/Setup logical replication (first, cleanup > > slots from the previous tests)/ > > Fixed. > > > 8b. > > Can't you combine all those SQL in the same $old_publisher->safe_psql. > > Combined. > > > 9. > > + > > +# Actual run, successful upgrade is expected > > +command_ok( > > + [ > > + 'pg_upgrade', '--no-sync', > > + '-d', $old_publisher->data_dir, > > + '-D', $new_publisher->data_dir, > > + '-b', $bindir, > > + '-B', $bindir, > > + '-s', $new_publisher->host, > > + '-p', $old_publisher->port, > > + '-P', $new_publisher->port, > > + $mode, > > + ], > > + 'run of pg_upgrade of old cluster'); > > > > Now that the "Dry run" part is removed, it seems unnecessary to say > > "Actual run" for this part. > > > > > > SUGGESTION > > # pg_upgrade should be successful. > > Fixed.
Few comments: 1) Even if we comment 3rd point "Emit a non-transactional message", test_slot2 still appears in the invalid_logical_replication_slots.txt file. There is something wrong here. + # 2. Advance the slot test_slot2 up to the current WAL location, but + # test_slot1 still has unconsumed WAL records. + $old_publisher->safe_psql('postgres', + "SELECT pg_replication_slot_advance('test_slot2', NULL);"); + + # 3. Emit a non-transactional message. test_slot2 detects the message so + # that this slot will be also reported by upcoming pg_upgrade. + $old_publisher->safe_psql('postgres', + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');" + ); 2) If the test fails here, it is difficult to debug as the pg_upgrade_output.d directory was removed, so better to keep the directory as it is this case: + # Check the file content. Both slots should be reporting that they have + # unconsumed WAL records. + like( + slurp_file($slots_filename), + qr/The slot \"test_slot1\" has not consumed the WAL yet/m, + 'the previous test failed due to unconsumed WALs'); + like( + slurp_file($slots_filename), + qr/The slot \"test_slot2\" has not consumed the WAL yet/m, + 'the previous test failed due to unconsumed WALs'); + + # Clean up + rmtree($new_publisher->data_dir . "/pg_upgrade_output.d"); 3) The below could be changed: + # Check the file content. Both slots should be reporting that they have + # unconsumed WAL records. + like( + slurp_file($slots_filename), + qr/The slot \"test_slot1\" has not consumed the WAL yet/m, + 'the previous test failed due to unconsumed WALs'); + like( + slurp_file($slots_filename), + qr/The slot \"test_slot2\" has not consumed the WAL yet/m, + 'the previous test failed due to unconsumed WALs'); to: my $result = slurp_file($slots_filename); is( $result, qq(The slot "test_slot1" has not consumed the WAL yet The slot "test_slot2" has not consumed the WAL yet ), 'the previous test failed due to unconsumed WALs'); Regards, Vignesh