On Fri, Jul 08, 2022 at 08:56:14AM -0400, Robert Haas wrote:
> On Thu, Jul 7, 2022 at 10:58 AM Fujii Masao <masao.fu...@oss.nttdata.com> 
> wrote:
>> But if many think that it's worth adding the test, I will give a
>> try. But even in that case, I think it's better to commit the
>> proposed patch at first to fix the bug, and then to write the patch
>> adding the test.

I have looked at that in details, and it is possible to rely on
pg_stat_activity.wait_event to be BaseBackupThrottle, which would make
sure that the checkpoint triggered at the beginning of the backup
finishes and that we are in the middle of the base backup.  The
command for the test should be a psql command with two -c switches
without ON_ERROR_STOP, so as the second pg_backup_stop() starts after
BASE_BACKUP is cancelled using the same connection, for something like
that:
psql -c "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);" \
     -c "select pg_backup_stop()" "replication=database"

The last part of the test should do a pump_until() and capture "backup
is not in progress" from the stderr output of the command run.

This is leading me to the attached, that crashes quickly without the
fix and passes with the fix.

> It's true that we don't really have good test coverage of write-ahead
> logging and recovery, but this doesn't seem like the most important
> thing to be testing in that area, either, and developing stable tests
> for stuff like this can be a lot of work.

Well, stability does not seem like a problem to me here.

> I do kind of feel like the patch is fixing two separate bugs. The
> change to SendBaseBackup() is fixing the problem that, because there's
> SQL access on replication connections, we could try to start a backup
> in the middle of another backup by mixing and matching the two
> different methods of doing backups. The change to do_pg_abort_backup()
> is fixing the fact that, after aborting a base backup, we don't reset
> the session state properly so that another backup can be tried
> afterwards.
> 
> I don't know if it's worth committing them separately - they are very
> small fixes. But it would probably at least be good to highlight in
> the commit message that there are two different issues.

Grouping both fixes in the same commit sounds fine by me.  No
objections from here.
--
Michael
From 062bf8a06f68b7d543288260dd0cdf50bb5e9a72 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Thu, 14 Jul 2022 16:56:17 +0900
Subject: [PATCH] Add TAP test for BASE_BACKUP cancellation with
 pg_stop_backup()

---
 src/test/recovery/t/033_basebackup_cancel.pl | 60 ++++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 src/test/recovery/t/033_basebackup_cancel.pl

diff --git a/src/test/recovery/t/033_basebackup_cancel.pl b/src/test/recovery/t/033_basebackup_cancel.pl
new file mode 100644
index 0000000000..56cbaf83d0
--- /dev/null
+++ b/src/test/recovery/t/033_basebackup_cancel.pl
@@ -0,0 +1,60 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# BASE_BACKUP cancellation with replication database connection.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize primary node
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init(allows_streaming => 1);
+$node->append_conf('postgresql.conf', 'log_replication_commands = on');
+$node->start;
+
+note "testing BASE_BACKUP cancellation";
+
+my $sigchld_bb_timeout =
+  IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
+
+# This test requires a replication connection with a database, as it mixes
+# a replication command and a SQL command.  The first BASE_BACKUP is throttled
+# to give enough room for the cancellation running below.  The second command
+# for pg_backup_stop() should fail.
+my $connstr =
+  $node->connstr('postgres') . " replication=database dbname=postgres";
+my ($sigchld_bb_stdin, $sigchld_bb_stdout, $sigchld_bb_stderr) = ('', '', '');
+my $sigchld_bb = IPC::Run::start(
+	[
+		'psql', '-X', '-c', "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);",
+		'-c',   'SELECT pg_backup_stop()',
+		'-d',   $connstr
+	],
+	'<',
+	\$sigchld_bb_stdin,
+	'>',
+	\$sigchld_bb_stdout,
+	'2>',
+	\$sigchld_bb_stderr,
+	$sigchld_bb_timeout);
+
+# Waiting on the wait event BaseBackupThrottle ensures that the checkpoint
+# issued at backup start completes, making the cancellation happen in the
+# middle of the base backup sent.
+is( $node->poll_query_until(
+		'postgres',
+		"SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE "
+		  . "wait_event = 'BaseBackupThrottle' "
+		  . "AND backend_type = 'walsender' AND query ~ 'BASE_BACKUP'"),
+	"1",
+	"WAL sender sending base backup killed");
+
+# The psql command should fail on pg_stop_backup().
+ok( pump_until(
+		$sigchld_bb,         $sigchld_bb_timeout,
+		\$sigchld_bb_stderr, qr/backup is not in progress/),
+	'base backup cleanly cancelled');
+$sigchld_bb->finish();
+
+done_testing();
-- 
2.36.1

Attachment: signature.asc
Description: PGP signature

Reply via email to