On 2022/07/14 17:00, Michael Paquier wrote:
On Fri, Jul 08, 2022 at 08:56:14AM -0400, Robert Haas wrote:
On Thu, Jul 7, 2022 at 10:58 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
But if many think that it's worth adding the test, I will give a
try. But even in that case, I think it's better to commit the
proposed patch at first to fix the bug, and then to write the patch
adding the test.
I have looked at that in details,
Thanks!
and it is possible to rely on
pg_stat_activity.wait_event to be BaseBackupThrottle, which would make
ISTM that you can also use pg_stat_progress_basebackup.phase.
sure that the checkpoint triggered at the beginning of the backup
finishes and that we are in the middle of the base backup. The
command for the test should be a psql command with two -c switches
without ON_ERROR_STOP, so as the second pg_backup_stop() starts after
BASE_BACKUP is cancelled using the same connection, for something like
that:
psql -c "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);" \
-c "select pg_backup_stop()" "replication=database"
The last part of the test should do a pump_until() and capture "backup
is not in progress" from the stderr output of the command run.
This is leading me to the attached, that crashes quickly without the
fix and passes with the fix.
Thanks for the patch! But I'm still not sure if it's worth adding only this
test for the corner case while we don't have basic tests for BASE_BACKUP,
pg_backup_start and pg_backup_stop.
BTW, if we decide to add that test, are you planning to back-patch it?
It's true that we don't really have good test coverage of write-ahead
logging and recovery, but this doesn't seem like the most important
thing to be testing in that area, either, and developing stable tests
for stuff like this can be a lot of work.
Well, stability does not seem like a problem to me here.
I do kind of feel like the patch is fixing two separate bugs. The
change to SendBaseBackup() is fixing the problem that, because there's
SQL access on replication connections, we could try to start a backup
in the middle of another backup by mixing and matching the two
different methods of doing backups. The change to do_pg_abort_backup()
is fixing the fact that, after aborting a base backup, we don't reset
the session state properly so that another backup can be tried
afterwards.
I don't know if it's worth committing them separately - they are very
small fixes. But it would probably at least be good to highlight in
the commit message that there are two different issues.
Grouping both fixes in the same commit sounds fine by me. No
objections from here.
This sounds fine to me, too. On the other hand, it's also fine for me to push
the changes separately so that we can easily identify each change later. So I
separated the patch into two ones.
Since one of them failed to be applied to v14 or before cleanly, I also created
the patch for those back branches. So I attached three patches.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From 1d110bf0ff3bfb508374930eb8947a2a0d5ffe5e Mon Sep 17 00:00:00 2001
From: Fujii Masao <fu...@postgresql.org>
Date: Tue, 12 Jul 2022 09:31:57 +0900
Subject: [PATCH] Prevent BASE_BACKUP in the middle of another backup in the
same session.
Multiple non-exclusive backups are able to be run conrrently in different
sessions. But, in the same session, only one non-exclusive backup can be
run at the same moment. If pg_backup_start (pg_start_backup in v14 or before)
is called in the middle of another non-exclusive backup in the same session,
an error is thrown.
However, previously, in logical replication walsender mode, even if that
walsender session had already called pg_backup_start and started
a non-exclusive backup, it could execute BASE_BACKUP command and
start another non-exclusive backup. Which caused subsequent pg_backup_stop
to throw an error because BASE_BACKUP unexpectedly reset the session state
marked by pg_backup_start.
This commit prevents BASE_BACKUP command in the middle of another
non-exclusive backup in the same session.
Back-patch to all supported branches.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion:
https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f...@oss.nttdata.com
---
src/backend/replication/basebackup.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/backend/replication/basebackup.c
b/src/backend/replication/basebackup.c
index 95440013c0..637c0ce459 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -949,6 +949,12 @@ SendBaseBackup(BaseBackupCmd *cmd)
{
basebackup_options opt;
bbsink *sink;
+ SessionBackupState status = get_backup_status();
+
+ if (status == SESSION_BACKUP_RUNNING)
+ ereport(ERROR,
+
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("a backup is already in progress in
this session")));
parse_basebackup_options(cmd->options, &opt);
--
2.36.0
From 116b4ebe813edee660b63309004ff8ffe14c533c Mon Sep 17 00:00:00 2001
From: Fujii Masao <fu...@postgresql.org>
Date: Tue, 12 Jul 2022 09:31:57 +0900
Subject: [PATCH] Prevent BASE_BACKUP in the middle of another backup in the
same session.
Multiple non-exclusive backups are able to be run conrrently in different
sessions. But, in the same session, only one non-exclusive backup can be
run at the same moment. If pg_backup_start (pg_start_backup in v14 or before)
is called in the middle of another non-exclusive backup in the same session,
an error is thrown.
However, previously, in logical replication walsender mode, even if that
walsender session had already called pg_backup_start and started
a non-exclusive backup, it could execute BASE_BACKUP command and
start another non-exclusive backup. Which caused subsequent pg_backup_stop
to throw an error because BASE_BACKUP unexpectedly reset the session state
marked by pg_backup_start.
This commit prevents BASE_BACKUP command in the middle of another
non-exclusive backup in the same session.
Back-patch to all supported branches.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion:
https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f...@oss.nttdata.com
---
src/backend/replication/basebackup.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/backend/replication/basebackup.c
b/src/backend/replication/basebackup.c
index e09108d0ec..d142cc2131 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -932,6 +932,12 @@ void
SendBaseBackup(BaseBackupCmd *cmd)
{
basebackup_options opt;
+ SessionBackupState status = get_backup_status();
+
+ if (status == SESSION_BACKUP_NON_EXCLUSIVE)
+ ereport(ERROR,
+
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("a backup is already in progress in
this session")));
parse_basebackup_options(cmd->options, &opt);
--
2.36.0
From a79891e7998f7e0e33a209472647a7a141d63134 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fu...@postgresql.org>
Date: Tue, 12 Jul 2022 11:53:29 +0900
Subject: [PATCH] Fix assertion failure and segmentation fault in backup code.
When a non-exclusive backup is canceled, do_pg_abort_backup() is called
and resets some variables set by pg_backup_start (pg_start_backup in v14
or before). But previously it forgot to reset the session state indicating
whether a non-exclusive backup is in progress or not in this session.
This issue could cause an assertion failure when the session running
BASE_BACKUP is terminated after it executed pg_backup_start and
pg_backup_stop (pg_stop_backup in v14 or before). Also it could cause
a segmentation fault when pg_backup_stop is called after BASE_BACKUP
in the same session is canceled.
This commit fixes the issue by making do_pg_abort_backup reset
that session state.
Back-patch to all supported branches.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion:
https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f...@oss.nttdata.com
---
src/backend/access/transam/xlog.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index b809a2152c..9854b51c6a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8791,6 +8791,8 @@ do_pg_abort_backup(int code, Datum arg)
{
XLogCtl->Insert.forcePageWrites = false;
}
+
+ sessionBackupState = SESSION_BACKUP_NONE;
WALInsertLockRelease();
if (emit_warning)
--
2.36.0