On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: > > Yep, but the deficiency is caused by the use before_shmem_exit() in > the SQL-level functions present in 9.6~ which make the cleanup happen > potentially twice. If you look at the code of basebackup.c, you would > notice that the cleanups of the counters only happen within the > PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be > enough.
Thank you for explaining. I understood and agreed.. On Fri, Sep 22, 2017 at 8:02 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier > <michael.paqu...@gmail.com> wrote: >> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.m...@gmail.com> >> wrote: >>> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier >>> <michael.paqu...@gmail.com> wrote: >>>> +- Assert(XLogCtl->Insert.nonExclusiveBackups >= 0); >>>> + if (XLogCtl->Insert.nonExclusiveBackups > 0) >>>> + XLogCtl->Insert.nonExclusiveBackups--; >>>> Hm, no, I don't agree. I think that instead you should just leave >>>> do_pg_abort_backup() immediately if sessionBackupState is set to >>>> SESSION_BACKUP_NONE. This variable is the link between the global >>>> counters and the session stopping the backup so I don't think that we >>>> should touch this assertion of this counter. I think that this method >>>> would be safe as well for backup start as pg_start_backup_callback >>>> takes care of any cleanup. Also because the counters are incremented >>>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and >>>> sessionBackupState is updated just after leaving the block. >>> >>> I think that the assertion failure still can happen if the process >>> aborts after decremented the counter and before setting to >>> SESSION_BACKUP_NONE. Am I missing something? >> >> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger >> the cleanup at this moment. So this happens when waiting for the >> archives to be done, and the session flag is set to NONE at this >> point. > > And actually, with two non-exclusive backups taken in parallel, it is > still possible to fail on another assertion in do_pg_stop_backup() > even with your patch. Imagine the following: > 1) Two backups are taken, counter for non-exclusive backups is set at 2. > 2) One backup is stopped, then interrupted. This causes the counter to > be decremented twice, once in do_pg_stop_backup, and once when > aborting. Counter is at 0, switching as well forcePageWrites to > false.. > 3) The second backup stops, a new assertion failure is triggered. > Without assertions the counter would get at -1. You're right. I updated the patch so that it exits do_pg_abort_backup if the state is NONE and setting the state to NONE in do_pg_stop_backup before releasing the WAL insert lock. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
fix_do_pg_abort_backup_v2.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers