On Mon, Nov 20, 2017 at 4:12 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Fri, Nov 17, 2017 at 11:00 AM, Michael Paquier > <michael.paqu...@gmail.com> wrote: >> On Thu, Nov 16, 2017 at 8:17 PM, Masahiko Sawada <sawada.m...@gmail.com> >> wrote: >>> On Thu, Nov 16, 2017 at 1:11 PM, Michael Paquier >>> <michael.paqu...@gmail.com> wrote: >>>> + /* >>>> + * Quick exit if session is not keeping around a non-exclusive backup >>>> + * already started. >>>> + */ >>>> + if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE) >>>> + return; >>>> I think that it would be more solid to use SESSION_BACKUP_NONE for the >>>> comparison, and complete the assertion after the quick exit as follows >>>> as this code path should never be taken for an exclusive backup: >>> >>> Agreed. >> >> I have spent some time doing an extra lookup with tests involving one >> and two sessions doing backups checking for failure code paths while >> waiting for archives: >> - One session with non-exclusive backup. >> - One session with exclusive backup. >> - One session is exclusive and the second is non-exclusive >> - Both are exclusive. >> Also double-checked on the way the logic around the cleanup callback >> and sessionBackupState during startup, and those look clean to me. One >> thing that was bothering me is that the callback >> nonexclusive_base_backup_cleanup is called after do_pg_start_backup() >> finishes. But between the moment sessionBackupState is set and the >> callback is registered there is no CHECK_FOR_INTERRUPTS or even elog() >> calls so even if the process starting a non-exclusive backup is tried >> to be terminated between the moment the session lock is set and the >> callback is registered things are handled correctly. >> >> So I think that we are basically safe for backups running with the SQL >> interface. > > Thank you for double checking! > >> However, things are not completely clean for base backups taken using >> the replication protocol. While monitoring more the code, I have >> noticed that perform_base_backup() calls do_pg_stop_backup() *without* >> taking any cleanup action. So if a base backup is interrupted, say >> with SIGTERM, while do_pg_stop_backup() is called and before the >> session lock is updated then it is possible to finish with >> inconsistent in-memory counters. Oops. > > Good catch! I'd missed this case. > >> No need to play with breakpoints and signals in this case, using >> something like that is enough to create inconsistent counters. >> --- a/src/backend/replication/basebackup.c >> +++ b/src/backend/replication/basebackup.c >> @@ -327,6 +327,8 @@ perform_base_backup(basebackup_options *opt, DIR >> *tblspcdir) >> } >> PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0); >> >> + elog(ERROR, "base backups don't decrement counters here, stupid!"); >> + >> endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli); >> >> A simple fix for this one is to call do_pg_stop_backup() before >> PG_END_ENSURE_ERROR_CLEANUP(). By doing so, we also make the callback >> cleanup logic consistent with what is done for the SQL equivalent >> where the callback is removed after finishing going through >> do_pg_stop_backup(). A comment would be adapted here, say something >> like "finish the backup while still holding the cleanup callback to >> avoid inconsistent in-memory data should the this call fail before >> sessionBackupState is updated." >> >> For the start phase, the current logic is fine, because in the case of >> the SQL interface the cleanup callback is registered after finishing >> do_pg_start_backup(). >> >> What do you think? > > I agree with your approach. It makes sense to me. > > Attached updated patch. Please review it.
Thanks for updating the patch! The patch basically looks good to me. + /* + * Clean up session-level lock. To avoid calling CHECK_FOR_INTERRUPTS by + * LWLockReleaseClearVar before changing the backup state we change it + * while holding the WAL insert lock. + */ I think that you should comment *why* we need to avoid calling CHECK_FOR_INTERRUPTS before changing the backup state, here. Regards, -- Fujii Masao