Hello, While the exact cause of the situation is not known yet but we have apparently forgot that pg_stop_backup can be called simultaneously with pg_start_backup. It seems anyway to me that pg_stop_backup should be called before the end of pg_start_backup from their definition and what they do. And it should fix the assertion failure.
On solution is exclusiveBackupStarted (I'd like to rename the variable) on shared memory as the patch does. Another place where we can have something with the same effect is file system. We can create 'backup_label.tmp" at very early in pg_start_backup and rename it to backup_label at the end of the function. Anyway exclusive pg_stop_backup needs that. A defect of that would be the remaining backup_label.tmp file after crash during pg_start_backup. Renaming tmp to (none) is atomic enough for this usage but I'm not sure if it's in a network file system. exclusiveBackup is also used this kind of exclusion, this also is replasable with the .tmp file. But after some thoughts, it found to need to add a bunch of error handling code for the file operations and it should become too complex. So I abandoned it. At Fri, 5 Aug 2016 12:16:13 +0900, Michael Paquier <michael.paqu...@gmail.com> wrote in <cab7npqqemq8k3vwh+t7gq0o3i-3kirynpue9wrgqjqvufab...@mail.gmail.com> > On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier > <michael.paqu...@gmail.com> wrote: > > Andreas, with the patch attached is the assertion still triggered? > > > > Thoughts from others are welcome as well. > > Self review: > * of streaming base backups currently in progress. forcePageWrites is > set > * to true when either of these is non-zero. lastBackupStart is the > latest > * checkpoint redo location used as a starting point for an online > backup. > + * exclusiveBackupStarted is true while pg_start_backup() is being called > + * during an exclusive backup. > */ > bool exclusiveBackup; > int nonExclusiveBackups; > XLogRecPtr lastBackupStart; > + bool exclusiveBackupStarted; > It would be better to just use one variable that uses an enum to track > the following states of an exclusive backup: none, starting and > in-progress. What I thought when I looked the first patch is that. Addition to that I don't see the name exclusiveBackupStated is appropriate. One more argument is the necessity of the WALInsertLock at the end of pg_start_backup. No other backend cannot reach there concurrently and possible latency from cache coherency won't be a matter for the purpose, I suppose. What do you think it is needed for? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers