Hi,

On 2015-01-22 19:56:07 +0100, Andres Freund wrote:
> Hi,
>
> On 2015-01-20 16:28:19 +0100, Andres Freund wrote:
> > I'm analyzing a problem in which a customer had a pg_basebackup (from
> > standby) created 9.2 cluster that failed with "WAL contains references to
> > invalid pages". The failed record was a "xlog redo visible"
> > i.e. XLOG_HEAP2_VISIBLE.
> >
> > First I thought there might be another bug along the line of
> > 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the
> > case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't
> > seem to have any problems.
> >
> > Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when
> > the basebackup was started and finished *before* pg_basebackup finished.
> >
> > movedb() basically works in these steps:
> > 1) lock out users of the database
> > 2) RequestCheckpoint(IMMEDIATE|WAIT)
> > 3) DropDatabaseBuffers()
> > 4) copydir()
> > 5) XLogInsert(XLOG_DBASE_CREATE)
> > 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
> > 7) rmtree(src_dbpath)
> > 8) XLogInsert(XLOG_DBASE_DROP)
> > 9) unlock database
> >
> > If a basebackup starts while 4) is in progress and continues until 7)
> > happens I think a pretty wide race opens: The basebackup can end up with
> > a partial copy of the database in the old tablespace because the
> > rmtree(old_path) concurrently was in progress. Normally such races are
> > fixed during replay. But in this case, the replay of the
> > XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new);
> > fixing nothing.
> >
> > Besides making AD .. ST use sane WAL logging, which doesn't seem
> > backpatchable, I don't see what could be done against this except
> > somehow making basebackups fail if an AD .. ST is in progress. Which
> > doesn't look entirely trivial either.
>
> I basically have two ideas to fix this.
>
> 1) Make do_pg_start_backup() acquire a SHARE lock on
> pg_database. That'll prevent it from starting while a movedb() is
> still in progress. Then additionally add pg_backup_in_progress()
> function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
> XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
> movedb() to error out if a backup is in progress.
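
To make the check from 1) concrete, here is a minimal standalone sketch.
It is not the attached patch: MockXLogCtlInsert, mock_insert,
movedb_allowed() and the mock_*_base_backup() helpers are made up for
illustration, and the locking that protects these fields in xlog.c is
left out entirely. Only the condition itself is taken from the
description above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mock of the two backup-state fields named above. In the server they
 * live in XLogCtl->Insert; this struct and the variable below are
 * stand-ins for illustration only, and all locking is omitted.
 */
typedef struct MockXLogCtlInsert
{
    bool exclusiveBackup;      /* an exclusive backup is running */
    int  nonExclusiveBackups;  /* number of non-exclusive backups running */
} MockXLogCtlInsert;

static MockXLogCtlInsert mock_insert = {false, 0};

/* The condition from idea 1): is any kind of base backup running? */
static bool
pg_backup_in_progress(void)
{
    return mock_insert.exclusiveBackup ||
           mock_insert.nonExclusiveBackups != 0;
}

/*
 * Hypothetical guard for movedb()/createdb(). Returns false where the
 * real code would raise an error instead of proceeding.
 */
static bool
movedb_allowed(void)
{
    return !pg_backup_in_progress();
}

/* Hypothetical helpers that mimic a non-exclusive backup starting/stopping. */
static void
mock_start_base_backup(void)
{
    mock_insert.nonExclusiveBackups++;
}

static void
mock_stop_base_backup(void)
{
    assert(mock_insert.nonExclusiveBackups > 0);
    mock_insert.nonExclusiveBackups--;
}
```

With this, movedb_allowed() is false between mock_start_base_backup()
and mock_stop_base_backup(), and true again afterwards. Note that the
check only sees backups running on the same node, which is exactly the
limitation discussed below.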

Attached is a patch trying to do this. It doesn't look too bad, and it
led me to discover missing recovery conflicts during an AD ST.

But: It doesn't actually work on standbys, because lock.c prevents any
lock stronger than RowExclusive from being acquired there. And we need
a lock that can conflict with WAL replay of DBASE_CREATE, to handle base
backups that are executed on the primary. Those obviously can't detect
whether any standby is currently doing a base backup...

I currently don't have a good idea how to mangle lock.c to allow
this. I've played with doing it like in the second patch, but that
doesn't actually work because of some asserts around ProcSleep - leading
to locks on database objects not working in the startup process (despite
already being used).

The easiest thing would be to just use a lwlock instead of a heavyweight
lock - but those aren't cancelable...

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services