Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

Andres Freund Mon, 26 Jan 2015 13:04:41 -0800

Hi,

On 2015-01-22 19:56:07 +0100, Andres Freund wrote:
> Hi,
> 
> On 2015-01-20 16:28:19 +0100, Andres Freund wrote:
> > I'm analyzing a problem in which a customer had a pg_basebackup (from
> > standby) created 9.2 cluster that failed with "WAL contains references to
> > invalid pages". The failed record was a "xlog redo visible"
> > i.e. XLOG_HEAP2_VISIBLE.
> >
> > First I thought there might be another bug along the line of
> > 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the
> > case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't
> > seem to have any problems.
> >
> > Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when
> > the basebackup was started and finished *before* pg_basebackup finished.
> >
> > movedb() basically works in these steps:
> > 1) lock out users of the database
> > 2) RequestCheckpoint(IMMEDIATE|WAIT)
> > 3) DropDatabaseBuffers()
> > 4) copydir()
> > 5) XLogInsert(XLOG_DBASE_CREATE)
> > 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
> > 7) rmtree(src_dbpath)
> > 8) XLogInsert(XLOG_DBASE_DROP)
> > 9) unlock database
> >
> > If a basebackup starts while 4) is in progress and continues until 7)
> > happens I think a pretty wide race opens: The basebackup can end up with
> > a partial copy of the database in the old tablespace because the
> > rmtree(old_path) concurrently was in progress.  Normally such races are
> > fixed during replay. But in this case, the replay of the
> > XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new);,
> > fixing nothing.
> >
> > Besides making AD .. ST use sane WAL logging, which doesn't seem
> > backpatchable, I don't see what could be done against this except
> > somehow making basebackups fail if an AD .. ST is in progress. Which
> > doesn't look entirely trivial either.
> 
> I basically have two ideas to fix this.
> 
> 1) Make do_pg_start_backup() acquire a SHARE lock on
>    pg_database. That'll prevent it from starting while a movedb() is
>    still in progress. Then additionally add pg_backup_in_progress()
>    function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
>    XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
>    movedb() to error out if a backup is in progress.


Attached is a patch trying to do this. It doesn't look too bad, and it led
me to discover missing recovery conflicts during an AD ST.
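
For readers without the patch at hand, here is a rough standalone sketch of
the check described in idea 1. It is not the attached patch: the struct and
both function names are hypothetical stand-ins, and the real function would
read the XLogCtl->Insert fields in shared memory, presumably while holding
whatever lock protects them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two backup fields in XLogCtl->Insert. */
typedef struct BackupStateSketch
{
	bool		exclusiveBackup;		/* an exclusive backup is running */
	int			nonExclusiveBackups;	/* count of non-exclusive backups */
} BackupStateSketch;

/*
 * Same test as the proposed pg_backup_in_progress(): true if any kind of
 * base backup is currently running.
 */
static bool
backup_in_progress_sketch(const BackupStateSketch *state)
{
	/* the counter can never go negative */
	assert(state->nonExclusiveBackups >= 0);

	return state->exclusiveBackup || state->nonExclusiveBackups != 0;
}

/*
 * What createdb() and movedb() would do with it: refuse to run while a
 * backup is in progress.  Returns true if the operation may go ahead;
 * the real code would raise an error instead of returning false.
 */
static bool
movedb_may_proceed_sketch(const BackupStateSketch *state)
{
	return !backup_in_progress_sketch(state);
}
```

With both fields zeroed the operation may proceed; a single running
non-exclusive backup is enough to block it.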

But: It doesn't actually work on standbys, because lock.c prevents any
lock stronger than RowExclusive from being acquired. And we need a
lock that can conflict with WAL replay of DBASE_CREATE, to handle base
backups that are executed on the primary. Those obviously can't detect
whether any standby is currently doing a base backup...
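
To illustrate the lock.c restriction, a simplified standalone model follows.
It is a sketch from memory, not the actual lock.c code: the function name is
made up, and as far as I recall the real check applies only to relation and
object lock tags, which this model ignores. The lock mode numbers are the
usual ones from the lock headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Heavyweight lock modes, numbered from weakest to strongest. */
enum
{
	AccessShareLock = 1,
	RowShareLock,
	RowExclusiveLock,
	ShareUpdateExclusiveLock,
	ShareLock,
	ShareRowExclusiveLock,
	ExclusiveLock,
	AccessExclusiveLock
};

/*
 * Hypothetical model of the restriction: while recovery is in progress,
 * a regular backend may not take anything stronger than RowExclusiveLock.
 * The startup process, which replays WAL, is exempt.
 */
static bool
lock_allowed_sketch(bool recovery_in_progress, bool is_startup_process,
					int lockmode)
{
	/* reject values that are not a known lock mode */
	assert(lockmode >= AccessShareLock && lockmode <= AccessExclusiveLock);

	if (recovery_in_progress && !is_startup_process &&
		lockmode > RowExclusiveLock)
		return false;

	return true;
}
```

In this model, lock_allowed_sketch(true, false, ShareLock) is false, which is
why a SHARE lock on pg_database cannot be taken by a backup running on a
standby.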

I currently don't have a good idea how to mangle lock.c to allow
this. I've played with doing it like in the second patch, but that
doesn't actually work because of some asserts around ProcSleep - leading
to locks on database objects not working in the startup process (despite
already being used).

The easiest thing would be to just use an lwlock instead of a heavyweight
lock - but those aren't cancelable...

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

