Re: Netapp SnapCenter

Stephen Frost Thu, 18 Jun 2020 12:27:24 -0700

Greetings,

* Paul Förster (paul.foers...@gmail.com) wrote:
> > On 18. Jun, 2020, at 16:19, Magnus Hagander <mag...@hagander.net> wrote:
> > I don't know specifically about SnapCenter, but for snapshots in general, 
> > it does require backup mode *unless* all your data is on the same disk and 
> > you have an atomic snapshot across that disk (in theory it can be on 
> > different disk as well, as long as the snapshots in that case are atomic 
> > across *all* those disks, not just individually, but that is unusual).
> 
> according to what I know from our storage guys, Netapp does atomic snapshots 
> for each volume. We have the database and its corresponding WAL files (pg_wal 
> directory) on the same volume and the archived WALs (archive_command) on 
> another. And the snapshots on those two volumes are not taken at the same 
> time. Currently, the database is set to backup mode (using cron) and the 
> storage guys have a window during which they can take the snapshots.


If the entire database, including all tablespaces and pg_wal, is on the
same volume and the snapshot of that volume is atomic, then you don't
actually need to go through the start/stop backup. A restored snapshot
will look just like a system crash: PG will go back to the last
checkpoint, replay the WAL that's in pg_wal, reach consistency, and
come up.
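
As a rough sanity check of the "everything on one volume" condition, a small
script can compare the device IDs of the data directory, pg_wal, and each
tablespace location. This is only a sketch: the function names are made up for
illustration, and a matching st_dev only says the paths share a mounted
filesystem as the OS sees it. It does not prove that they are one NetApp
volume or that the snapshot is atomic, so confirm that with the storage team.

```python
import os


def cluster_paths(pgdata):
    """Collect the data directory, pg_wal, and every tablespace location."""
    paths = [pgdata, os.path.join(pgdata, "pg_wal")]
    tblspc = os.path.join(pgdata, "pg_tblspc")
    if os.path.isdir(tblspc):
        # Entries in pg_tblspc are symlinks to the real tablespace directories.
        for name in sorted(os.listdir(tblspc)):
            paths.append(os.path.join(tblspc, name))
    return paths


def on_one_filesystem(pgdata):
    """Return True if all collected paths report the same device ID."""
    # os.stat follows symlinks, so a relocated pg_wal or tablespace
    # reports the device it actually lives on.
    devices = {os.stat(path).st_dev for path in cluster_paths(pgdata)}
    return len(devices) == 1
```

If on_one_filesystem() returns False, something lives on a different
filesystem and a single-volume snapshot without backup mode is not safe.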

> > So the upthread suggestion of putting data and wal on different disk and 
> > snapshotting them at different times is *NOT* safe. Unless the reference to 
> > the directory for the logs means a directory where log files are copied out 
> > with archive_command, and it's actually the log archive (in which case it 
> > will work, but the recommendation is that the log archive should not be on 
> > the same machine).
> 
> as I said above, pg_wal is a directory in PGDATA at the default location and 
> WALs are archived using the archive_command to a different volume. So I 
> guess, we should be safe then.

Yes, that should be alright.

> > The normal case is that snapshots are guaranteed to be atomic, and thus 
> > this millisecond window cannot appear. But that usually only applies across 
> > individual volumes. It's also worth noticing that taking the backups 
> > without using the backup mode and archive_command means you cannot use them 
> > for PITR, only for restore onto that specific snapshot.
> > 
> > While our documentation on backups in general definitely needs improvement, 
> > this particular requirement is documented at 
> > https://www.postgresql.org/docs/current/backup-file.html.
> 
> since we use backup mode, we should be good with PITR too. I didn't test that 
> myself, but a workmate did and he said it worked nicely.
> 
> So bottom line, SnapCenter means for PostgreSQL just a plain volume snapshot 
> like we currently do with SnapCreator, using backup mode scripts and 
> "archive_command"-ing WALs to a different volume, i.e. no change in strategy. 
> It's just that SnapCenter can't control backup mode as it can with Oracle.

The one issue here is that if you're using the deprecated exclusive
backup API, then PG will create a backup_label file in the data
directory.  If the system reboots while that file exists, there's a good
chance that PG won't start up cleanly since, due to the file existing,
it thinks that it's restoring from a backup when it isn't.

Of course, if you're actually restoring from a backup, then that file is
absolutely critical to have in place, otherwise PG won't realize it's
being restored from a backup and you'll end up with a corrupted database
on restore.

Better is to use the newer non-exclusive API and arrange to collect the
necessary contents of the backup_label file from PG and store that with
the snapshot that you've taken.
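
A minimal sketch of what that could look like, assuming a DB-API connection
such as one from psycopg2 and the pre-15 function names
pg_start_backup()/pg_stop_backup() (renamed pg_backup_start()/pg_backup_stop()
in PostgreSQL 15). The function name, the take_snapshot callable, and
label_dir are hypothetical placeholders for however the storage side is
triggered and wherever the label is kept alongside the snapshot. Error
handling is omitted; the same session has to stay connected from start to
stop, otherwise the backup is aborted.

```python
import os


def snapshot_with_backup_label(conn, take_snapshot, label_dir, label="snapshot"):
    """Run a non-exclusive backup around take_snapshot() and save the label."""
    cur = conn.cursor()
    # Third argument false selects the non-exclusive API, so PG does not
    # write backup_label into the live data directory.
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    cur.fetchone()

    take_snapshot()  # hypothetical hook that triggers the volume snapshot

    # pg_stop_backup(false) returns the backup_label and tablespace_map
    # contents instead of writing them to disk.
    cur.execute("SELECT labelfile, spcmapfile FROM pg_stop_backup(false)")
    labelfile, spcmapfile = cur.fetchone()

    label_path = os.path.join(label_dir, "backup_label")
    with open(label_path, "w") as f:
        f.write(labelfile)
    if spcmapfile:
        with open(os.path.join(label_dir, "tablespace_map"), "w") as f:
            f.write(spcmapfile)
    return label_path
```

On restore, the saved backup_label (and tablespace_map, if one was written)
has to be placed into the restored data directory before PG is started.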

Thanks,

Stephen
