On Feb 28, 2019, at 8:04 AM, Stephen Frost <sfr...@snowman.net> wrote:
> 
> Greetings,
> 
> * Julie Nishimura (juliez...@hotmail.com) wrote:
>> Hello everybody, I am new to the postgresql environment, but trying to get
>> up to speed.
>> Can you please share your experience on how to automate refreshing a dev
>> environment on a regular basis (ideally weekly), taking into consideration
>> that some of the prod DBs can be very large (like 20+ TB).
>> 
>> Any suggestions?
> 
> The approach that I like to recommend is to have your backup/restore
> solution be involved in this refreshing process, so that you're also
> testing that your backup/restore process works correctly.  For dealing
> with larger databases, using a backup tool which has parallel backup,
> parallel restore, and is able to restore just the files which are
> different from the backup can make the restore take much less time (this
> is what the 'delta-restore' option in pgbackrest does, and it was
> specifically written to support exactly this kind of prod->dev periodic
> refresh, though other tools may also support that these days).
> 
> As mentioned elsewhere on this thread, using snapshots can also be a
> good approach though you have to be sure that the snapshot is completely
> atomic across all filesystems that PostgreSQL is using, or you have to
> deal with running pg_start/stop_backup and putting a backup_label into
> place for the restored snapshot and a recovery.conf to provide a way for
> PG to get at any WAL which was generated while the snapshot (or
> snapshots) was being taken.

Very much yes to everything Stephen says. Regularly refreshing nonprod via your 
normal backup/restore process is an effective way to test your backups, and 
snapshots are a great way to do backups when your data volume is much larger than 
your churn between backups. (And at 20+ TB, I hope that's the case for you.) Rough 
sketches of both approaches follow.
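
To make the pgbackrest suggestion concrete, here is a minimal sketch of a weekly
refresh driver. The stanza name 'prod' and the data directory path are assumptions,
not anything from this thread; it presumes pgbackrest is already configured on the
dev host to reach the repository.

    # Hypothetical weekly dev-refresh script; stanza name and PGDATA are assumptions.
    import subprocess

    PGDATA = "/var/lib/pgsql/11/data"  # assumed dev data directory

    def refresh_dev():
        # The dev cluster must be stopped before restoring over its data directory.
        subprocess.run(["pg_ctl", "stop", "-D", PGDATA, "-m", "fast"], check=True)
        # --delta makes pgbackrest rewrite only the files that differ from the
        # backup, which is what keeps repeated refreshes of a 20+ TB cluster
        # tractable.
        subprocess.run(["pgbackrest", "--stanza=prod", "--delta", "restore"],
                       check=True)
        subprocess.run(["pg_ctl", "start", "-D", PGDATA], check=True)

    if __name__ == "__main__":
        refresh_dev()

Run it from cron (or whatever scheduler you use) on the dev host; the first run
pays full price, and later runs only copy changed files.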
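
And for the snapshot route, a minimal sketch of the coordination Stephen describes,
using the PG 11-era function names (pg_start_backup/pg_stop_backup; newer releases
renamed these to pg_backup_start/pg_backup_stop). The take_snapshot callable is a
placeholder for whatever your storage layer exposes, and psycopg2 plus the DSN are
my assumptions:

    import psycopg2

    def snapshot_backup(dsn, take_snapshot, backup_label_dest):
        # Non-exclusive backup: this session must stay connected from
        # pg_start_backup through pg_stop_backup.
        conn = psycopg2.connect(dsn)
        conn.autocommit = True
        cur = conn.cursor()
        # Arguments are (label, fast checkpoint, exclusive).
        cur.execute("SELECT pg_start_backup('weekly-refresh', false, false)")
        try:
            take_snapshot()  # placeholder: trigger the filesystem/volume snapshot
        finally:
            # pg_stop_backup(false) returns the backup_label contents; save them,
            # since they must go into the restored snapshot's data directory.
            cur.execute("SELECT labelfile FROM pg_stop_backup(false)")
            with open(backup_label_dest, "w") as f:
                f.write(cur.fetchone()[0])
        conn.close()

On the restored copy, drop that file into the data directory as backup_label and
add a recovery.conf with a restore_command (e.g. restore_command =
'cp /your/archive/%f %p') so PG can fetch the WAL generated while the snapshot
was being taken.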
