On Feb 28, 2019, at 8:04 AM, Stephen Frost <sfr...@snowman.net> wrote:
>
> Greetings,
>
> * Julie Nishimura (juliez...@hotmail.com) wrote:
>> Hello everybody, I am new to postgresql environment, but trying to get up to
>> speed.
>> Can you please share your experience on how you can automate refreshment of
>> dev environment on regular basis (desirably weekly), taking for
>> consideration some of prod dbs can be very large (like 20+ TB)
>>
>> Any suggestions?
>
> The approach that I like to recommend is to have your backup/restore
> solution be involved in this refreshing process, so that you're also
> testing that your backup/restore process works correctly. For dealing
> with larger databases, using a backup tool which has parallel backup,
> parallel restore, and is able to restore just the files which are
> different from the backup can make the restore take much less time (this
> is what the 'delta-restore' option in pgbackrest does, and it was
> specifically written to support exactly this kind of prod->dev periodic
> refresh, though other tools may also support that these days).
>
> As mentioned elsewhere on this thread, using snapshots can also be a
> good approach though you have to be sure that the snapshot is completely
> atomic across all filesystems that PostgreSQL is using, or you have to
> deal with running pg_start/stop_backup and putting a backup_label into
> place for the restored snapshot and a recovery.conf to provide a way for
> PG to get at any WAL which was generated while the snapshot (or
> snapshots) was being taken.
Very much yes to everything Stephen says. Regularly refreshing nonprod via your normal backup/restore process is an efficient way to test your backups, and snapshots are a great way to do backups when your data volume is greater than your churn between backups. (And at 20+ TB, I hope that's the case for you.)
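
To make the pgbackrest piece concrete, the weekly refresh job can be as small as the sketch below. The stanza name, process count, and data directory are assumptions for illustration, not anything Stephen specified:

    # Stop the dev cluster before overwriting its files.
    pg_ctl -D /var/lib/pgsql/11/data stop -m fast

    # Restore the latest prod backup into the existing dev data
    # directory.  --delta checksums what is already on disk and copies
    # only the files that differ; --process-max parallelizes the copy.
    pgbackrest --stanza=prod --delta --process-max=8 restore

    # Start dev; it replays WAL to consistency and comes up.
    pg_ctl -D /var/lib/pgsql/11/data start

Run that from cron once a week and you've also tested your restores fifty-two times a year.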
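
If you go the snapshot route instead, the pg_start/stop_backup dance Stephen mentions looks roughly like this on the recovery.conf-era versions current today (the label and archive path here are hypothetical):

    -- On prod, from a psql session that stays connected throughout
    -- (non-exclusive backup mode ends if the session drops):
    SELECT pg_start_backup('dev-refresh', false, false);

    -- ...take the snapshot(s) of every filesystem PG uses...

    -- End backup mode and capture the label contents:
    SELECT labelfile FROM pg_stop_backup(false);

On the dev side, after presenting the restored snapshot, write that labelfile text to $PGDATA/backup_label and drop in a recovery.conf whose restore_command can reach your WAL archive, e.g. restore_command = 'cp /path/to/archive/%f %p'. Without those two pieces, PG has no way to know it must replay the WAL generated while the snapshot was in flight.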