I also asked this question on dba.stackexchange.com, where Craig Ringer gave a 
very detailed enumeration of the associated problems:
http://dba.stackexchange.com/questions/58896/restore-postgres-data-tablespace-to-new-tablespace-at-new-mount-point/58967?noredirect=1#58967

Perhaps there's a postgres internals expert around, someone intimately 
familiar with pg_xlog/pg_clog/pg_control, who can comment on whether it's 
possible to take the on-disk files from a single database in a single 
tablespace and make them usable by a different postgres server running 
multiple databases?

Thanks,
Jason

On 02/13/2014 04:41 PM, Antman, Jason (CMG-Atlanta) wrote:

I have a bunch of test/development databases which we currently refresh with 
production data as-needed using a NetApp filer's snapshot capabilities - we 
have a production slave with its datadir on a filer mount (NFS), and once a 
night (via cron) we shut down the slave, snapshot the filer volume, and then 
start the database back up. When we need to refresh data in a test/dev 
environment, we stop postgres there, clone the NetApp snapshot, and use that 
clone as the test/dev environment data directory. This is wonderful from a 
storage point of view, as the clones only store changed blocks on the filer. We 
have a ~1.5TB datadir, but these clones are only a few MB each, because there's 
little changed data. (We're running postgres 9.0.13, but if what I'm about to 
ask is more feasible with a newer version, upgrading is a vague possibility.)
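
For concreteness, the nightly cron job boils down to something like the 
following rough sketch (Python; the paths, volume name, and the "ssh filer" 
snapshot command are placeholders for our actual environment and tooling, not 
the real configuration):

#!/usr/bin/env python
# Rough sketch of the nightly snapshot job. Paths, the volume name, and the
# "ssh filer ..." call are placeholders, not our real configuration.
import subprocess

PGDATA = "/mnt/filer/prod_slave/pgdata"   # NFS-mounted slave datadir (placeholder)
FILER_VOL = "prod_pg_data"                # NetApp volume backing it (placeholder)

def run(cmd):
    # Echo each command and fail loudly if it exits non-zero.
    print(" ".join(cmd))
    subprocess.check_call(cmd)

# 1. Cleanly shut down the production slave so the on-disk files are consistent.
run(["pg_ctl", "stop", "-D", PGDATA, "-m", "fast"])

# 2. Snapshot the filer volume (stand-in for our actual NetApp call).
run(["ssh", "filer", "snap", "create", FILER_VOL, "nightly"])

# 3. Start the slave back up and wait for it to come online.
run(["pg_ctl", "start", "-D", PGDATA, "-w"])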

Up until now, each test/dev environment has had its own postgres server. That's 
pretty bad on resources, since the servers sit idle most of the time.

Now, we have to spin up somewhere around 100-150 of these environments. The 
NetApp is about the only way we can do it, because we simply don't have 
150-225TB of disk to spare. It would also be a real pain (and inefficient) to 
run 100-150 separate machines, each running a single instance of postgres.

What I'd like to do is take a disk/filer snapshot of a stopped database (i.e. 
the actual files on disk, not a pg_dump) on one postgres instance (one physical 
server) and restore it on a different one. Ideally the database would have a 
different name, but that's flexible. Even more ideally, this would all happen 
without restarting the destination postgres instance, but I suppose we can 
work around that too.

Is this even possible? Anyone have experience with, essentially, creating a 
tablespace that points to an existing data directory? Or, alternatively, 
swapping out the data directory of one tablespace with that of another?

Any advice or suggestions would be greatly appreciated.
I apologize if there's something obvious in the documentation that I missed, 
but I'm not much of a database guru, and am relatively new to pgsql in general.
Jason Antman

PS - The slightly-less-abstract explanation:

We're spinning up nearly 100 (maybe 150) new test environments. We simply don't 
have the ~200TB of disk to maintain a separate DB for each of them, so using the 
NetApp filer and letting it do thin clones is an absolute requirement. We also 
need to be able to do quick restores to the latest daily production snapshot. 
Quick as in, the developers and testers run a script in their test environment 
that does the restore. We're currently doing this for ~50 environments, and we 
run a separate VM with postgres for each one, so it's relatively 
straightforward - stop postgres, unmount the datadir, do the filer magic to 
refresh it with a clone of this morning's production DB, then re-mount the new 
filer volume (clone) and start postgres. Unfortunately, the massive overhead of 
running a separate virtual machine with a separate postgres instance for every 
DB is more than we can handle when scaling 2-3x. In addition to that, the 
resource waste is awful (a dedicated VM running a dedicated postgres instance 
for each DB, most of which are idle about 14 hours/day). So, we'd like to run 
multiple databases in separate tablespaces, but on the same host and the same 
postgres instance, so N databases can use the same shared memory, etc.
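
For reference, the refresh script we run today in each environment amounts to 
roughly the following (again just a sketch: the environment name, mount 
points, volume/clone names, and filer commands are placeholders for our 
actual NetApp setup):

#!/usr/bin/env python
# Sketch of the per-environment refresh script. All names below are
# placeholders/hypothetical, not our real volumes or mount points.
import subprocess

ENV = "test042"                           # hypothetical environment name
MOUNT = "/mnt/pg_%s" % ENV                # where the clone gets mounted
PGDATA = MOUNT + "/pgdata"                # this environment's datadir
CLONE_VOL = "pg_clone_%s" % ENV           # thin clone backing this environment
PARENT_VOL = "prod_pg_data"               # production volume on the filer
PARENT_SNAP = "nightly"                   # this morning's production snapshot

def run(cmd):
    subprocess.check_call(cmd)

# 1. Stop this environment's dedicated postgres instance.
run(["pg_ctl", "stop", "-D", PGDATA, "-m", "fast"])

# 2. Unmount and discard the old clone on the filer.
run(["umount", MOUNT])
run(["ssh", "filer", "vol", "offline", CLONE_VOL])
run(["ssh", "filer", "vol", "destroy", CLONE_VOL, "-f"])

# 3. Create a fresh thin clone from this morning's production snapshot
#    (only changed blocks consume space on the filer).
run(["ssh", "filer", "vol", "clone", "create", CLONE_VOL,
     "-b", PARENT_VOL, PARENT_SNAP])

# 4. Re-mount the new clone and bring postgres back up.
run(["mount", "filer:/vol/%s" % CLONE_VOL, MOUNT])
run(["pg_ctl", "start", "-D", PGDATA, "-w"])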

The developers and testers who use these instances need to be able to "get 
fresh data" as often as needed (sometimes 3x/day). Ideally we'd be able to 
refresh one of these filer volumes without stopping postgres. But it's also an 
acceptable alternative to build out, say, 3x the number of DBs we need, and 
refresh all of the not-currently-in-use ones on a schedule every night, during 
a maintenance/downtime window.



--

Jason Antman | Systems Engineer | CMGdigital
jason.ant...@coxinc.com | p: 678-645-4155
