I also asked this question on dba.stackexchange.com, where Craig Ringer gave a 
very detailed enumeration of the associated problems:
http://dba.stackexchange.com/questions/58896/restore-postgres-data-tablespace-to-new-tablespace-at-new-mount-point/58967?noredirect=1#58967

Perhaps there's a postgres internals expert around, someone intimately 
familiar with pg_xlog/pg_clog/pg_control, who can comment on whether it's 
possible to take the on-disk files from a single database in a single 
tablespace and make them usable by a different postgres server running 
multiple databases?

Thanks,
Jason

On 02/13/2014 04:41 PM, Antman, Jason (CMG-Atlanta) wrote:

I have a bunch of test/development databases which we currently refresh with 
production data as-needed using a NetApp filer's snapshot capabilities - we 
have a production slave with its datadir on a filer mount (NFS), and once a 
night (via cron) we shut down the slave, snapshot the filer volume, and then 
start the database back up. When we need to refresh data in a test/dev 
environment, we stop postgres there, clone the NetApp snapshot, and use that 
clone as the test/dev environment data directory. This is wonderful from a 
storage point of view, as the clones only store changed blocks on the filer. We 
have a ~1.5TB datadir, but these clones are only a few MB each, because there's 
little changed data. (We're running postgres 9.0.13, but if what I'm about to 
ask is more feasible with a newer version, upgrading is a vague possibility.)
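
For concreteness, the nightly cron job boils down to something like the 
following rough sketch (Python; the paths, volume name, and the "ssh filer" 
snapshot command are placeholders for our actual environment and tooling, not 
the real configuration):

#!/usr/bin/env python
# Rough sketch of the nightly snapshot job. Paths, the volume name, and the
# "ssh filer ..." call are placeholders, not our real configuration.
import subprocess

PGDATA = "/mnt/filer/prod_slave/pgdata"   # NFS-mounted slave datadir (placeholder)
FILER_VOL = "prod_pg_data"                # NetApp volume backing it (placeholder)

def run(cmd):
    # Echo each command and fail loudly if it exits non-zero.
    print(" ".join(cmd))
    subprocess.check_call(cmd)

# 1. Cleanly shut down the production slave so the on-disk files are consistent.
run(["pg_ctl", "stop", "-D", PGDATA, "-m", "fast"])

# 2. Snapshot the filer volume (stand-in for our actual NetApp call).
run(["ssh", "filer", "snap", "create", FILER_VOL, "nightly"])

# 3. Start the slave back up and wait for it to come online.
run(["pg_ctl", "start", "-D", PGDATA, "-w"])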

Up until now, each test/dev environment has had its own postgres server. That's 
pretty bad on resources, since the servers sit idle most of the time.

Now, we have to spin up somewhere around 100-150 of these environments. The 
NetApp is about the only way we can do it, because we simply don't have 
150-225TB of disk to spare. It would also be a real pain (and inefficient) to 
run 100-150 separate machines, each running a single instance of postgres.

What I'd like to do is take a disk/filer snapshot of a stopped database (i.e. 
the actual files on disk, not a pg_dump) on one postgres instance (one physical 
server) and restore it on a different one. Ideally the database would have a 
different name, but that's flexible. Even more ideally, this would all happen 
without restarting the destination postgres instance, but I suppose we can 
work around that too.

Is this even possible? Anyone have experience with, essentially, creating a 
tablespace that points to an existing data directory? Or, alternatively, 
swapping out the data directory of one tablespace with that of another?

Any advice or suggestions would be greatly appreciated.
I apologize if there's something obvious in the documentation that I missed, 
but I'm not much of a database guru, and am relatively new to pgsql in general.
Jason Antman

PS - The slightly-less-abstract explanation:

We're spinning up nearly 100 (maybe 150) new test environments. We simply don't 
have the ~200TB of disk to maintain a separate DB for each of them, so using the 
NetApp filer and letting it do thin clones is an absolute requirement. We also 
need to be able to do quick restores to the latest daily production snapshot. 
Quick as in, the developers and testers run a script in their test environment 
that does the restore. We're currently doing this for ~50 environments, and we 
run a separate VM with postgres for each one, so it's relatively 
straightforward - stop postgres, unmount the datadir, do the filer magic to 
refresh it with a clone of this morning's production DB, then re-mount the new 
filer volume (clone) and start postgres. Unfortunately, the massive overhead of 
running a separate virtual machine with a separate postgres instance for every 
DB is more than we can handle when scaling 2-3x. In addition to that, the 
resource waste is awful (a dedicated VM running a dedicated postgres instance 
for each DB, most of which are idle about 14 hours/day). So, we'd like to run 
multiple databases in separate tablespaces, but on the same host and the same 
postgres instance, so N databases can use the same shared memory, etc.
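
For reference, the refresh script we run today in each environment amounts to 
roughly the following (again just a sketch: the environment name, mount 
points, volume/clone names, and filer commands are placeholders for our 
actual NetApp setup):

#!/usr/bin/env python
# Sketch of the per-environment refresh script. All names below are
# placeholders/hypothetical, not our real volumes or mount points.
import subprocess

ENV = "test042"                           # hypothetical environment name
MOUNT = "/mnt/pg_%s" % ENV                # where the clone gets mounted
PGDATA = MOUNT + "/pgdata"                # this environment's datadir
CLONE_VOL = "pg_clone_%s" % ENV           # thin clone backing this environment
PARENT_VOL = "prod_pg_data"               # production volume on the filer
PARENT_SNAP = "nightly"                   # this morning's production snapshot

def run(cmd):
    subprocess.check_call(cmd)

# 1. Stop this environment's dedicated postgres instance.
run(["pg_ctl", "stop", "-D", PGDATA, "-m", "fast"])

# 2. Unmount and discard the old clone on the filer.
run(["umount", MOUNT])
run(["ssh", "filer", "vol", "offline", CLONE_VOL])
run(["ssh", "filer", "vol", "destroy", CLONE_VOL, "-f"])

# 3. Create a fresh thin clone from this morning's production snapshot
#    (only changed blocks consume space on the filer).
run(["ssh", "filer", "vol", "clone", "create", CLONE_VOL,
     "-b", PARENT_VOL, PARENT_SNAP])

# 4. Re-mount the new clone and bring postgres back up.
run(["mount", "filer:/vol/%s" % CLONE_VOL, MOUNT])
run(["pg_ctl", "start", "-D", PGDATA, "-w"])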

The developers and testers who use these instances need to be able to "get 
fresh data" as often as needed (sometimes 3x/day). Ideally we'd be able to 
refresh one of these filer volumes without stopping postgres. But it's also an 
acceptable alternative to build out, say, 3x the number of DBs we need, and 
refresh all of the not-currently-in-use ones on a schedule every night, during 
a maintenance/downtime window.



--

Jason Antman | Systems Engineer | CMGdigital
jason.ant...@coxinc.com | p: 678-645-4155
