[zfs-discuss] practicality of zfs send/receive for failover

Paul B. Henson Fri, 12 Oct 2007 13:24:55 -0700

We've been evaluating ZFS as a possible enterprise file system for our
campus. Initially, we were considering one large cluster, but it doesn't
look like that will scale to meet our needs. So, now we are thinking about
breaking our storage across multiple servers, probably three.


However, I don't necessarily want to incur the expense and hassle of
maintaining three clusters, so I think I might have three standalone
servers instead. If one of them happens to break, we're only down 1/3 of
our files, not all of them. Given our budget, that's probably an acceptable
compromise.

On the other hand, it would be nice to have some level of redundancy, so
I'm toying with the idea of having each server be primary for some amount
of storage, and secondary for a different set of storage. Each server would
use zfs send to replicate snapshots to its backup server.
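
To make the idea concrete, here is a rough Python sketch of the replication
step I have in mind: take a new snapshot, then pipe an incremental zfs send
over ssh into zfs receive on the backup server. It only builds the command
lines; it does not run them. The dataset names (tank/home, tank/home-backup),
the host name (backup1), and the snapshot names are made up for
illustration, and shlex.join assumes Python 3.8 or later.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, backup_host, backup_dataset):
    """Build the commands for one incremental replication cycle.

    All names passed in are hypothetical placeholders. Returns argument
    lists for: taking the new snapshot, sending the delta between the
    previous and new snapshots, and receiving it on the backup host.
    """
    snapshot_cmd = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i sends only the changes between the two snapshots.
    send_cmd = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # The receive side runs on the backup server, reached over ssh.
    receive_cmd = ["ssh", backup_host, "zfs", "receive", backup_dataset]
    return snapshot_cmd, send_cmd, receive_cmd


def as_pipeline(send_cmd, receive_cmd):
    """Render the send and receive halves as a single shell pipeline string."""
    return f"{shlex.join(send_cmd)} | {shlex.join(receive_cmd)}"


if __name__ == "__main__":
    snap, send, recv = replication_commands(
        "tank/home", "hourly-01", "hourly-02", "backup1", "tank/home-backup"
    )
    print(shlex.join(snap))
    print(as_pipeline(send, recv))
```

The first cycle would need a full (non-incremental) send before any
incremental ones; that case is left out of the sketch.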

I've read a number of threads and blog posts discussing zfs send/receive
and its applicability to such an implementation, but I'm curious if anyone
has actually done something like that in practice, and if so how well it
worked.

What authentication/authorization was used to transfer the zfs snapshots
between servers? I'm thinking about using ssh with public-key
authentication over an internal private network the servers are connected
to with different ethernet interfaces than the ones facing the world and
actually serving files. Does zfs send/receive have to be done with root
privileges, or can RBAC or some other mechanism be used so a lower
privileged account could be used?

In the various threads I read about this type of failover, there was some
issue about having to mark the filesystems readonly on the slave, or else
changes there would cause receiving later snapshots to fail. Supposedly
there was some feature added to zfs receive to rectify this problem; did
that make it into S10U4, or is that still only in the development version?

Did you have automatic or manual failover? I'm thinking about having a
manual failover process. Since the replication is only one way, if the
process were automatic and a failover happened, the secondary server would
start providing service, and updates would happen there that would not be
on the primary server if it suddenly came back to life and took over
again.
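
The rule I would want a manual process to enforce can be sketched in a few
lines of Python. This is only an illustration of the reasoning, not
something I have running; the PairState record and its secondary_promoted
flag are hypothetical and would be set by an operator at failover time.

```python
from dataclasses import dataclass


@dataclass
class PairState:
    # Hypothetical flag: set by an operator when the secondary is put
    # into service during a manual failover.
    secondary_promoted: bool


def action_for_returning_primary(state):
    """Decide what a primary that comes back to life is allowed to do.

    With one-way replication, once the secondary has served clients it
    may hold updates the primary never saw, so the primary must not
    simply take over again.
    """
    if state.secondary_promoted:
        return "hold: secondary may have newer data; resync before failing back"
    return "resume: secondary never took writes; continue one-way replication"


if __name__ == "__main__":
    print(action_for_returning_primary(PairState(secondary_promoted=True)))
    print(action_for_returning_primary(PairState(secondary_promoted=False)))
```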

How did you implement the failover at the network level? DNS change?
Virtual IP address switched from one server to the other?

Thanks much for any feedback...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
