Yes, I've read through tons of blogs.sun.com/* entries, went through the 
mailing list looking for the proper way to do it, etc.  Unfortunately, zfs 
send/recv remains a hack that requires an elaborate script wrapper precisely 
because zfs send/recv is by nature a send-only/recv-only operation (which was 
renamed from backup/restore a while ago).  It's fine if you are backing up the 
incrementals to tape and restoring them at a later date, but lacking if you have 
two ZFS hosts that are able to communicate with each other.
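(For the record, the kind of one-way pipe I mean looks roughly like this; the 
host, pool and dataset names are just placeholders I made up:)

    # take a snapshot and push it, one-way, to the other box
    zfs snapshot tank/docs@monday
    zfs send tank/docs@monday | ssh backuphost zfs recv backup/docs

The sending side never hears anything back about what the receiving side 
already has, which is the root of the problem.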

So on the high-end, you have something like:
ZFS-exported-shareiscsi/nfs-put-in-fault-tolerant-ZFS-pool
This basically puts one big file on the exporting ZFS, which has to be at least 
as large as the smallest pool device of the mirror/raidx pool it's joining.  So if 
your local devices are x GB in size, you probably need an x*1.5 GB remote device 
so that the exported flat file still leaves the remote ZFS room to do what it 
needs to do (checksums, etc.)
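(Roughly what I mean, if I have it right.  shareiscsi shares zvols, so for the 
iscsi flavor the "big file" is really a zvol; the names are made up, and the 
240G is just the x*1.5 figure applied to a 160G device:)

    # on the exporting box: a zvol sized for the pool device it will back
    zfs create -V 240g tank/forlocal
    zfs set shareiscsi=on tank/forlocal

    # on the importing box: point the initiator at it, then use the new
    # LUN as one leg of the fault-tolerant pool
    iscsiadm add discovery-address 192.168.1.20
    iscsiadm modify discovery --sendtargets enable
    zpool create faulttolerant mirror c0t0d0 c3t0d0   # c3t0d0 = whatever the iscsi LUN shows up as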

AVS-over-network-to-fault-tolerant-ZFS-pool
From what I can tell (and I'm having problems loading the Flash movie for the 
demos), AVS basically sits between ZFS and the pool device, monitors any 
block commands, saves them, and sends them to the remote side.  If you have 
identical setups, it works fine, but it will not work if you don't have an 
identical setup of x device doing y mirror/raidx on z pool, because it just 
sends block commands.  So even with II, I can't have a near-line backup of a 
pool unless I mirror the pool setup exactly: nothing like raidz for pool-a and 
mirror for pool-b, even though both pool-a and pool-b might be an identical 
size from the perspective of zfs.

And on the "low" end, you have:
Ghetto-lofiadm-NFS-mirror-and-let-ZFS-bitch-at-you
This was off one of the blogs, where the author (you) basically exported a file 
via NFS, used lofiadm to create a "device", and added that to a pool as a 
mirror.  When you needed a "consistent" state, you just connected the device 
and let it resilver, and when it was done you disconnected it.  You later tried 
the same thing with iscsi.
The issues with this setup are that it a) requires slicing off a portion of the 
fs tree where you actually want a mirrored pool, which basically means ZFS 
won't be able to use the write cache (it goes against the 
let-ZFS-manage-whole-devices philosophy), b) still needs a lot more disk space 
than the pool "device" size (same problem as iscsi-device-pool), and c) can't be 
used where raidx is involved, because the "remote" device is liable to be 
disconnected at any time.
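(From memory, the attach/resilver/disconnect dance from that blog goes roughly 
like this; /backup is an NFS mount from the remote box, and all the names are 
made up:)

    mkfile 160g /backup/ghetto-mirror          # backing file on the NFS mount
    lofiadm -a /backup/ghetto-mirror           # prints e.g. /dev/lofi/1
    zpool attach docpool c0t1d0 /dev/lofi/1    # one-time: add it as a mirror leg
    zpool status docpool                       # wait for the resilver to finish
    zpool offline docpool /dev/lofi/1          # "disconnect" it again
    # ...later, whenever you want a fresh consistent copy:
    zpool online docpool /dev/lofi/1           # resilvers the changes, then offline again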

Nonexistent-export-a-whole-device-over-network
This I could not find.  Basically, let the drive sit on one machine and let it 
be used as a pool device on another machine.  This solves issue #b of the 
lofiadm setup, because there's no overhead from an underlying ZFS, but it 
still runs into issues #a and #c.

Script-a-mirror:
I've seen a couple of different ways of doing this.  One is yours, another is 
ZetaBack, and another is zfs-auto-snapshot.  This is why I asked all those 
weird questions, because it seems that at this point in time this is the only 
way of doing it on a per-filesystem basis.  But since (as I said earlier) 
send is send-only and recv is recv-only, ZFS will just happily bitch at you if 
you recv the wrong source incremental.  Even though both the sending and 
receiving sides might have a common snapshot they could work off of, neither 
side will ever know, because they don't communicate with each other.
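(All the script wrappers ultimately boil down to automating pipes like this one, 
the names again being placeholders:)

    # this only works if backuphost already has tank/docs@monday; if its
    # latest snapshot is something else, recv simply errors out, even if
    # some other common snapshot does exist on both sides
    zfs snapshot tank/docs@tuesday
    zfs send -i tank/docs@monday tank/docs@tuesday | \
        ssh backuphost zfs recv -F backup/docs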

Ideal setup:
Ideally, this is what I want to have.  Put a couple of large HDs into a server and 
let ZFS administer the whole devices (so it can manage the write cache).  Create a 
Mirror/RAIDx/Hot-spare pool with as much as your budget allows (in my case, 
very little - probably a 2x160G mirror or so), and create file systems as needed.

Put another large HD into another machine, and connect it via a dedicated 
network segment (which is a wise thing to do with any SAN stuff).  Create a 
Mirror/RAIDx/Hot-spare pool with whatever is left in your budget (in my case, 
I'll just take a 60G hard drive from another machine).
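(In zpool terms, and with made-up device and pool names, that's just:)

    # main box: whole disks, so ZFS gets to manage the write cache
    zpool create tank mirror c0t0d0 c0t1d0
    zfs create tank/documents

    # backup box: whatever is left over
    zpool create nearline c0t2d0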

Now, from my perspective, I have no need/desire to let my 60G near-line the 
entire contents of the 160G.  And if I ended up lofiadm/iscsi'ing the 60GB and 
creating a pool with the 160G, I'd end up wasting 120GB of the 160s unless I 
slice off that 40GB "near-line mirror" and lose the write cache by doing so 
(correct me if I'm wrong here).

I don't need fail-over or anything.  All I want is for what I consider 
important (i.e., documents, settings, etc.) on a portion of a pool to be 
replicated onto another machine, so that if a catastrophic failure happens where 
I lose both mirrors to a fried power supply, or ZFS bit-rots an important 
document during a save, I still have access to a "recent" copy of the file on 
another machine.

There was a discussion on this forum recently (when I searched) that said that 
doing a ghetto-mirror is a lot "easier" than setting up a script-a-mirror, 
and it is.  Since ZFS can see the contents of all the devices in the pool, it 
can take whatever steps are necessary to get them to a consistent and up-to-date 
state.

Now the point was raised that doing a ghetto-mirror was not recommended because 
ZFS has no way to ensure that NFS/NIC/the target machine didn't corrupt the 
stream mid-way, which is why they recommended using a ZFS-backed iSCSI/NFS share 
as a ZFS device on another machine.  For me, I just don't see the advantage in 
this.  If NFS bit-flips something, then the ZFS store just wrote a bit-flipped 
stream of ZFS raw data, which won't help it one bit when it comes time to read 
it.  The ZFS mirror will just throw a checksum error and ignore that block.  
Doing the export-a-device setup seems no different from a results point of view 
(it'll still throw a checksum error), albeit with a lot more "potential" paths 
of error than having the device local, _if_ export-a-device can even actually 
be done.
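(Either way, the pool holding the replica reports it the same way; pool name 
assumed from the earlier sketch:)

    zpool scrub faulttolerant
    zpool status -v faulttolerant     # the CKSUM column shows the damage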

So what's preventing ZFS from saying "the user wants to mirror xyz filesystem 
on another ZFS with a matching ZFS version"?  Is that truly so different from a 
device mirror that it can't track the changes made to that file system (not the 
pool!), be it a modification, a snapshot creation, etc., and send those changes 
over the wire every so often, whether to another ZFS pool on the same machine 
or to one on another machine?
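(To be clear about what I'm wishing for, something like this imaginary 
one-liner; the command and its syntax are entirely made up, nothing like it 
exists today:)

    # hypothetical command, does not exist
    zfs mirror tank/documents backuphost:nearline/documents

That's it: track the filesystem, not the pool, and ship the deltas whenever 
something changes.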

-- Starfox
 
 