Ralf,

> [EMAIL PROTECTED] wrote:
>
>> War wounds? Could you please expand on the why a bit more?
>
> - ZFS is not aware of AVS. On the secondary node, you'll always have
> to force the `zfs import` due to the unnoticed changes of metadata
> (zpool in use).

This is not true. If the primary node invokes "zpool export" while
replication is still active, then a forced "zpool import" is not
required on the secondary. This behavior is the same as with a zpool
on dual-ported or SAN storage, and is NOT specific to AVS.
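To illustrate, a minimal sketch only: the pool name "tank" and the
hosts "node-a" (primary) and "node-b" (secondary) are hypothetical,
and the sndradm set arguments are omitted here, so check the exact
invocation against the AVS documentation:

   # On the primary: export the pool cleanly while the SNDR set is
   # still replicating, so the exported state reaches the secondary.
   node-a# zpool export tank

   # On the secondary: drop the SNDR set(s) into logging mode before
   # touching the replicated volumes.
   node-b# sndradm -n -l

   # Because the pool was exported on the primary, no -f is needed.
   node-b# zpool import tank

Only when the primary dies without an export does the secondary have
to fall back to a forced "zpool import -f", which is exactly what one
would do with a zpool on shared SAN storage.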
> No mechanism to prevent data loss exists, e.g. zpools can be
> imported when the replicator is *not* in logging mode.

This behavior is the same as with a zpool on dual-ported or SAN
storage, and is NOT specific to AVS.

> - AVS is not ZFS aware.

AVS is not UFS, QFS, Oracle, or Sybase aware either. This makes AVS,
and other host-based and controller-based replication services,
multi-functional. If you desire ZFS-aware functionality, use ZFS send
and recv.

> For instance, if ZFS resilvers a mirrored disk, e.g. after replacing
> a drive, the complete disk is sent over the network to the secondary
> node, even though the replicated data on the secondary is intact.

The complete disk IS NOT sent over the network to the secondary node,
only those disk blocks that are re-written by ZFS. This has to be this
way, since ZFS does not differentiate between writes caused by
resilvering and writes caused by new ZFS filesystem operations.
Furthermore, only those portions of the ZFS storage pool are
replicated in this scenario, not every block in the entire storage
pool.

> That's a lot of fun with today's disk sizes of 750 GB and 1 TB
> drives, resulting in usually 10+ hours without real redundancy
> (customers who use Thumpers to store important data usually don't
> have the budget to connect their data centers with 10 Gbit/s, so
> expect 10+ hours *per disk*).

If one creates a ZFS storage pool whose size is 1 TB and then enables
AVS after the fact, AVS cannot differentiate between blocks that are
in use by ZFS and those that are not, therefore AVS needs to replicate
the entire TB of storage.

If one enables AVS first, before the volumes are placed in a ZFS
storage pool, then the "sndradm -E ..." option can be used. When the
ZFS storage pool is then created, only those I/Os needed to initialize
the pool need be replicated.

If one has a ZFS storage pool that is quite large, but in actuality
little of the storage pool is in use, then by enabling SNDR first on a
replacement volume and invoking "zpool replace ..." on multiple vdevs
in the storage pool, an optimal replication of the ZFS storage pool
can be achieved.
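As a concrete illustration of that ordering, here is a minimal sketch.
The host names, device paths and bitmap volumes are hypothetical, and
the exact sndradm arguments should be verified against the AVS
documentation:

   # Enable the SNDR set with -E ("equal"): both sides are declared
   # already identical, so no initial full synchronization is done and
   # new primary writes are tracked in the bitmap volume while the set
   # sits in logging mode.
   node-a# sndradm -n -E node-a /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t2d0s0 \
                         node-b /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t2d0s0 \
                         ip async

   # Only now create the ZFS storage pool on the replicated volume, so
   # the only blocks flagged in the bitmap are the writes ZFS issues
   # to lay down the pool.
   node-a# zpool create tank c1t1d0s0

   # An update sync then forwards just the tracked blocks and leaves
   # the set replicating.
   node-a# sndradm -n -u

The "zpool replace" approach for an existing, mostly empty pool works
the same way: enable SNDR with -E on the new replacement volumes
first, then let "zpool replace ..." resilver onto them, so only the
blocks ZFS actually copies are replicated.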
> - ZFS & AVS & X4500 leads to bad error handling. The zpool may not
> be imported on the secondary node during the replication.

This behavior is the same as with a zpool on dual-ported or SAN
storage, and is NOT specific to AVS.

> The X4500 does not have a RAID controller which signals (and
> handles) drive faults. Drive failures on the secondary node may
> happen unnoticed until the primary node goes down and you want to
> import the zpool on the secondary node with the broken drive. Since
> ZFS doesn't offer a recovery mechanism like fsck, data loss of up to
> 20 TB may occur. If you use AVS with ZFS, make sure that you have
> storage which handles drive failures without OS interaction.
>
> - 5 hours for scrubbing a 1 TB drive. If you're lucky. Up to 48
> drives in total.
>
> - An X4500 has no battery-buffered write cache. ZFS uses the
> server's RAM as a cache, 15 GB+. I don't want to find out how much
> time a resilver over the network after a power outage may take (a
> full reverse replication would take up to 2 weeks and is no valid
> option in a serious production environment). But the underlying
> question I asked myself is why I should want to replicate data in
> such an expensive way, when I think the 48 TB of data itself is not
> important enough to be protected by a battery?

I don't understand the relevance of the prior three paragraphs to AVS.

> - I gave AVS a set of 6 drives just for the bitmaps (using SVM soft
> partitions). They weren't enough; the replication was still very
> slow, probably because of an insane amount of head movements, and it
> scales badly. Putting the bitmap of a drive on the drive itself (if
> I remember correctly, this is recommended in one of the most
> referenced howto blog articles) is a bad idea. Always use ZFS on
> whole disks, if performance and caching matter to you.

When you have the time, can you replace the "probably because of ..."
with some real performance numbers?

> - AVS seems to require additional shared storage when building
> failover clusters with 48 TB of internal storage. That may be hard
> to explain to the customer. But I'm not 100% sure about this,
> because I just didn't find a way; I didn't ask on a mailing list for
> help.

When you have the time, can you replace the "AVS seems to ..." with
some specific references to what you are referring to?

> If you want a fail-over solution for important data, use the
> external JBODs. Use AVS only to mirror complete clusters, don't use
> it to replicate single boxes with local drives. And, in case
> OpenSolaris is not an option for you due to your company policies or
> support contracts, building a real cluster is also A LOT cheaper.

You are offering up these position statements based on what?

> --
> Ralf Ramge
> Senior Solaris Administrator, SCNA, SCSA
> 1&1 Internet AG

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss