[EMAIL PROTECTED] wrote on 08/03/2007 01:35:04 PM:
> The OP here is posting the "Z"illion dollar question ... And
> apologies in advance for the verbal diarrhea.
>
> Most of the Enterprise Level systems people here (my company) look
> at ZFS and say, "Wow that's really cool...but..." What comes after
> the "but..." is a host of questions that ultimately come down to how
> much does ZFS cost?
>
> What is the cost of running ZFS RAIDZ on top of a Enterprise Storage
> System that's already RAID 1+0 or RAID 5?
>
I'm not sure what you're asking. Versus vxfs/vxvm and svm? Then the
additional cost is zero (or negative, once you count the vxfs/vxvm/snap licensing).
> As a "manager" of systems, how do I justify switching from tried and
> true SVM/UFS or VxFS/VxVM on high-speed redundant storage?
> The nice features (never needing to go offline to manage the
> storage the server sees, snapshots, clones, etc.) don't seem to make
> up for the loss of the ability to repair in the event of a failure.
> We monitor heavily, we can schedule a maintenance window when
> necessary, and we can cope with an outage (however painful) that
> requires an FSCK or even a tape restore...for as often as it happens
> (once in the last 5 years I believe). Does this mean that my
> environment is too low on the totem pole for ZFS? I'm pretty sure
> we subscribe to a five 9's uptime SLA.
>
Yet you have no way of knowing whether that uptime includes quietly serving
invalid data.
> A gentleman yesterday posted the zpool status below that used SAN
> Devices. Suppose each device is 100GB of RAID 1+0 storage.
>
> pool: ms2
> state: ONLINE
> scrub: scrub completed with 0 errors on Sun Jul 22 00:47:51 2007
> config:
> NAME                                       STATE     READ WRITE CKSUM
> ms2                                        ONLINE       0     0     0
>   mirror                                   ONLINE       0     0     0
>     c4t600C0FF0000000000A7E0A0E6F8A1000d0  ONLINE       0     0     0
>     c4t600C0FF0000000000A7E8D1EA7178800d0  ONLINE       0     0     0
>   mirror                                   ONLINE       0     0     0
>     c4t600C0FF0000000000A7E0A7219D78100d0  ONLINE       0     0     0
>     c4t600C0FF0000000000A7E8D7B3709D800d0  ONLINE       0     0     0
> errors: No known data errors
>
> This configuration shows 400GB presented (800GB actual behind the SAN) and my
> usable space is 200GB. That's 25% usable in storage capacity alone, and
> I'm sure there are other costs in RAID X over RAID Y that are less
> tangible. So, is that worth it?
25% vs. 25% for vxfs/vxvm and svm in a similar configuration. A striped config
(which is what you are using with vxvm and svm now, right?) carries no additional
penalty.
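For reference, the pool in that status output would have been built roughly
like this (a sketch only, reusing the LUN names from your paste):

   zpool create ms2 \
       mirror c4t600C0FF0000000000A7E0A0E6F8A1000d0 c4t600C0FF0000000000A7E8D1EA7178800d0 \
       mirror c4t600C0FF0000000000A7E0A7219D78100d0 c4t600C0FF0000000000A7E8D7B3709D800d0
   # two 100GB mirrors striped together -> ~200GB usable of the 400GB presented
   zpool list ms2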
>
> Am I supposed to suggest that we go double the capacity in our RAID
> 1+0 CLARiiON so that I can implement ZFS 1+0 and not sacrifice any
> storage capacity? Am I supposed to suggest that the storage crew
> abandon RAID 1+0 on their devices in order for ZFS to provide fault
> tolerance? Either way, this makes ZFS a very tough sell.
Well, the easy sell is to use ZFS the way you use vxfs/vxvm and svm now (stripe)
-- you still gain data checksumming (though self-heal for metadata only, not
data), snapshots (free vs. licensed), compression, pooling, etc...
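Something like this is all it takes for the stripe-on-SAN-LUN approach (a rough
sketch; the filesystem name and properties here are just examples):

   # stripe across the RAID-protected SAN luns, no extra mirroring in zfs
   zpool create ms2 c4t600C0FF0000000000A7E0A0E6F8A1000d0 \
                    c4t600C0FF0000000000A7E8D1EA7178800d0
   zfs create ms2/data
   zfs set compression=on ms2/data      # compression comes for free
   zfs snapshot ms2/data@before-patch   # snapshots with no extra license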
> How would
> I historically show that the investment was worth it when ZFS
> probably never sees a checksum error because the Storage System
> hides failures so well?
>
You can't. Maybe point to the last time your emc had a failed disk in a RAID
lun group -- emc scrubbed before the replace and found yet another bad disk
in the same lun group, so either you had to restore from tape or emc
fibbed and rebuilt the replacement from suspect parity data.
> On the other hand, if I were to configure my pool like this...
>
> pool: ms2
> state: ONLINE
> scrub: scrub completed with 0 errors on Sun Jul 22 00:47:51 2007
> config:
> NAME                                     STATE     READ WRITE CKSUM
> ms2                                      ONLINE       0     0     0
>   c4t600C0FF0000000000A7E0A0E6F8A1000d0  ONLINE       0     0     0
>   c4t600C0FF0000000000A7E8D1EA7178800d0  ONLINE       0     0     0
>   c4t600C0FF0000000000A7E0A7219D78100d0  ONLINE       0     0     0
>   c4t600C0FF0000000000A7E8D7B3709D800d0  ONLINE       0     0     0
> errors: No known data errors
>
> I'd have a nice 400GB (800GB actual) pool. I'd still have my hardware
> RAID 1+0, but now a single checksum error on any one LUN would
> render the entire file system unusable.
No, copied from another thread:
To clarify - ditto blocks are used: 3 copies of pool metadata, each
copy on a different lun if possible, and 2 copies of each file system's
metadata, again with each copy on a different lun. This means that file
system metadata corruption should self-heal even in a non-redundant config
(symlinks being an exception right now, but there's an RFE to fix that).
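And if you are on bits that have the copies property (check your release --
I'm assuming something reasonably recent here), you can get ditto blocks for
file data too, per filesystem:

   # store 2 copies of every data block, spread across luns when possible,
   # so plain data checksum errors can self-heal even on a stripe
   zfs set copies=2 ms2/data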
> There is NO way to replace
> or repair without destroying the entire pool.
Delete the file and restore -- also, you may want to call EMC and ask why
your host is being fed corrupted data without any failures showing on the
EMC. If the checksum error overlaps the zfs metadata ditto blocks and makes
the metadata self-heal fail, then you restore from tape. If you lose access to
a lun in the stripe you go down -- just like with vxvm and svm. How is
this not better than vxvm and svm?
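If a data block does go bad on a stripe, zfs will at least tell you exactly
which files are affected, so the restore is surgical rather than a whole-pool
event:

   # -v lists the files with permanent (unrecoverable) errors
   zpool status -v ms2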
> What is the
> likelihood of that happening? And what would cause such a thing? I
> have run the Self Healing Demo against both of the above pool
> configurations; the latter is not pretty.
Depends on which opensolaris/solaris bits you are on. Newer bits handle this
better and should keep you up and heal the metadata if metadata dittos are
available. Either way, how the heck do svm and vxvm handle this for you
currently? =)
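One more thing worth doing either way: schedule periodic scrubs so checksum
errors surface on your schedule instead of the application's. Something like
(the time and path here are just an example):

   zpool scrub ms2            # walk and verify every block's checksum
   zpool status ms2           # check progress and results

   # e.g. in root's crontab, Sundays at 02:00
   0 2 * * 0 /usr/sbin/zpool scrub ms2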
>
> With Storage Systems providing their own snap/clone facilities (like
> BCVs with EMC) it only gets more difficult as Storage and Server
> teams work largely independently of each other.
Hmm, in most environments I have seen, BCVs have been used on the os/app
side after the admin quiesces the machine -- what good are random snaps of an
unknown state? Sure, the storage guys grant you the BCV space, but do
you really not own the snap side too?
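If you do own the snap side, the host-side equivalent of a BCV split is a
couple of commands (a sketch; the dataset names are made up):

   zfs snapshot ms2/data@quiesced           # taken right after you quiesce the app
   zfs clone ms2/data@quiesced ms2/report   # writable copy, no second full-size lun
   zfs rollback ms2/data@quiesced           # or roll the live fs back if you need to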
> I'd really like to
> push ZFS for data storage on all of our new hardware going forward,
> but unless I can justify over ruling the Storage System's RAID 1+0
> or dropping my capacity utilization from 50% to 25%, I haven't got
> much ground to stand on. Is anyone else paddling in my canoe?
It may help if you don't sabotage your own arguments for ZFS. Bottom line
is that ZFS (even in stripe mode) buys you more than vxvm or svm for less cost
($$ and time). Try comparing ZFS stripe to vxvm stripe, and ZFS raidz to
vxvm raid. ZFS should come out ahead, except in a few places such as user
quotas and evacuating luns; those are coming sometime.
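If you want something concrete to put in front of management, line the two
setups up side by side (a sketch from memory; c4t1d0, c4t2d0 and c1t0d0s7 are
placeholder device/slice names):

   # ZFS: one step, pooled, filesystems grow on the fly
   zpool create data c4t1d0 c4t2d0
   zfs create data/app

   # SVM/UFS equivalent: state database, metadevice, newfs, mount --
   # fixed sizes and fsck on a bad day
   metadb -a -f c1t0d0s7
   metainit d10 1 2 c4t1d0s0 c4t2d0s0 -i 32k
   newfs /dev/md/rdsk/d10
   mount /dev/md/dsk/d10 /data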
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss