On Sep 1, 2009, at 1:28 PM, Jason wrote:
> I guess I should come at it from the other side:
> If you have one iSCSI target box and it goes down, you're dead in
> the water.
Yep.
> If you have two iSCSI target boxes that replicate and one dies, you
> are OK, but you are then carrying a 2:1 ratio of total to usable
> storage (excluding expensive shared disks).
Servers cost more than storage, especially when you consider power.
> If you have two tiers, where n cheap back-end iSCSI targets hold the
> physical disks and present them to two clustered virtual iSCSI
> target servers (assuming this can be done with disks over iSCSI),
> which in turn present iSCSI targets to the VMware hosts, then any
> one server could go down and everything would keep running. It would
> create a virtual clustered pair that is essentially doing RAID over
> the network (iSCSI). Since you already have the VMware hosts, the
> two virtual servers are "free". None of the back-end servers would
> need redundant components, because any one of them can fail, so you
> should be able to build them from inexpensive parts.
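For concreteness, the back-end half of that design could be built like
this on OpenSolaris (pool, volume, and device names are made up, the
zvol size is arbitrary, and the legacy shareiscsi property is only one
way to export a LUN):

   # on each back-end box: one raidz pool over the 8 local disks
   zpool create backpool raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
       c1t4d0 c1t5d0 c1t6d0 c1t7d0
   # carve out a zvol and export it as an iSCSI LUN
   zfs create -V 6T backpool/lun0
   zfs set shareiscsi=on backpool/lun0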
This will certainly work. But it is, IMHO, too complicated to be
effective at producing highly available services. Too many parts means
too many opportunities for failure (yes, even VMware fails). The
problem with your approach is that you seem to be considering only
failures of the type "it's broke, so it is completely dead." Those
aren't the kind of failures that dominate real life.
When we design highly available systems for the datacenter, we spend a
lot of time on rapid recovery. We know things will break, so we try to
build systems and processes that can recover as quickly as possible.
This leads to the observation that reliability trumps redundancy:
though we build fast-recovery systems, it is better not to need to
recover at all. Hence we developed dependability benchmarks, which
expose the cost/dependability trade-offs. More reliable parts tend to
cost more, but the best approach is to have fewer, more reliable parts
rather than many less reliable ones.
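To put rough, purely illustrative numbers on that: if each part in the
I/O path is 99.9% available, a chain of five parts in series yields
about 0.999^5 = 99.5%, or roughly 44 hours of downtime a year, while a
chain of two yields 0.999^2 = 99.8%, roughly 17 hours. Every part you
add in series eats into availability.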
> This would also allow you to add/replace storage easily (I hope).
> Perhaps you'd have to RAIDZ the back-end disks together and then
> present them to the front end, which would RAIDZ all the back-ends
> together. For example, if you had 5 back-end boxes with 8 drives
> each, you'd have a 10:7 ratio of total to usable storage. I'm sure
> the RAID combinations could be played with to get the balance of
> redundancy and capacity that you need. I don't know what kind of
> performance hit you would take doing that over iSCSI, but I thought
> it might work as long as you have gigabit speeds. Or I could be
> completely off my rocker. :) Am I?
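The arithmetic checks out: raidz over 8 disks keeps 7/8 of the space,
raidz over the 5 resulting LUNs keeps 4/5, and 7/8 x 4/5 = 7/10, i.e.
10:7 total to usable. The front-end half might be assembled like this
on a Solaris initiator (the discovery addresses and the device names
the LUNs appear under are made up):

   # on the front end: discover each back-end target
   iscsiadm add discovery-address 10.0.0.11:3260   # repeat per box
   iscsiadm modify discovery --sendtargets enable
   devfsadm -i iscsi   # create device nodes for the new LUNs
   # then raidz across the five LUNs, now visible as local disks
   zpool create frontpool raidz c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0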
Don't worry about bandwidth. It is the latency that will kill
performance.
Adding more stuff between your CPU and the media means increasing
latency.
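Back-of-envelope, with illustrative numbers: a small synchronous write
to local disk might take ~7 ms, and each GbE iSCSI hop adds a few
hundred microseconds of round trip plus target processing time. One
hop is tolerable, but stack two tiers and every I/O that would
otherwise be satisfied in microseconds from cache or NVRAM now waits
on two network round trips. Latency, not bandwidth, sets the floor.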
-- richard