In our data center, we plan for CRITICAL systems to survive chains of several 
single-type failures. The HA standards we apply to a mail server for 30,000 
people are necessarily quite high.

A fully redundant two-node failover clustered system can survive failures of 
half or more of its systems and still operate. My server nodes are in different 
rows on different power systems, so your example wouldn't affect me. Ditto for 
arrays and SAN switches. If the active node lost power and its other PSU had a 
dead UPS, the other node in the cluster would take over; it pulls from a 
different circuit and UPS.
The only interruption we have had on this cluster was heat-related, when the AC 
failed and lots of systems had to be shut down entirely. And yes, this did 
spark discussions about further measures; we are that paranoid.

Anyhow, I stand by my observation that ZFS sparing currently offers no 
mechanism for specifying *what* a device is a valid spare for: it is a spare 
for the entire pool. I see it as entirely valid criticism that this capability, 
offered by "traditional array" storage controllers, is missing. If I had a 
chassis in one building, and my SAN switch linked to a second chassis in 
another building, I couldn't control what happens when a drive fails and a 
spare is called for. Or at least, I don't see how.
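To illustrate the point (a sketch only; the pool name "tank" and device names 
are hypothetical): a hot spare is attached at the pool level, and there is no 
per-vdev scoping when one is added:

```shell
# Add a hot spare to the pool -- note it is not tied to any one vdev.
# Pool and device names here are made up for illustration.
zpool add tank spare c4t0d0

# 'zpool status' lists the device in a single pool-wide "spares" section;
# there is no syntax to restrict which top-level vdev (or which building's
# chassis) the spare may be pulled in to replace.
zpool status tank
```

So when any drive in the pool fails, ZFS may resilver onto that spare 
regardless of where the failed drive physically sits.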

If you want to start an argument about what is proper HA design, we can 
certainly do that in another thread.
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
