In our data center, CRITICAL systems are designed to survive chains of several single-type failures. The HA standard we apply to a mail server for 30,000 people is necessarily quite high.
A fully redundant two-node failover cluster can lose half or more of its components and still operate. My server nodes are in different rows on different power systems, so your example wouldn't affect me; the same goes for the arrays and SAN switches. If the active node lost power and its other PSU had a dead UPS, the other node in the cluster would take over, since it pulls from a different circuit and UPS. The only interruption we have had on this cluster was heat-related, when the AC failed and many systems had to be shut down entirely. And yes, that did spark discussions about further measures; we are that paranoid.

Anyhow, I stand by my observation: ZFS sparing currently offers no mechanism for specifying *what* a device is a valid spare for. It is a spare for the entire pool. I see it as entirely valid criticism that this is a capability offered by "traditional array" storage controllers which ZFS lacks. If I had a chassis in one building, with my SAN switch linked to a second chassis in another building, I couldn't control what happens when a drive fails and a spare is called for. Or at least, I don't see how.

If you want to start an argument about what constitutes proper HA design, we can certainly do that in another thread.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss