On 10/4/2012 1:56 PM, Jim Klimov wrote:
What if the backup host is down (i.e. the ex-master after the failover)? Will your failed-over pool accept no writes until both storage machines are working? What if the network link between these two heads has a glitch, and as a result both of them become masters of their private copies (mirror halves), and perhaps both even manage to accept writes from clients? This is the clustering part, which involves "fencing" the node that is considered dead, perhaps including a hardware reset request just to make sure it's dead, before taking over the resources it used to master (STONITH - Shoot The Other Node In The Head). In particular, cluster setups recommend that heartbeats, which confirm that both machines are really working, travel over at least two separate wires (e.g. serial and LAN) with no active hardware (switches) in between, kept separate from the data network.
this all makes a lot of sense. didn't mean to imply there are no failure modes that can take you down entirely. i was aware of the split-brain issue. i was not sure what richard was getting at...
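for what it's worth, here is a rough sketch of the takeover rule jim describes, as i understand it: suspect the peer only when every independent heartbeat link has gone quiet, and import the pool only after fencing has succeeded. all the names here (HeartbeatLink, try_takeover, fence_peer, import_pool) are made up for illustration and are not from any real cluster framework; the two callbacks stand in for whatever STONITH mechanism and pool import step a real setup would use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class HeartbeatLink:
    """One independent heartbeat path (hypothetical), e.g. serial or a direct LAN cable."""
    name: str
    last_seen: float  # timestamp (seconds) of the last heartbeat received on this link


def peer_looks_dead(links: Sequence[HeartbeatLink], now: float, timeout: float) -> bool:
    """Suspect the peer only if ALL links have been silent longer than `timeout`.

    With no links configured we cannot judge anything, so we never suspect the peer.
    """
    return bool(links) and all(now - link.last_seen > timeout for link in links)


def try_takeover(
    links: Sequence[HeartbeatLink],
    now: float,
    timeout: float,
    fence_peer: Callable[[], bool],   # assumed: returns True once the peer is confirmed off
    import_pool: Callable[[], None],  # assumed: takes over the storage resources
) -> str:
    """Return the resulting role: 'standby', 'fence-failed', or 'master'."""
    if not peer_looks_dead(links, now, timeout):
        # At least one wire still carries heartbeats: the peer is alive, stay passive.
        return "standby"
    if not fence_peer():
        # Could not confirm the peer is dead; taking over now would risk split-brain.
        return "fence-failed"
    import_pool()
    return "master"
```

the point of the ordering is that a silent peer is never enough on its own: if fencing cannot be confirmed, the node stays off the pool rather than risk two masters writing to separate mirror halves.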
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss