Tim Haley wrote:
> Vincent Fox wrote:
>
>> Just make SURE the other host is actually, truly DEAD!
>>
>> If for some reason it's simply wedged, or you have lost console access but
>> hostA is still "live", then you can end up with two systems having access
>> to the same ZFS pool.
>>
>> I have done this in testing, two hosts accessing the same pool, and the
>> result is catastrophic pool corruption.
>>
>> My method is simple: if I think hostA is dead, I call the operators and
>> have them pull its power cords just to be certain. Only then do I force
>> the import on hostB.
>
> This is a common cluster scenario: you need to make sure the other node is
> dead, so you force that result. Lustre setups recommend a STONITH (Shoot
> The Other Node In The Head) approach for this, combining a heartbeat setup
> like the one described here:
>
> http://www.linux-ha.org/Heartbeat
>
> with something like the powerman framework to 'kill' the offline node.
>
> Perhaps those tools could be made to run on Solaris, if they don't already.
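To make the fence-then-import ordering concrete, here is a minimal sketch of what the surviving node might run. The pool name "tank", the peer name "hostA", and the powerman power-off invocation are all assumptions for illustration (they are not from the thread); the forced import is the only step taken directly from Vincent's description. Substitute whatever power control your site actually has (ILOM, IPMI, smart PDU).

#!/usr/bin/env python3
# Sketch only: fence the failed node, then force-import the shared pool.
# Assumptions (not from the thread): pool "tank", peer "hostA", and a
# powerman-style power-off command.
import subprocess

POOL = "tank"          # assumed pool name
FAILED_NODE = "hostA"  # assumed peer hostname
FENCE_CMD = ["powerman", "--off", FAILED_NODE]  # site-specific; an assumption here

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.run(cmd, check=True)  # raises if the command fails

# 1. STONITH first: if the power-off fails, the exception stops us here,
#    and we never touch the pool while the peer might still be alive.
run(FENCE_CMD)

# 2. Only after the peer is confirmed off do we force the import on hostB.
run(["zpool", "import", "-f", POOL])

print("Imported %s; verify with: zpool status %s" % (POOL, POOL))

The property that matters is the ordering: the forced import never runs unless the power-off is known to have succeeded.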
Of course, Solaris Cluster (and the corresponding open source effort, Open HA Cluster) manages cluster membership and data access. We also use SCSI reservations, so that a rogue node cannot even see the data. IMHO, if you do this without reservations, then you are dancing with the devil in the details.
 -- richard
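For illustration of the reservation point (this is not Solaris Cluster's actual code, and the device path is made up): on a system with sg3_utils, you can at least inspect the SCSI-3 persistent reservation state of a shared LUN before deciding whether a forced import is safe.

#!/usr/bin/env python3
# Illustration only: query SCSI-3 persistent reservation state on a shared LUN
# using sg3_utils' sg_persist. This is not how Solaris Cluster does its fencing
# internally; it only shows the mechanism being relied on. The device path is
# an assumption.
import subprocess

DEVICE = "/dev/sdb"  # assumed shared LUN backing the pool

# List the registered reservation keys (one per registered node)...
subprocess.run(["sg_persist", "--in", "--read-keys", DEVICE], check=True)

# ...and show which key, if any, currently holds the reservation. A node that
# has been dropped from the registration list has its I/O rejected by the
# storage itself, regardless of what software it is running.
subprocess.run(["sg_persist", "--in", "--read-reservation", DEVICE], check=True)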