Richard Elling wrote:
> Tim Haley wrote:
>> Vincent Fox wrote:
>>>
>>> Just make SURE the other host is actually truly DEAD!
>>>
>>> If for some reason it's simply wedged, or you have lost console
>>> access but hostA is still "live", then you can end up with 2
>>> systems having access to the same ZFS pool.
>>>
>>> I have done this in testing (2 hosts accessing the same pool), and
>>> the result was catastrophic pool corruption.
>>>
>>> I use a simple method: if I think hostA is dead, I call the
>>> operators and get them to pull the power cords out of it just to be
>>> certain. Then I force import on hostB with certainty.
>>
>> This is a common cluster scenario: you need to make sure the other
>> node is dead, so you force that result. In Lustre setups they
>> recommend a STONITH (Shoot the Other Node in the Head) approach.
>> They use a combination of a heartbeat setup like the one described
>> here:
>>
>> http://www.linux-ha.org/Heartbeat
>>
>> and something like the powerman framework to 'kill' the offline
>> node.
>>
>> Perhaps those things could be made to run on Solaris if they don't
>> already.
>
> Of course, Solaris Cluster (and the corresponding open source effort,
> Open HA Cluster) manages cluster membership and data access. We
> also use SCSI reservations, so that a rogue node cannot even see the
> data. IMHO, if you do this without reservations, then you are dancing
> with the devil in the details.
No sooner had I mentioned this than the optional fencing project was
integrated into Open HA Cluster. So you will be able to dance with the
devil, even with Solaris Cluster, if you want.
http://blogs.sun.com/sc/
 -- richard
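
For anyone following the thread, the forced-import step Vincent describes
maps onto the standard zpool commands; a minimal sketch (the pool name
'tank' is assumed), and only safe once the other host is genuinely powered
off:

    # on hostB, after hostA has had its power pulled
    zpool import            # lists importable pools; typically warns that
                            # 'tank' was last accessed by another system
    zpool import -f tank    # -f overrides that "may be in use" safety check
    zpool status tank       # confirm the pool imported and is healthy

Without -f, zpool import refuses a pool that was last accessed by another
system and not cleanly exported; that refusal is exactly the safety net the
force flag bypasses, hence the insistence on pulling the power cords (or
otherwise fencing the node) first.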