Tim Haley wrote:
> Vincent Fox wrote:
>
>> Just make SURE the other host is actually, truly DEAD!
>>
>> If for some reason it's simply wedged, or you have lost console access but
>> hostA is still "live", then you can end up with two systems having access
>> to the same ZFS pool.
>>
>> I have done this in testing, two hosts accessing the same pool, and the
>> result is catastrophic pool corruption.
>>
>> My method is simple: if I think hostA is dead, I call the operators and
>> have them pull its power cords just to be certain. Only then do I force
>> the import on hostB.
>
> This is a common cluster scenario: you need to make sure the other node is
> dead, so you force that result. Lustre setups recommend a STONITH (Shoot
> The Other Node In The Head) approach for this, combining a heartbeat setup
> like the one described here:
>
> http://www.linux-ha.org/Heartbeat
>
> with something like the powerman framework to 'kill' the offline node.
>
> Perhaps those tools could be made to run on Solaris, if they don't already.
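To make the fence-then-import ordering concrete, here is a minimal sketch of what the surviving node might run. The pool name "tank", the peer name "hostA", and the powerman power-off invocation are all assumptions for illustration (they are not from the thread); the forced import is the only step taken directly from Vincent's description. Substitute whatever power control your site actually has (ILOM, IPMI, smart PDU).

#!/usr/bin/env python3
# Sketch only: fence the failed node, then force-import the shared pool.
# Assumptions (not from the thread): pool "tank", peer "hostA", and a
# powerman-style power-off command.
import subprocess

POOL = "tank"          # assumed pool name
FAILED_NODE = "hostA"  # assumed peer hostname
FENCE_CMD = ["powerman", "--off", FAILED_NODE]  # site-specific; an assumption here

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.run(cmd, check=True)  # raises if the command fails

# 1. STONITH first: if the power-off fails, the exception stops us here,
#    and we never touch the pool while the peer might still be alive.
run(FENCE_CMD)

# 2. Only after the peer is confirmed off do we force the import on hostB.
run(["zpool", "import", "-f", POOL])

print("Imported %s; verify with: zpool status %s" % (POOL, POOL))

The property that matters is the ordering: the forced import never runs unless the power-off is known to have succeeded.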
Of course, Solaris Cluster (and the corresponding open source effort, Open HA Cluster) manages cluster membership and data access. We also use SCSI reservations, so that a rogue node cannot even see the data. IMHO, if you do this without reservations, then you are dancing with the devil in the details.
 -- richard
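For illustration of the reservation point (this is not Solaris Cluster's actual code, and the device path is made up): on a system with sg3_utils, you can at least inspect the SCSI-3 persistent reservation state of a shared LUN before deciding whether a forced import is safe.

#!/usr/bin/env python3
# Illustration only: query SCSI-3 persistent reservation state on a shared LUN
# using sg3_utils' sg_persist. This is not how Solaris Cluster does its fencing
# internally; it only shows the mechanism being relied on. The device path is
# an assumption.
import subprocess

DEVICE = "/dev/sdb"  # assumed shared LUN backing the pool

# List the registered reservation keys (one per registered node)...
subprocess.run(["sg_persist", "--in", "--read-keys", DEVICE], check=True)

# ...and show which key, if any, currently holds the reservation. A node that
# has been dropped from the registration list has its I/O rejected by the
# storage itself, regardless of what software it is running.
subprocess.run(["sg_persist", "--in", "--read-reservation", DEVICE], check=True)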