Re: [zfs-discuss] [ha-clusters-discuss] (ZFS) file corruption with HAStoragePlus

Armin Ollig Mon, 27 Oct 2008 08:24:10 -0700

Hi Venku and all others,

Thanks for your suggestions.
I wrote a script to do some I/O from both hosts (in non-cluster mode) to the
FC LUNs in question and check the md5sums of all files afterwards. As expected,
there was no corruption.
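
The script itself is not included in this post. A minimal sketch of such a write-then-verify test, assuming Python is available on the hosts, might look like the following; the function names and the file naming scheme are hypothetical and only illustrate the idea of writing files and comparing their MD5 sums afterwards.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_test_files(directory, count=10, size=1 << 20):
    """Write `count` files of random data; return {filename: MD5 of the data written}."""
    manifest = {}
    for index in range(count):
        name = "iotest_%04d.bin" % index  # hypothetical naming scheme
        data = os.urandom(size)
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to the device
        manifest[name] = hashlib.md5(data).hexdigest()
    return manifest


def verify_files(directory, manifest):
    """Return the sorted names of files that are missing or whose MD5 sum differs."""
    bad = []
    for name, expected in manifest.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or md5_of_file(path) != expected:
            bad.append(name)
    return sorted(bad)
```

Running write_test_files() against a directory on the LUN from each host and then verify_files() with the returned manifest gives an empty list when nothing was corrupted.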


After recreating the cluster resource and a few failovers, I found the HASP
resource in this state, with the vb1 zpool concurrently imported on *both* nodes:

# clresource status vb1-storage
=== Cluster Resources ===
Resource Name      Node Name      State         Status Message
-------------      ---------      -----         --------------
vb1-storage        siegfried      Offline       Offline
                   voelsung       Starting      Unknown - Starting


siegfried# zpool status
  pool: vb1
 state: ONLINE
 scrub: none requested
config:
        NAME                                       STATE     READ WRITE CKSUM
        vb1                                        ONLINE       0     0     0
          c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0

errors: No known data errors

voelsung# zpool status
  pool: vb1
 state: ONLINE
 scrub: none requested
config:
        NAME                                       STATE     READ WRITE CKSUM
        vb1                                        ONLINE       0     0     0
          c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0


In this state, filesystem corruption can easily occur.
The zpool was created using the cluster-wide DID device:
zpool create vb1 /dev/did/dsk/d12s0
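
The double import visible in the two zpool status listings can also be checked for mechanically. The Python sketch below is only an illustration: it assumes the `zpool status` text from each node has already been collected somehow (by hand or over ssh, for example), and the function names are hypothetical, not part of ZFS or Sun Cluster. It relies on the fact that `zpool status` lists only pools that are currently imported on that host.

```python
import re


def imported_pools(zpool_status_text):
    """Return the set of pool names that appear in `zpool status` output."""
    pools = set()
    for line in zpool_status_text.splitlines():
        match = re.match(r"\s*pool:\s*(\S+)", line)
        if match:
            pools.add(match.group(1))
    return pools


def pools_imported_on_multiple_nodes(status_by_node):
    """Map each pool seen on more than one node to the sorted list of those nodes.

    `status_by_node` is a dict of {node name: `zpool status` output text}.
    """
    nodes_by_pool = {}
    for node, text in status_by_node.items():
        for pool in imported_pools(text):
            nodes_by_pool.setdefault(pool, []).append(node)
    return {
        pool: sorted(nodes)
        for pool, nodes in nodes_by_pool.items()
        if len(nodes) > 1
    }
```

Fed the two listings above, keyed by siegfried and voelsung, this reports vb1 as imported on both nodes. For a pool under HAStoragePlus control, any non-empty result is a state that should never be seen.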

There was no FC path failure to the LUNs, and both interconnects are normal.
After some minutes in this state, a kernel panic is triggered and both
nodes reboot.

Oct 27 16:09:10 voelsung Cluster.RGM.fed: [ID 922870 daemon.error] tag 
vb1.vb1-storage.10: unable to kill process with SIGKILL
Oct 27 16:09:10 voelsung Cluster.RGM.rgmd: [ID 904914 daemon.error] fatal: 
Aborting this node because method <hastorageplus_prenet_start> on resource 
<vb1-storage> for node <voelsung> is unkillable
 

Best wishes,
 Armin
