Armin Ollig wrote:
> Hi Venku and all others,
> 
> thanks for your suggestions. I wrote a script to do some IO from both
> hosts (in non-cluster-mode) to the FC-LUNs in questions and check the
> md5sums of all files afterwards. As expected there was no corruption.
> 
> 
> After recreating the cluster-resource and a few failovers i found the
> HASP resource in this state, with the vb1 zfs concurrently mounted on
> *both* nodes:

Have you imported your pool manually on one of the hosts? Do you have the 
file /etc/zfs/zpool.cache on these boxes? If yes, could you please 
provide it?
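
If you want to check that from the shell, something along these lines should 
work (just a sketch - zdb -C dumps the cached pool configuration, and zpool 
history records imports/exports if your release supports it):

ls -l /etc/zfs/zpool.cache     # is there a cache file, and how recent is it?
zdb -C                         # dump the cached pool configuration(s)
zpool history vb1              # past imports/exports recorded for the pool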

> # clresource status vb1-storage
> === Cluster Resources ===
> Resource Name      Node Name      State         Status Message
> -------------      ---------      -----         --------------
> vb1-storage        siegfried      Offline       Offline
>                    voelsung       Starting      Unknown - Starting
> 
> 
>  siegfried# zpool status
>   pool: vb1
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME                                       STATE     READ WRITE CKSUM
>         vb1                                        ONLINE       0     0     0
>           c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0
> 
> errors: No known data errors
> voelsung# zpool status
>   pool: vb1
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME                                       STATE     READ WRITE CKSUM
>         vb1                                        ONLINE       0     0     0
>           c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0
> 
> 
> In this state filesystem corruption can occur easily.

Yes, this is bad and can lead to corruption.

> The zpool was created using the cluster-wide did device:
> zpool create vb1 /dev/did/dsk/d12s0

But this differs from the status reported by 'zpool status' above - it shows 
the pool is built from device c4t600D0230000000000088824BC4228807d0s0, while 
you say it was created from /dev/did/dsk/d12s0.
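
To double-check which physical device d12 actually maps to on each node, 
something like this should do (a sketch, assuming the usual Sun Cluster DID 
tools are available on your release):

scdidadm -L d12      # DID-to-physical-path mapping as seen from all nodes
cldevice show d12    # same information via the newer CLI, if present

That would at least confirm whether d12 and the c4t600D... device are the 
same LUN.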

victor

> There was no FC path failure to the LUNs, and both interconnects are normal.
> After some minutes in this state a kernel panic is triggered and both
> nodes reboot.
> 
> Oct 27 16:09:10 voelsung Cluster.RGM.fed: [ID 922870 daemon.error] tag 
> vb1.vb1-storage.10: unable to kill process with SIGKILL
> Oct 27 16:09:10 voelsung Cluster.RGM.rgmd: [ID 904914 daemon.error] fatal: 
> Aborting this node because method <hastorageplus_prenet_start> on resource 
> <vb1-storage> for node <voelsung> is unkillable

