Richard Elling <[EMAIL PROTECTED]>

Cromar Scott wrote:
> Chris Siebenmann <[EMAIL PROTECTED]>
>
> I'm not Anton Rang, but:
> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
> cks> The data recovery utility should not panic
> cks> my entire system if it runs into some situation
> cks> that it utterly cannot handle. Solaris 10 U5
> cks> kernel ZFS code does not have this property;
> cks> it is possible to wind up with ZFS pools that
> cks> will panic your system when you try to touch them.
> ...
>
> I'll go you one worse. Imagine a Sun Cluster with several resource
> groups and several zpools. You blow a proc on one of the servers. As a
> result, the metadata on one of the pools becomes corrupted.
>
re> This failure mode affects all shared-storage
re> clusters. I don't see how ZFS should or should
re> not be any different than raw, UFS, et al.

Absolutely true. The file system definitely had a problem.

> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html
>
> Now, each of the servers in your cluster attempts to import the
> zpool--and panics.
>
> As a result of a single part failure on a single server, your entire
> cluster (and all the services on it) are sitting in a smoking heap on
> your machine room floor.
>

re> Yes, but your data is corrupted.

My data was only corrupted on ONE of the zpools. In a cluster with several zpools and several resource groups, we ended up with ALL of the pools and ALL of the resource groups offline as one node after another panicked.

re> If you were my bank, then I would greatly
re> appreciate you getting the data corrected
re> prior to bringing my account online.

Fair enough, but do we have to take Fred's and Joe's accounts offline too?

re> If you study highly available clusters and services
re> then you will see many cases where human interaction
re> is preferred to automation for just such cases.

I see your point about requiring intervention to deal with a potentially corrupt file system. I would have preferred behavior more like what we get with VxVM and VxFS, where the corrupted file system fails to mount without human intervention, but the nodes don't panic on the failed vxdg import. That particular service group and that particular file system go offline, but everything else keeps running because none of the other nodes panics.

I understand that panicking the original node keeps the file system from being corrupted any further, but I don't understand why each successive node in the cluster also needs to panic. Why can't we just refuse to import the pool automatically? (I sketch the sort of thing I mean at the end of this message.)

> I'm just glad that our pool corruption experience happened during
> testing, and not after the system had gone into production. Not exactly
> a resume-enhancing experience.

re> I'm glad you found this in testing.

I'm a believer. Some people wanted us to just throw the box into production, but I insisted on keeping our test schedule. I'm glad I did.

re> BTW, what was the root cause?

It appears that the metadata on that pool became corrupted when the processor failed. The exact mechanism is a bit of a mystery, since we never got a valid crash dump. The other pools were fine once we imported them after a boot -x.

We ended up converting that server to VxVM and VxFS because we could not guarantee that the same thing wouldn't happen again after we went into production. If we had had a tool that let us roll back to a previous snapshot or something similar, it might have made a difference.

We were told that the probability of metadata corruption would have been reduced, but not eliminated, by having a mirrored LUN. We were also told that the issue will be fixed in U6.

--Scott
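
P.S. For what it's worth, here is roughly the import behavior I would have
wanted from the cluster start method. This is only a sketch, not something
I have tested: the pool name "tank" is made up, and it assumes the failmode
pool property in the newer ZFS bits both exists on the release in question
and actually covers import-time corruption, which I cannot verify.

    # Ask ZFS not to panic the node on a catastrophic pool error.
    # The property accepts wait | continue | panic; "panic" is the
    # behavior we experienced, "continue" is what I would want here.
    zpool set failmode=continue tank

    # In the resource start method, attempt the import; if it fails,
    # fail this one resource group instead of taking down the node.
    # (A real agent would also have to decide when a forced import
    # with -f is safe; that is deliberately left out of this sketch.)
    if ! zpool import tank; then
        echo "zpool import of tank failed; leaving resource offline" >&2
        exit 1
    fi

    # Quick health check before letting services start on the pool.
    zpool status -x tank

The point is simply that a failed import should offline one service group,
the way a failed vxdg import does, rather than panicking every node that
tries it in turn.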