> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
>  The data recovery utility should not panic my entire system if it runs
> into some situation that it utterly cannot handle. Solaris 10 U5 kernel
> ZFS code does not have this property; it is possible to wind up with ZFS
> pools that will panic your system when you try to touch them.

I do agree. For the last three weeks I have been testing an ARC-1680
SAS controller with an external cabinet holding 16 SAS disks of 1 TB
each. The server is an E5405 quad-core with 8 GB RAM. Setting the card
to JBOD mode gave me a somewhat unstable setup where the disks would
stop responding. After I put all the disks on the controller in
passthrough mode, the setup stabilized and I was able to copy 3.8 of
4 TB of small files before some of the disks bailed out. I brought the
disks back online and restarted the server. A zpool status that showed
the disks online also told me:

errors: Permanent errors have been detected in the following files:
        ef1/image/z018_16:<0x0>
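
In case someone else runs into the same message: as far as I
understand it, the usual next step is to list the affected files with
zpool status -v, restore or remove them, and then scrub the pool.
Roughly:

        zpool status -v ef1   # show which files have permanent errors
        # restore or remove the affected files from backup, then
        zpool scrub ef1       # re-check the pool; the list should
                              # clear once the data is repaired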

The zpool consisted of three separate raidz vdevs of five disks each,
plus one spare, all in a single pool.
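
For completeness, a layout like that would have been created with
something along these lines (the pool name is taken from the error
message above; the cXtYd0 device names are only placeholders for
whatever the controller actually presents):

        # sketch only - device names are placeholders, not my setup
        zpool create ef1 \
            raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
            raidz c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 \
            raidz c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 \
            spare c1t15d0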

The only time I've experienced that the server could not get the zpool
back online was when some of the disks failed for some reason. I find
it completely valid that the server panics rather than writes
inconsistent data.

Every time our internal file server suffered an unplanned restart
(power failure), it recovered (Solaris 10/08 and ZFS ver. 4). But this
Sunday, Aug. 10th, the same file server was brought down by a faulty
UPS. When power was restored, the zpool had become inconsistent. This
time the storage itself was also affected by the power outage.

Is it a valid point that ZFS is able to recover more gracefully when
the server itself goes down than when some of the disks/LUNs bail out?
I ask because that is the only situation in which I've personally seen
ZFS unable to recover.

>  The data recovery utility can ask me questions about what I want it
> to do in an ambiguous situation, or give me only partial results.

Our NFS server was also on this faulty UPS. It runs Solaris 9 on SPARC
with VxFS and manages 109 TB of storage on an HDS. When I switched the
server on, I saw it replay the journal, mark the partition as clean,
and come online. I know there is no guarantee that the data are
consistent, but at least VxFS has had many years to mature.

I had initially planned to migrate some of the older partitions to ZFS
and thereby test it. I've changed that plan: I will run the setup with
the ARC-1680 controller and SAS disks as an internal file server for a
while, and instead add the additional storage to our Solaris 9 server
with VxFS.

ZFS has changed the way I look at filesystems, and I'm very glad that
Sun gives it so much exposure. But at the moment I'd give VxFS the
edge. :-)

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare