Re: [zfs-discuss] I/O freeze after a disk failure

Paul Kraus Wed, 05 Sep 2007 12:28:29 -0700

On 9/4/07, Gino <[EMAIL PROTECTED]> wrote:

> yesterday we had a drive failure on a fc-al jbod with 14 drives.
> Suddenly the zpool using that jbod stopped to respond to I/O requests
> and we get tons of the following messages on /var/adm/messages:


<snip>

> "cfgadm -al" or "devfsadm -C" didn't solve the problem.
> After a reboot  ZFS recognized the drive as failed and all worked well.
>
> Do we need to restart Solaris after a drive failure??

        I would hope not, but ... prior to putting some ZFS volumes
into production we did some failure testing. The hardware I was
testing with was a couple of SF-V245s with 4 x 72 GB disks each. Two
disks were set up with SVM/UFS as a mirrored OS, the other two were
handed to ZFS as a mirrored zpool. I did some large file copies to
generate I/O. While a large copy was going on (lots of disk I/O) I
pulled one of the drives.

        If the I/O was to the zpool the system would hang (just as if
it were waiting on an I/O operation). I let it sit this way for
over an hour with no recovery. After rebooting, it found the surviving
half of the ZFS mirror just fine. Just to be clear, once I pulled the
disk, *all* activity on the box hung over a period of about 5 minutes,
even a shell that was just running prstat.

        If the I/O was to one of the SVM/UFS disks there would be a
60-90 second pause in all activity (just like the ZFS case), but then
operation would resume. This is what I am used to seeing for a disk
failure.
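
        A pause like this can be timed rather than eyeballed. The
following is a minimal sketch in Python, not the tooling used in the
tests above (those were plain large file copies). The function name
measure_write_stalls, its parameters, and the 1-second threshold are
all assumptions made for illustration. It writes a file in chunks,
forces each chunk to disk, and records how long each one took.

```python
import os
import time


def measure_write_stalls(path, total_bytes=64 * 1024 * 1024,
                         block_size=1024 * 1024, stall_threshold=1.0):
    """Write at least total_bytes to path in block_size chunks, syncing each.

    Returns (longest_pause_seconds, stalls), where stalls is a list of
    (byte_offset, seconds) for every chunk that took stall_threshold
    seconds or longer. The names and defaults are illustrative only.
    """
    block = b"\0" * block_size
    longest = 0.0
    stalls = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            start = time.monotonic()
            f.write(block)
            f.flush()
            # Force the chunk out to the device so that a stalled disk
            # shows up as a slow iteration here, not later at close().
            os.fsync(f.fileno())
            elapsed = time.monotonic() - start
            longest = max(longest, elapsed)
            if elapsed >= stall_threshold:
                stalls.append((written, elapsed))
            written += block_size
    return longest, stalls
```

Calling measure_write_stalls() with a path on the filesystem under
test while a drive is pulled would report a 60-90 second pause as one
long chunk. If the I/O never completes, as in the zpool case above,
the call simply never returns, so it can only quantify pauses that
eventually end.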

        In the ZFS case I could replace the disk and the zpool would
resilver automatically. I could also take the removed disk and put it
into the second system and have it recognize the zpool (and that it
was missing half of a mirror) and the data was all there.

        In no case did I see any data loss or corruption. I had
attributed the system hanging to an interaction between the SAS and
ZFS layers, but the previous post makes me question that assumption.

        As another data point, I have an old Intel box at home that I
am running Solaris x86 on, with ZFS. It has a pair of 120 GB PATA
disks. The OS is on SVM/UFS mirrored partitions and /export/home is on
a pair of partitions in a zpool (mirror). I had a bad power connector
and sometime after booting lost one of the drives. The server kept
running fine. Once I got the drive powered back up (while the server
was shut down), the SVM mirrors resync'd and the zpool resilvered. The
zpool finished substantially before the SVM.

        In all cases the OS was Solaris 10 U3 (11/06) with no
additional patches.

-- 
Paul Kraus
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
