On 9/4/07, Gino <[EMAIL PROTECTED]> wrote:

> yesterday we had a drive failure on a fc-al jbod with 14 drives.
> Suddenly the zpool using that jbod stopped to respond to I/O requests
> and we get tons of the following messages on /var/adm/messages:
<snip>

> "cfgadm -al" or "devfsadm -C" didn't solve the problem.
> After a reboot ZFS recognized the drive as failed and all worked well.
>
> Do we need to restart Solaris after a drive failure??

I would hope not, but ... prior to putting some ZFS volumes into production we did some failure testing. The hardware I was testing with was a couple of SF-V245s with 4 x 72 GB disks each. Two disks were set up with SVM/UFS as a mirrored OS; the other two were handed to ZFS as a mirrored zpool. I did some large file copies to generate I/O. While a large copy was going on (lots of disk I/O) I pulled one of the drives.

If the I/O was to the zpool, the system would hang (just as if it were waiting on an I/O operation). I let it sit this way for over an hour with no recovery. After rebooting, it found the remaining half of the ZFS mirror just fine. Just to be clear: once I pulled the disk, *all* activity on the box hung over a period of about 5 minutes, even a shell that was just running prstat.

If the I/O was to one of the SVM/UFS disks, there would be a 60-90 second pause in all activity (just like the start of the ZFS case), but then operation would resume. This is what I am used to seeing for a disk failure.

In the ZFS case I could replace the disk and the zpool would resilver automatically. I could also take the removed disk, put it into the second system, and have that system recognize the zpool (and that it was missing half of a mirror); the data was all there.

In no case did I see any data loss or corruption. I had attributed the system hanging to an interaction between the SAS and ZFS layers, but the previous post makes me question that assumption.

As another data point, I have an old Intel box at home running Solaris x86 with ZFS. It has a pair of 120 GB PATA disks. The OS is on SVM/UFS mirrored partitions, and /export/home is on a pair of partitions in a zpool (mirror). I had a bad power connector and, sometime after booting, lost one of the drives. The server kept running fine.
Once I got the drive powered back up (while the server was shut down), the SVM mirrors resync'd and the zpool resilvered. The zpool finished substantially before the SVM mirrors did.

In all cases the OS was Solaris 10 U3 (11/06) with no additional patches.

--
Paul Kraus
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss