I'm going to go out on a limb here and say you have an A5000 with the
1.6" disks in it. Because of their design (all drives see each other
on both the A and B loops), it's possible for one badly behaving disk
to take over the FC-AL loop and require human intervention. You can
physically go up to the A5000 and remove the faulty drive if your
volume manager software (SVM, VxVM, ZFS, etc.) can still run without
the drive.
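
If the pool is redundant, you'd usually want to tell ZFS to stop using
the disk before you pull it. Roughly something like this (just a
sketch; "tank" and c1t5d0 are placeholders for your own pool name and
the failed disk's device name):

    # confirm the vdev holding the bad disk is a mirror or raidz
    zpool status tank

    # stop ZFS issuing I/O to the suspect disk before pulling it
    zpool offline tank c1t5d0

    # after swapping in a replacement, kick off the resilver
    zpool replace tank c1t5d0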

In the above case the WWN (ending in 81b9f) is printed on the drive's
label, so it's easy to locate the faulty drive. Keep in mind that
sometimes the /next/ functioning drive in the loop is the one doing
the reporting rather than the failed drive itself. It's just a quirk
of that storage unit.
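
If you can't read the label from where you're standing, you can map
the WWN back to a device and enclosure slot first. Something along
these lines (again just a sketch; substitute your own WWN suffix and
enclosure name):

    # FC disk device names carry the WWN, so grep for the suffix
    ls /dev/rdsk | grep -i 81b9f

    # list the A5000 enclosures luxadm can see, then display the
    # drives in yours to find which slot that WWN lives in
    luxadm probe
    luxadm display <enclosure_name>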

These days storage arrays usually run an individual internal FC-AL
loop to each drive, which alleviates this sort of problem.

Cheers,
Mark.

> Hi all,
>
> yesterday we had a drive failure on an FC-AL JBOD with 14 drives.
> Suddenly the zpool using that JBOD stopped responding to I/O requests, and we
> got tons of the following messages in /var/adm/messages:
>
> Sep  3 15:20:10 fb2 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd52):
> Sep  3 15:20:10 fb2     SCSI transport failed: reason 'timeout': giving up
>
> "cfgadm -al" or "devfsadm -C" didn't solve the problem.
> After a reboot  ZFS recognized the drive as failed and all worked well.
>
> Do we need to restart Solaris after a drive failure??
>
> Gino