> If you take a look at these messages, the somewhat unusual condition
> that may lead to unexpected behaviour (i.e. fast giveup) is that
> whilst this is a SAN connection, it is achieved through a non-
> Leadville config; note the fibre-channel and sd references. In a
> Leadville-compliant installation this would be the ssd driver, hence
> you'd have to investigate the specific semantics and driver tweaks
> that this system has applied to sd in this instance.
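(For anyone wanting to poke at those sd semantics: the sd driver's timeout and retry behaviour can normally be adjusted through /etc/system tunables. A hedged sketch - the values below are purely illustrative, and defaults and exact behaviour vary between Solaris releases:

```
* /etc/system - sd driver retry tuning (illustrative values, not recommendations)
* sd_io_time: per-command timeout in seconds (commonly 60 by default)
set sd:sd_io_time=0x78
* sd_retry_count: retries before a command is declared failed
set sd:sd_retry_count=5
```

A reboot is needed for /etc/system changes to take effect.)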
If only it were possible to use the Leadville drivers... We've seen the same problems here: an *instant* panic if the FC switch reboots, courtesy of ZFS. I wouldn't mind if it kept on retrying a tad bit longer - preferably configurable. And to panic? How can that in any sane way be a good way to "protect" the application? *BANG* - no chance at all for the application to handle the problem...

Btw, in our case we have also wrapped the raw FC-attached "disks" in SVM metadevices first, because when a disk in an A3500FC unit went bad we hit the _other_ failure mode of ZFS - a total hang - until I noticed that wrapping the device in a layer of SVM metadevices insulated ZFS from that problem. Now it correctly notices that the disk is "gone/dead" and shows that in "zpool status" etc.

(We (Lysator ACS - a students' computer club) can't use the Leadville "ifp" driver (and hence the "ssd" disks) for the Qlogic QLA2100 HBA boards, since it is based on an older Qlogic firmware that only supports a maximum of 16 LUNs per target, and we want more... So we use the Qlogic qla2100 driver instead, which works really nicely, but then it uses the "sd" disk devices instead. Being a computer club with limited funds means one finds ways to use old hardware in new and interesting ways :-)

Hardware in use:

Primary file server: Sun Ultra 450 with two Qlogic QLA2100 HBAs. One is connected via an 8-port FC-AL *hub* to two Sun A5000 JBOD boxes (filled with 9 and 18 GB FC disks), the other via a Brocade 2400 8-port switch (running in "QuickLoop" mode) to a Compaq StorageWorks RA8000 RAID and two A3500FC systems.

Now... What can *possibly* go wrong with that setup? :-) I'll tell you a couple of things:

1. When the server entered multiuser and started serving NFS for all the users' $HOME directories, many, many disks in the A5000s started resetting themselves again and again and again...
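(The SVM wrapping mentioned above is straightforward. A minimal sketch, assuming a state database already exists and using made-up device names - substitute your own c#t#d# paths and metadevice numbers:

```
# One-to-one metadevice: a "concat" of a single slice (1 stripe, 1 slice)
metainit d100 1 1 c2t0d0s2
# Build the pool on the metadevice instead of the raw FC disk
zpool create tank /dev/md/dsk/d100
```

The metadevice adds no striping or mirroring here; it just puts SVM's error handling between ZFS and the flaky FC path.)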
Solution: tune down the maximum number of tagged commands sent to the disks in /kernel/drv/qla2100.conf:

hba1-max-iocb-allocation=7;   # was 256
hba1-execution-throttle=7;    # was 31

(This problem wasn't there with the old Sun "ifp" driver, probably because it has less aggressive limits - but since that driver is totally non-configurable it's impossible to tell.)

2. The power cord to the Brocade switch came slightly loose, causing the switch to reboot - which sent the server into an *instant PANIC, thanks to ZFS*.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
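(Footnote on the "preferably configurable" wish above: later ZFS releases did grow a per-pool "failmode" property covering exactly this. A hedged sketch - "tank" is a placeholder pool name, and the property only exists on newer pool versions:

```
# failmode controls what happens once all device retries are exhausted:
#   wait     - block I/O until the device comes back (default)
#   continue - return EIO to new I/O but keep the pool imported
#   panic    - the old behaviour described in this thread
zpool set failmode=continue tank
zpool get failmode tank
```

Not available on the vintage of bits discussed here, but worth knowing about.)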