> If you take a look at these messages, the somewhat unusual condition
> that may lead to unexpected behaviour (i.e. fast giveup) is that
> whilst this is a SAN connection, it is achieved through a non-
> Leadville config; note the fibre-channel and sd references. In a
> Leadville-compliant installation this would be the ssd driver, hence
> you'd have to investigate the specific semantics and driver tweaks
> that this system has applied to sd in this instance.
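(For anyone wanting to poke at those sd semantics: the sd driver's timeout and retry behaviour can normally be adjusted through /etc/system tunables. A hedged sketch - the values below are purely illustrative, and defaults and exact behaviour vary between Solaris releases:

```
* /etc/system - sd driver retry tuning (illustrative values, not recommendations)
* sd_io_time: per-command timeout in seconds (commonly 60 by default)
set sd:sd_io_time=0x78
* sd_retry_count: retries before a command is declared failed
set sd:sd_retry_count=5
```

A reboot is needed for /etc/system changes to take effect.)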
If only it were possible to use the Leadville drivers... We've seen the same problems here: an *instant* panic if the FC switch reboots, courtesy of ZFS. I wouldn't mind if it kept on retrying a tad bit longer - preferably configurable. And to panic? How can that in any sane way be a good way to "protect" the application? *BANG* - no chance at all for the application to handle the problem...

Btw, in our case we have also wrapped the raw FC-attached "disks" in SVM metadevices first, because when a disk in an A3500FC unit went bad we hit the _other_ failure mode of ZFS - a total hang - until I noticed that wrapping the device in a layer of SVM metadevices insulated ZFS from that problem. Now it correctly notices that the disk is "gone/dead" and shows that in "zpool status" etc.

(We (Lysator ACS - a students' computer club) can't use the Leadville "ifp" driver (and hence the "ssd" disks) for the Qlogic QLA2100 HBA boards, since it is based on an older Qlogic firmware that only supports a maximum of 16 LUNs per target, and we want more... So we use the Qlogic qla2100 driver instead, which works really nicely, but then it uses the "sd" disk devices instead. Being a computer club with limited funds means one finds ways to use old hardware in new and interesting ways :-)

Hardware in use:

Primary file server: Sun Ultra 450 with two Qlogic QLA2100 HBAs. One is connected via an 8-port FC-AL *hub* to two Sun A5000 JBOD boxes (filled with 9 and 18 GB FC disks), the other via a Brocade 2400 8-port switch (running in "QuickLoop" mode) to a Compaq StorageWorks RA8000 RAID and two A3500FC systems.

Now... What can *possibly* go wrong with that setup? :-) I'll tell you a couple of things:

1. When the server entered multiuser and started serving NFS for all the users' $HOME directories, many, many disks in the A5000s started resetting themselves again and again and again...
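(The SVM wrapping mentioned above is straightforward. A minimal sketch, assuming a state database already exists and using made-up device names - substitute your own c#t#d# paths and metadevice numbers:

```
# One-to-one metadevice: a "concat" of a single slice (1 stripe, 1 slice)
metainit d100 1 1 c2t0d0s2
# Build the pool on the metadevice instead of the raw FC disk
zpool create tank /dev/md/dsk/d100
```

The metadevice adds no striping or mirroring here; it just puts SVM's error handling between ZFS and the flaky FC path.)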
Solution: tune down the maximum number of tagged commands sent to the disks in /kernel/drv/qla2100.conf:

hba1-max-iocb-allocation=7;   # was 256
hba1-execution-throttle=7;    # was 31

(This problem wasn't there with the old Sun "ifp" driver, probably because it has less aggressive limits - but since that driver is totally non-configurable it's impossible to tell.)

2. The power cord to the Brocade switch came slightly loose, causing the switch to reboot - which sent the server into an *instant PANIC, thanks to ZFS*.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
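(Footnote on the "preferably configurable" wish above: later ZFS releases did grow a per-pool "failmode" property covering exactly this. A hedged sketch - "tank" is a placeholder pool name, and the property only exists on newer pool versions:

```
# failmode controls what happens once all device retries are exhausted:
#   wait     - block I/O until the device comes back (default)
#   continue - return EIO to new I/O but keep the pool imported
#   panic    - the old behaviour described in this thread
zpool set failmode=continue tank
zpool get failmode tank
```

Not available on the vintage of bits discussed here, but worth knowing about.)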