Peter Eriksson wrote:
If you take a look at these messages, the somewhat unusual condition that
may lead to unexpected behaviour (i.e. fast give-up) is that while this is
a SAN connection, it is achieved through a non-Leadville config - note the
fibre-channel and sd references. In a Leadville-compliant installation this
would be the ssd driver, so you'd have to investigate the specific
semantics and driver tweaks that this system has applied to sd in this
instance.

If only it were possible to use the Leadville drivers... We've seen the
same problems here (*instant* panic if the FC switch reboots, thanks to ZFS -
I wouldn't mind if it kept on retrying a bit longer - preferably
configurable). And to panic? How can that in any sane way be a good way to
"protect" the application? *BANG* - no chance at all for the application
to handle the problem...

The *application* should not be worrying about handling error
conditions in the kernel. That's the kernel's job, and in this
case, ZFS' job.

ZFS protects *your data* by preventing any more writes from
occurring when it cannot guarantee the integrity of your data.
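
That said, this behaviour is getting a knob: newer builds (where
available) add a per-pool "failmode" property, so you can choose
wait/continue/panic rather than an unconditional panic. A sketch, assuming
a pool called "tank" and a build new enough to have the property:

```
# block I/O and wait for the device to return instead of panicking
zpool set failmode=wait tank
# check what the pool is currently set to
zpool get failmode tank
```

(On builds that have it, the default is "wait", not "panic".)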

Btw. in our case we have also wrapped the raw FC-attached "disks" with
SVM metadevices, because when a disk in an A3500FC unit goes bad we hit
the _other_ failure mode of ZFS - a total hang. I noticed that wrapping
each device in a layer of SVM metadevices insulates ZFS from that
problem - now it correctly notices that the disk is "gone/dead" and
reports it in "zpool status" etc.
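
For anyone wanting to try the same workaround, the wrapping is roughly as
follows - a sketch only, where d10 and the c2t0d0s0 device name are
placeholders for your own layout:

```
# one-slice concat metadevice on top of the raw FC LUN
metainit d10 1 1 c2t0d0s0
# build the pool on the metadevice instead of the raw disk
zpool create tank /dev/md/dsk/d10
```

(You need at least one metadb state-database replica in place before
metainit will run; see metadb(1M).)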

Hm. An extra layer of complexity. Kinda defeats one of the stated goals
of ZFS.

(We (Lysator ACS - a students' computer club) can't use the Leadville
driver, since the "ifp" driver (and hence the "ssd" disks) for the
Qlogic QLA2100 HBA boards is based on an older Qlogic firmware that only
supports a maximum of 16 LUNs per target, and we want more... So we use
the Qlogic qla2100 driver instead, which works really nicely, but then it
uses the "sd" disk devices instead. Being a computer club with limited
funds means one finds ways to use old hardware in new and interesting
ways :-)

Ebay.se ?

Hardware in use: Primary file server: Sun Ultra 450, two Qlogic QLA2100
HBAs. One connected via an 8-port FC-AL *hub* to two Sun A5000 JBOD boxes
(filled with 9 and 18GB FC disks), the other via a Brocade 2400 8-port
switch (running in "QuickLoop" mode) to a Compaq StorageWorks RA8000 RAID
and two A3500FC systems.
Now... What can *possibly* go wrong with that setup? :-)

Hmmm.... let's start with the mere existence of the EOL'd A3500fc
hardware in your config. Kinda goes downhill from there :)

I'll tell you a couple:

1. When the server entered multiuser and started serving NFS for all the
users' $HOME directories, many, many disks in the A5000 started resetting
themselves again and again and again... Solution: tune down the maximum
number of tagged commands sent to the disks in /kernel/drv/qla2100.conf:

    hba1-max-iocb-allocation=7;   # was 256
    hba1-execution-throttle=7;    # was 31
(This problem wasn't there with the old Sun "ifp" driver, probably
because it has less aggressive limits - but since that driver is totally
nonconfigurable it's impossible to tell.)
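
In case it saves someone a search: the new values don't take effect until
the driver re-reads its .conf file, which for us meant a reconfiguration
reboot (sketch below; update_drv(1M) with -f may also work on releases
that support forcing a re-read for an attached driver):

```
# make the qla2100 driver pick up the new settings
reboot -- -r          # reconfiguration reboot
# or, on releases that support it:
update_drv -vf qla2100
```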

Ebay.se

2. The power cord to the Brocade switch came slightly loose, causing the
switch to reboot - and the server to *instantly PANIC, thanks to ZFS*.

Yes, as noted, this is by design in order to *protect your data*


James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
