Dennis Clarke wrote:
On Tue, 24 Mar 2009, Dennis Clarke wrote:
However, I have repeatedly run into problems when I need to boot after a
power failure. I see vdevs being marked as FAULTED regardless of whether
any hard errors are actually reported by the on-disk SMART firmware. I am
able to remove these FAULTED devices temporarily, re-insert the same disk,
and then run fine for months. Until the next long power failure.
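(For reference, the "remove and re-insert" step described above usually
amounts to clearing the fault state, roughly like this -- the pool and
device names below are placeholders, not taken from this system:

  # zpool online tank c1t2d0    (bring the FAULTED vdev back into service)
  # zpool clear tank            (reset the pool's error counters)
  # zpool status -x             (confirm the pool reports healthy again)
)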
In spite of huge detail, you failed to describe to us the technology
used to communicate with these disks. The interface adaptors,
switches, and wiring topology could make a difference.
Nothing fancy. Dual QLogic (Sun) fibre cards directly connected to the
back of the A5200s. Simple really.
Run away! Run away!
Save yourself a ton of grief and replace the A5200.
Is there *really* a severe fault in that disk?
# luxadm -v display 21000018625d599d
This sounds like some sort of fiber channel.
Transport protocol: IEEE 1394 (SBP-2)
Interesting that it mentions the protocol used by FireWire.
I have no idea where that is coming from.
If you are using fiber channel, the device names in the pool
specification suggest that Solaris multipathing is not being used (I
would expect something long like
c4t600A0B800039C9B500000A9C47B4522Dd0). If multipathing is not used,
then you either have simplex connectivity, or two competing simplex
paths to each device. Multipathing is recommended if you have
redundant paths available.
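(If you do want MPxIO on those FC paths, the usual route on Solaris 10 and
OpenSolaris is stmsboot -- only a sketch, and it rewrites device paths and
wants a reboot, so check the man page for your release first:

  # stmsboot -e    (enable multipathing; prompts for the required reboot)
  # stmsboot -L    (after the reboot, list old-to-new device name mappings)

ZFS finds the disks by their labels, so the pool should come back under the
new c#t<WWN>d# names, but exporting the pool beforehand is the cautious
route.)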
Yes, I have another machine that has MPxIO in place. However, a power
failure also trips phantom faults there.
If the disk itself is not aware of its severe faults then that
suggests that there is a transient problem with communicating with the
disk.
You would think so eh?
But a transient problem that only occurs after a power failure?
The problem could be in a device driver, adaptor card, FC
switch, or cable. If the disk drive also lost power, perhaps the disk
is unusually slow at spinning up.
All disks were up at boot; you can see that when I ask for a zpool status
at boot time in single-user mode. No errors and no faults.
The issue seems to be when the fault manager (fmd) starts up, or perhaps
some other service that can throw a fault. I'm not sure.
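(One way to narrow that down, assuming a stock SMF setup -- the service
names below are the standard ones -- is to see when the fault manager comes
up and whether restarting it alone re-trips the fault:

  # svcs -l svc:/system/fmd:default          (state and start time of fmd)
  # svcs -x                                  (anything that came up degraded)
  # svcadm restart svc:/system/fmd:default   (does a restart alone re-trip it?)
)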
The following will help you diagnose where the error messages
are generated from. I doubt it is a problem with the disk, per se, but
you will want to double-check your disk firmware to make sure it is
up to date (I've got scars):
fmadm faulty
fmdump -eV
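(Some follow-ups that tend to help when reading that output -- <uuid> and
<pool> below are placeholders you take from your own fmadm faulty listing:

  # fmdump -eV | grep class    (a quick look at which ereport classes fired)
  # fmdump -V -u <uuid>        (full detail for one fault from fmadm faulty)

and once you are satisfied the disk really is fine:

  # fmadm repair <uuid>        (tell fmd the resource has been repaired)
  # zpool clear <pool>         (clear the pool's own error state)
)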
-- richard