> On Tue, 24 Mar 2009, Dennis Clarke wrote:
>>
>> However, I have repeatedly run into problems when I need to boot after
>> a power failure. I see vdevs being marked as FAULTED regardless of
>> whether there are actually any hard errors reported by the on-disk
>> SMART firmware. I am able to remove these FAULTed devices temporarily
>> and then re-insert the same disk and then run fine for months, until
>> the next long power failure.
>
> In spite of huge detail, you failed to describe to us the technology
> used to communicate with these disks. The interface adaptors,
> switches, and wiring topology could make a difference.
Nothing fancy. Dual QLogic (Sun) fibre cards directly connected to the
back of A5200s. Simple, really.

>> Is there *really* a severe fault in that disk?
>>
>> # luxadm -v display 21000018625d599d
>
> This sounds like some sort of fiber channel.
>
>> Transport protocol: IEEE 1394 (SBP-2)
>
> Interesting that it mentions the protocol used by FireWire.

I have no idea where that is coming from.

> If you are using fiber channel, the device names in the pool
> specification suggest that Solaris multipathing is not being used (I
> would expect something long like
> c4t600A0B800039C9B500000A9C47B4522Dd0). If multipathing is not used,
> then you either have simplex connectivity, or two competing simplex
> paths to each device. Multipathing is recommended if you have
> redundant paths available.

Yes, I have another machine that has mpxio in place. However, a power
failure also trips phantom faults on that machine.

> If the disk itself is not aware of its severe faults then that
> suggests that there is a transient problem with communicating with
> the disk.

You would think so, eh? But a transient problem that only occurs after
a power failure?

> The problem could be in a device driver, adaptor card, FC switch, or
> cable. If the disk drive also lost power, perhaps the disk is
> unusually slow at spinning up.

All disks were up at boot; you can see that when I run 'zpool status'
at boot time in single-user mode. No errors and no faults. The issue
seems to be when the fault manager (fmd) starts up, or perhaps some
other service that can throw a fault. I'm not sure.

> It is easy to blame ZFS for problems.

It is easy to blame a power failure for problems, as well as a nice
shiny new APC Smart-UPS XL 3000VA RM 3U unit with an external
extended-runtime battery that doesn't signal a power failure. I never
blame ZFS for anything.

> On my system I was experiencing system crashes overnight while
> running 'zfs scrub' via cron job. The fiber channel card was locking
> up. Eventually I learned that it was due to a bug in VirtualBox's
> device driver. If VirtualBox was not left running overnight, then the
> system would not crash.

VirtualBox? This is a Solaris 10 machine. Nothing fancy. Okay, sorry,
nothing way out in the field fancy like VirtualBox.

Dennis
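P.S. Next time this happens, something like the following ought to show
whether the fault is real before I go pulling drives again. The pool
name "tank" and the device path below are placeholders only; the real
names come from 'zpool status' and 'fmadm faulty':

  # fmadm faulty
  # fmdump -eV | more
  # zpool status -v tank
  # zpool clear tank c2t21000018625D599Dd0
  # fmadm repair <uuid-reported-by-fmadm-faulty>

'fmdump -eV' dumps the raw error telemetry behind the diagnosis,
'zpool clear' resets the error counters on the named vdev, and 'fmadm
repair' tells the fault manager the resource has been dealt with so it
stops marking it faulty.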