We recently had a disk fail on one of our whitebox (SuperMicro) ZFS arrays (Solaris 10 U9).
The disk began throwing errors like this:

  May 5 04:33:44 dev-zfs4 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci15d9,400@0 (mpt_sas0):
  May 5 04:33:44 dev-zfs4   mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110610

and the error counters for the drive were incrementing in "iostat -En" output, but nothing showed up in fmdump. Unfortunately, it took about three hours for ZFS (or maybe it was the MPT driver) to decide the drive was actually dead:

  May 5 07:41:06 dev-zfs4 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5002cbc76c0 (sd4):
  May 5 07:41:06 dev-zfs4   drive offline

During those three hours, I/O performance on this server was bad enough to cause real problems for us. Once the drive "failed" completely, ZFS pulled in a spare and all was well.

My question is: is there a way to tune the MPT driver, or even ZFS itself, to be more (or less) aggressive about what it treats as a "failure" scenario? I suppose this would have been handled differently, and better, if we'd been using real Sun hardware?

Our other option is to watch for log entries like the ones above and either alert someone or take some sort of automated action. I'm hoping there's a better way to handle this via driver or ZFS settings, however.

Thanks,
Ray
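P.S. In case it helps frame the question: the one knob I've turned up so far is the sd driver's per-command timeout, which can be lowered in /etc/system. A minimal sketch, assuming the pool disks sit under sd (the value below is illustrative, not a recommendation, and I haven't verified it on U9):

  * Lower sd's per-command timeout from the 60-second default so a
  * flaky drive stalls I/O for less time before the command is
  * retried or failed. Takes effect at the next reboot.
  set sd:sd_io_time = 10

The running value can be inspected with "echo sd_io_time/D | mdb -k".

As for the log-watching fallback, below is a rough sketch of the sort of watcher we'd run, assuming Python is available on the box (e.g. from /usr/sfw or OpenCSW). The threshold, window, and alert action are placeholders, not anything we've tested:

  #!/usr/bin/env python
  # watch_mpt_errors.py -- sketch only, not production code.
  # Tails /var/adm/messages (the default syslog target on Solaris 10)
  # and alerts when mpt_sas warnings like the ones above pile up.
  # NB: does not handle log rotation; restart it from cron or SMF.
  import re
  import time

  LOGFILE = "/var/adm/messages"
  PATTERN = re.compile(r"mpt_?sas.*(IOCStatus|IOCLogInfo)")
  THRESHOLD = 5        # alert after this many matches ...
  WINDOW = 600         # ... seen within this many seconds

  def alert(lines):
      # Placeholder action: mail someone, page, or even
      # `zpool offline` the suspect disk.
      print("ALERT: %d mpt_sas errors in the last %d seconds"
            % (len(lines), WINDOW))
      for l in lines:
          print("  " + l.rstrip())

  def main():
      hits = []        # (timestamp, line) of recent matches
      f = open(LOGFILE)
      f.seek(0, 2)     # start at end of file, like tail -f
      while True:
          line = f.readline()
          if not line:
              time.sleep(1)
              continue
          if PATTERN.search(line):
              now = time.time()
              hits.append((now, line))
              hits = [(t, l) for (t, l) in hits if now - t <= WINDOW]
              if len(hits) >= THRESHOLD:
                  alert([l for (_, l) in hits])
                  hits = []

  if __name__ == "__main__":
      main()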