We have a Silicon Mechanics server with a SuperMicro X8DT3-F (Rev 1.02) motherboard (onboard LSI 1068E, firmware 1.28.02.00) and a SuperMicro SAS-846EL1 (Rev 1.1) backplane.
We have four Intel X25-E SSDs attached to the backplane, two acting as ZIL and two as L2ARC. The remaining 21 drives are 1TB SATA. The system is used as an NFS datastore for VMware ESX and, while it is not heavily loaded, we occasionally see these pop up in the logs:

Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
Feb 28 22:46:22 prodsys-t2-zfs1 Log info 31126000 received for target 31.
Feb 28 22:46:22 prodsys-t2-zfs1 scsi_status=0, ioc_status=804b, scsi_state=c
Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
Feb 28 22:46:22 prodsys-t2-zfs1 Log info 31126000 received for target 31.
Feb 28 22:46:22 prodsys-t2-zfs1 scsi_status=0, ioc_status=804b, scsi_state=c
Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@8/pci15d9,1...@0/s...@1f,0 (sd24):
Feb 28 22:46:22 prodsys-t2-zfs1 Error for Command: write Error Level: Retryable
Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] Requested Block: 591744 Error Block: 591744
Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: CVEM002600FD
Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Mar 1 01:10:40 prodsys-t2-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
Mar 1 01:10:40 prodsys-t2-zfs1 Log info 31126000 received for target 30.
Mar 1 01:10:40 prodsys-t2-zfs1 scsi_status=0, ioc_status=804b, scsi_state=c
Mar 1 01:10:40 prodsys-t2-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
Mar 1 01:10:40 prodsys-t2-zfs1 Log info 31126000 received for target 30.
Mar 1 01:10:40 prodsys-t2-zfs1 scsi_status=0, ioc_status=804b, scsi_state=c
Mar 1 01:10:41 prodsys-t2-zfs1 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@8/pci15d9,1...@0/s...@1e,0 (sd23):
Mar 1 01:10:41 prodsys-t2-zfs1 Error for Command: write Error Level: Retryable
Mar 1 01:10:41 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] Requested Block: 958744 Error Block: 958744
Mar 1 01:10:41 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: CVEM0033003T
Mar 1 01:10:41 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Mar 1 01:10:41 prodsys-t2-zfs1 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

The errors _only_ occur on whichever drives are being used for ZIL.

The system is fully patched Solaris 10 U8, and the mpt driver is version 1.92:

# modinfo | grep mpt
40 ffffffffef8bc000 3b5f0 169 1 mpt (MPT HBA Driver v1.92)

The error messages above aren't fatal; apparently the OS just retries the write and all is well. We haven't seen any performance impact either, but we would like to track the problem down.

We've already swapped out the SSDs, and the retries continue to occur as above. The only thing that "solves" the problem is to attach the SSDs either to the motherboard's SATA controllers or directly to the LSI controller (bypassing the backplane).

This would seem to point the finger at the backplane. However, the other 21 SATA drives never throw errors, and neither do the two SSDs being used for L2ARC.

Could there be some sort of latency or timing issue in the mpt driver that only manifests itself under a high level of writes to SSDs hanging off a backplane (a potentially longer latency path)? Are there SCSI command timeout settings for the mpt driver that I could tweak to perhaps "mask" these errors?

The vendor will probably want to send us a replacement backplane, but I'm not convinced it will fix the issue.
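For anyone wanting to check the "only the ZIL drives" observation against a longer log, a tally along these lines may help. This is only a rough sketch in Python: the function name tally, the two regular expressions, and the embedded SAMPLE (a few lines copied from the excerpt above) are illustrative assumptions, not part of any existing tool. It counts mpt "Log info" messages per target and sd warnings per device; note that it counts messages, so an event that logs the same "Log info" line twice is counted twice.

```python
import re
import sys
from collections import Counter

# Illustrative sample: a few lines copied from the log excerpt in this message.
SAMPLE = """\
Feb 28 22:46:22 prodsys-t2-zfs1 Log info 31126000 received for target 31.
Feb 28 22:46:22 prodsys-t2-zfs1 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@8/pci15d9,1...@0/s...@1f,0 (sd24):
Feb 28 22:46:22 prodsys-t2-zfs1 Error for Command: write Error Level: Retryable
Mar 1 01:10:40 prodsys-t2-zfs1 Log info 31126000 received for target 30.
Mar 1 01:10:41 prodsys-t2-zfs1 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@8/pci15d9,1...@0/s...@1e,0 (sd23):
Mar 1 01:10:41 prodsys-t2-zfs1 Error for Command: write Error Level: Retryable
"""

# mpt line: "Log info <hex code> received for target <n>."
LOGINFO_RE = re.compile(r"Log info ([0-9a-fA-F]+) received for target (\d+)")
# sd line: "WARNING: <device path> (sdNN):"
WARNING_RE = re.compile(r"WARNING: \S+ \((sd\d+)\):")


def tally(text):
    """Count mpt log-info messages per (code, target) and sd warnings per disk."""
    targets = Counter()
    disks = Counter()
    for line in text.splitlines():
        m = LOGINFO_RE.search(line)
        if m:
            targets[(m.group(1), int(m.group(2)))] += 1
            continue
        m = WARNING_RE.search(line)
        if m:
            disks[m.group(1)] += 1
    return targets, disks


if __name__ == "__main__":
    # With a file argument (e.g. /var/adm/messages) read that file; otherwise use SAMPLE.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            text = fh.read()
    else:
        text = SAMPLE
    targets, disks = tally(text)
    for (code, target), n in sorted(targets.items()):
        print(f"log info {code} target {target}: {n}")
    for disk, n in sorted(disks.items()):
        print(f"{disk}: {n} warning(s)")
```

Run with a log file as the only argument, it prints one line per target and per sd device, which makes it easy to see whether anything other than the two ZIL devices ever shows up.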
Suggestions or thoughts?

Thanks,
Ray
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss