James: We are running Phase 16 firmware on our LSISAS3801E HBAs and have also tried the recently released Phase 17, but it didn't help. All firmware NVRAM settings are at their defaults. When we put the disks behind this controller under load (e.g. a scrub, or a recursive ls on a large ZFS filesystem), we get the following series of log entries at random intervals:
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49): incomplete read- retrying
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42): incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt0: IOC Operational.

It seems to be timing out accessing a disk, retrying, giving up and then doing a bus reset?
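For reference, the IOCLogInfo and ioc_status values in these messages are packed bit fields. The Python sketch below splits them apart. The field layout (bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 subcode) is an assumption borrowed from the Linux mptbase.c loginfo decoder, as is the reading of ioc_status bit 15 as a "log info available" flag; `decode_loginfo` and `decode_ioc_status` are hypothetical helpers, not part of any Solaris tool.

```python
# Hypothetical helpers: split LSI MPT status words into bit fields.
# ASSUMED layout (from the Linux mptbase.c loginfo decoder):
#   bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 subcode.

BUS_TYPES = {0x3: "SAS"}                      # only the value seen in these logs
ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}


def decode_loginfo(loginfo: int) -> dict:
    """Split a 32-bit IOCLogInfo word into its (assumed) fields."""
    bus_type = (loginfo >> 28) & 0xF
    originator = (loginfo >> 24) & 0xF
    return {
        "bus_type": BUS_TYPES.get(bus_type, hex(bus_type)),
        "originator": ORIGINATORS.get(originator, hex(originator)),
        "code": (loginfo >> 16) & 0xFF,
        "subcode": loginfo & 0xFFFF,
    }


def decode_ioc_status(ioc_status: int) -> dict:
    """ASSUMED: bit 15 flags an accompanying log-info word;
    the low 15 bits carry the status code itself."""
    return {
        "loginfo_available": bool(ioc_status & 0x8000),
        "status": ioc_status & 0x7FFF,
    }


if __name__ == "__main__":
    for value in (0x31110B00, 0x31112000):
        f = decode_loginfo(value)
        print(f"0x{value:08X}: {f['bus_type']}/{f['originator']} "
              f"code=0x{f['code']:02X} subcode=0x{f['subcode']:04X}")
    s = decode_ioc_status(0x804B)
    print(f"ioc_status 0x804B: loginfo_available={s['loginfo_available']} "
          f"status=0x{s['status']:04X}")
```

Under those assumptions, both log-info words decode to bus type SAS, originator PL, code 0x11, and differ only in subcode (0x0B00 vs 0x2000). The ioc_status 0x804b decodes to status 0x004B with the log-info flag set; what these codes actually mean would need to be confirmed against LSI's MPI documentation.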
This is happening with random disks behind the controller and on multiple systems with the same hardware configuration. We are running snv_118 right now and were hoping this was some magic mpt-related "bug" that would be fixed in snv_125, but it doesn't look like it. The LSI3801E drives two 23-disk JBODs; that is a dense configuration, but the controller should be able to handle it. We are also using wide raidz2 vdevs (22 disks each, one per JBOD), which admittedly perform worse, but the goal here is density, not performance. I would have expected the system to simply slow down under I/O contention, not to hit bus resets. Your thoughts?

--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss