What bug# is this filed under? I'm having what I believe is the same problem. Is it possible to just take the mpt driver from a prior build in the meantime? The log output below is from the load a zpool scrub creates. This is on a Dell T7400 workstation with an OEMed LSI 1068E. I updated the firmware to the newest available from Dell. The errors follow whichever of the four drives has the highest load.
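In case anyone else wants to try the same workaround, here is roughly what I had in mind; the BE name and mount point are made up, and I'm assuming the older mpt binary loads cleanly against the newer kernel, which it may well not, so treat this as an untested sketch:

    # Option 1 (safest): just boot the previous build's boot environment.
    beadm list
    beadm activate opensolaris-OLD      # hypothetical name of the older BE
    init 6

    # Option 2 (riskier): copy only the mpt driver from an older build
    # mounted at /mnt, then rebuild the boot archive. Assumes the old
    # module is compatible with the running kernel.
    cp /mnt/kernel/drv/mpt /kernel/drv/mpt
    cp /mnt/kernel/drv/amd64/mpt /kernel/drv/amd64/mpt
    bootadm update-archive
    init 6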
Streaming doesn't seem to trigger it; I can push 60 MiB a second to a mirrored rpool all day. It's only when there are a lot of metadata operations.

Oct 23 06:25:44 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:25:44 systurbo5       Disconnected command timeout for Target 1
Oct 23 06:27:15 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:27:15 systurbo5       Disconnected command timeout for Target 1
Oct 23 06:28:26 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:28:26 systurbo5       Disconnected command timeout for Target 1
Oct 23 06:29:47 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:29:47 systurbo5       Disconnected command timeout for Target 1
Oct 23 06:30:58 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:30:58 systurbo5       Disconnected command timeout for Target 1
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:28 systurbo5       mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:28 systurbo5       mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5       Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5       Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5       Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5       Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

On Fri, Oct 23, 2009 at 7:13 AM, Adam Cheal <ach...@pnimedia.com> wrote:
> Our config is:
> OpenSolaris snv_118 x64
> 1 x LSISAS3801E controller
> 2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
> Each of the two external ports on the LSI connects to a 23-disk JBOD.
> ZFS-wise we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD).
> The zpool has one ZFS filesystem containing millions of files/directories.
> This data is served up via CIFS (kernel), which is why we went with snv_118
> (the first release post-2009.06 that had a stable CIFS server). Like I
> mentioned to James, we know that the server won't be a star
> performance-wise, especially because of the wide vdevs, but it shouldn't
> hiccup under load either.
> A guaranteed way for us to cause these IO errors is to load up the
> zpool with about 30 TB of data (90% full) and then scrub it. Within 30
> minutes we start to see the errors, which usually evolve into "failing"
> disks (because of excessive retry errors), which just makes things worse.
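For anyone hitting the same thing, this is roughly how I've been watching for the errors while a scrub runs; the pool name is made up, and I'm just using the standard tools, since the per-device error counters and the FMA ereports are where these timeouts should show up:

    zpool scrub tank          # hypothetical pool name
    zpool status -v tank      # scrub progress and per-vdev read/write/cksum errors
    iostat -En                # per-device soft/hard/transport error counts
    fmdump -eV | less         # FMA ereports; the timeout/transport errors land here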
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss