The server uses a [b]Sun StorageTek 8-port external SAS PCIe HBA[/b] (mpt driver) connected to an external JBOD array with 12 disks.
Here is a link to the exact SAS (Sun) adapter:
http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf (LSI SAS3801)

When running I/O-intensive operations (zpool scrub) for a couple of hours, the server locks up with the following repeating messages:

Nov 10 16:31:45 sunserver scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:31:45 sunserver       Log info 0x31140000 received for target 17.
Nov 10 16:31:45 sunserver       scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Nov 10 16:32:55 sunserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:55 sunserver       Disconnected command timeout for Target 19
Nov 10 16:32:56 sunserver scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:56 sunserver       Log info 0x31140000 received for target 19.
Nov 10 16:32:56 sunserver       scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Nov 10 16:34:16 sunserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:34:16 sunserver       Disconnected command timeout for Target 21

I tested this on two servers:
- [b]Sun Fire X2200[/b] using [b]Sun Storage J4200 JBOD[/b] array and
- [b]Dell R410 Server[/b] with [b]Promise VTJ-310SS JBOD array[/b]

Both show the same repeating messages and lock up after a couple of hours of zpool scrub.

Solaris appears to be more stable than OpenSolaris: it doesn't lock up when scrubbing, but it still locks up after 5-6 hours of reading from the JBOD array (10TB in size).

So at this point this looks like an issue with the MPT driver or these SAS cards (I tested two) when under heavy load. I installed the latest firmware for the SAS card from LSI's web site (v1.29.00) without any change; the server still locks up.

Any ideas or suggestions on how to fix or work around this issue? The adapter is supposed to be enterprise-class.
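To see which targets are affected and how often, messages like the ones above can be tallied per target. The following is a minimal Python sketch, not something taken from the servers themselves: the function name tally_mpt_events and the two regular expressions are hypothetical, written only against the message text quoted in this post, and other mpt driver versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns written against the mpt messages quoted in this post
# (assumption: other driver versions may use different wording).
TIMEOUT_RE = re.compile(r"Disconnected command timeout for Target (\d+)")
LOGINFO_RE = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")


def tally_mpt_events(lines):
    """Count timeout and log-info messages per target number.

    Returns two Counters:
      timeouts: {target: count}
      loginfo:  {(target, log_info_code): count}
    """
    timeouts = Counter()
    loginfo = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[int(m.group(1))] += 1
            continue
        m = LOGINFO_RE.search(line)
        if m:
            loginfo[(int(m.group(2)), m.group(1))] += 1
    return timeouts, loginfo


# A few message lines from the excerpt above, used as sample input.
SAMPLE = [
    "Nov 10 16:31:45 sunserver Log info 0x31140000 received for target 17.",
    "Nov 10 16:32:55 sunserver Disconnected command timeout for Target 19",
    "Nov 10 16:32:56 sunserver Log info 0x31140000 received for target 19.",
    "Nov 10 16:34:16 sunserver Disconnected command timeout for Target 21",
]

timeouts, loginfo = tally_mpt_events(SAMPLE)
for target, count in sorted(timeouts.items()):
    print(f"target {target}: {count} disconnected command timeout(s)")
for (target, code), count in sorted(loginfo.items()):
    print(f"target {target}: log info {code} seen {count} time(s)")
```

To run it against a real log, pass an open file instead of the sample list, for example open("/var/adm/messages") (assuming the default Solaris syslog location). If the counts concentrate on a few targets, that would point toward specific disks or slots; an even spread would fit the HBA or driver theory better.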
Here is more detailed log info:

========================================================
Sun Fire X2200 and Sun Storage J4200 JBOD array

SAS card: Sun StorageTek 8-port external SAS PCIe HBA
http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf (LSI SAS3801)

Operating System: SunOS sunserver 5.11 snv_111b i86pc i386 i86pc Solaris

Nov 10 16:30:33 sunserver scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:30:33 sunserver       Log info 0x31140000 received for target 0.
Nov 10 16:30:33 sunserver       scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Nov 10 16:31:43 sunserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:31:43 sunserver       Disconnected command timeout for Target 17
Nov 10 16:32:55 sunserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:55 sunserver       Disconnected command timeout for Target 19
Nov 10 16:32:56 sunserver scsi: [ID 365881 kern.info] /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:56 sunserver       Log info 0x31140000 received for target 19.
Nov 10 16:32:56 sunserver       scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Nov 10 16:34:16 sunserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:34:16 sunserver       Disconnected command timeout for Target 21

----------------
Dell R410 Server and Promise VTJ-310SS JBOD array

SAS card: Sun StorageTek 8-port external SAS PCIe HBA

Operating System: SunOS dellserver 5.10 Generic_141445-09 i86pc i386 i86pc

Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@3/pci1028,1...@0 (mpt0):
Nov 11 00:18:22 dellserver      Disconnected command timeout for Target 0
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@3/pci1028,1...@0/s...@0,0 (sd13):
Nov 11 00:18:22 dellserver      Error for Command: read(10) Error Level: Retryable
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice] Requested Block: 276886498 Error Block: 276886498
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice] Vendor: Dell Serial Number: Dell Interna
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Nov 11 00:19:33 dellserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@3/pci1028,1...@0 (mpt0):
Nov 11 00:19:33 dellserver      Disconnected command timeout for Target 0
Nov 11 00:19:34 dellserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@3/pci1028,1...@0/s...@0,0 (sd13):
Nov 11 00:19:34 dellserver      SCSI transport failed: reason 'reset': retrying command
Nov 11 00:20:44 dellserver scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@3/pci1028,1...@0 (mpt0):
Nov 11 00:20:44 dellserver      Disconnected command timeout for Target 0

--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss