Hi, I would like to add yet another mpt timeout report. Suddenly the system started to get slow, which I noticed because some Linux clients were complaining about NFS server timeouts, and after some time I saw a lot of bus reset messages in /var/adm/messages. I took a quick look at the JBOD chassis and one of the disks had a steady (non-blinking) light. After I physically removed that disk, the system started responding again and a resilver kicked in, because a spare disk took the place of the removed one, as zpool status -v shows:
zpool status -v DATAPOOL04
  pool: DATAPOOL04
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 1h40m, 8.26% done, 18h32m to go
config:

        NAME           STATE     READ WRITE CKSUM
        DATAPOOL04     DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            c5t27d0    ONLINE       0     0     0  105M resilvered
            c5t29d0    ONLINE       0     0     0  105M resilvered
            c5t30d0    ONLINE       0     0     0  105M resilvered
            spare      DEGRADED     0     0     0
              c5t31d0  REMOVED      0  423K     0
              c5t28d0  ONLINE       0     0     0  9.83G resilvered
            c5t32d0    ONLINE       0     0     0  105M resilvered
        spares
          c5t28d0      INUSE     currently in use

errors: No known data errors
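
As a sanity check on the "to go" figure in the scrub line, here is a minimal sketch that assumes the resilver keeps progressing at a constant rate. That is only an assumption of mine; ZFS's own estimator may work differently, and the function name is hypothetical.

```python
def remaining_minutes(elapsed_minutes, percent_done):
    """Linear extrapolation: time left if progress continues at the same rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    fraction = percent_done / 100.0
    total = elapsed_minutes / fraction  # projected total duration
    return total - elapsed_minutes


# Values taken from the scrub line: 1h40m elapsed, 8.26% done.
left = remaining_minutes(100, 8.26)
hours, minutes = divmod(round(left), 60)
print(f"about {hours}h{minutes:02d}m to go")
```

The result lands within a couple of minutes of the 18h32m that zpool reports, so the estimate looks like a plain extrapolation of the current rate and will move if the resilver speeds up or slows down.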
At the moment the system is resilvering, but messages about the
disks/disk controller still appear in the log. Could these messages be
a side effect of the heavy resilver load, or are more disks probably
affected?
In a case like this, what is the best procedure to follow?
* shut down the server and the JBOD, power-cycle both, and see
how it goes
* replace the HBA/disk?
* something else?
Thanks for your time, and if any other information is required (even ssh
access can be granted), please feel free to ask.
Best regards,
Bruno Sousa
System specs :
* OpenSolaris snv_101b, with two dual-core AMD CPUs and 16 GB RAM
* LSI Logic SAS1068E, revision B3 , MPT Rev 105, Firmware Rev 011a0000
* 24 disks are attached to this HBA; the disks are Seagate SATA 1TB
"Enterprise" class (ATA-ST31000340NS-SN06-931.51GB )
* the LSI HBA is connected with one SFF 8087 connector cable (SAS 846EL1
BP 1-Port Internal Cascading Cable) to a Supermicro Chassis SC
846 with a SAS / SATA Expander Backplane with single LSI SASX36
Expander Chip
/var/adm/messages content
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci10de,3...@a/pci1000,3...@0/s...@17,0 (sd18):
Dec 7 13:57:12 san01 Error for Command: write(10)
Error Level: Retryable
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] Requested
Block: 48696432 Error Block: 48696432
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] Vendor:
ATA Serial Number:
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] Sense Key:
Unit_Attention
Dec 7 13:57:12 san01 scsi: [ID 107833 kern.notice] ASC: 0x29
(power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec 7 13:57:15 san01 scsi: [ID 243001 kern.warning] WARNING:
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:15 san01 mpt_handle_event_sync: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Dec 7 13:57:15 san01 scsi: [ID 243001 kern.warning] WARNING:
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:15 san01 mpt_handle_event: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.
Dec 7 13:57:45 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci10de,3...@a/pci1000,3...@0/s...@15,0 (sd16):
Dec 7 13:57:45 san01 Error for Command: write(10)
Error Level: Retryable
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] Requested
Block: 445125208 Error Block: 445125208
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] Vendor:
ATA Serial Number:
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] Sense Key:
Unit_Attention
Dec 7 13:57:45 san01 scsi: [ID 107833 kern.notice] ASC: 0x29
(power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec 7 13:57:50 san01 scsi: [ID 243001 kern.warning] WARNING:
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:50 san01 mpt_handle_event_sync: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Dec 7 13:57:50 san01 scsi: [ID 243001 kern.warning] WARNING:
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:50 san01 mpt_handle_event: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.
Dec 7 13:57:52 san01 scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Dec 7 13:57:52 san01 scsi: [ID 365881 kern.notice]
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
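
To see at a glance which targets the mpt driver is complaining about, the repeated "Log info ... received for target N" lines can be tallied. The following is a minimal sketch rather than a finished tool: the function name and the regular expression are my own, and it assumes the message wording is exactly as in the excerpt above.

```python
import re
from collections import Counter

# Matches mpt driver lines of the form
# "Log info 0x31123000 received for target 21."
LOG_INFO_RE = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")


def count_log_info_by_target(text):
    """Return a Counter keyed by (target, log_info_code)."""
    counts = Counter()
    for match in LOG_INFO_RE.finditer(text):
        code, target = match.group(1), int(match.group(2))
        counts[(target, code)] += 1
    return counts


# Small sample in the same shape as the log excerpt.
sample = (
    "Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.\n"
    "Dec 7 13:57:45 san01 Log info 0x31123000 received for target 21.\n"
    "Dec 7 13:57:52 san01 Log info 0x31123000 received for target 28.\n"
)
for (target, code), n in count_log_info_by_target(sample).most_common():
    print(f"target {target:>3}  {code}  {n} events")
```

To run it against the real log, read the contents of /var/adm/messages into a string and pass that to count_log_info_by_target. In the excerpt the events fall on targets 21 and 28, so more than one target is involved.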
iostat -En results
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 125 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t22d0 Soft Errors: 18 Hard Errors: 106 Transport Errors: 686
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 106 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t23d0 Soft Errors: 18 Hard Errors: 80 Transport Errors: 339
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 80 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t24d0 Soft Errors: 18 Hard Errors: 59 Transport Errors: 228
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 59 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t25d0 Soft Errors: 18 Hard Errors: 55 Transport Errors: 219
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 55 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t26d0 Soft Errors: 18 Hard Errors: 63 Transport Errors: 249
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 63 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t27d0 Soft Errors: 18 Hard Errors: 11 Transport Errors: 274
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 10 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t28d0 Soft Errors: 18 Hard Errors: 182 Transport Errors: 1255
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 182 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t29d0 Soft Errors: 18 Hard Errors: 8 Transport Errors: 201
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 8 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t30d0 Soft Errors: 18 Hard Errors: 10 Transport Errors: 249
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 10 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
c5t31d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 115
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12
Illegal Request: 4 Predictive Failure Analysis: 0
c5t32d0 Soft Errors: 18 Hard Errors: 11 Transport Errors: 222
Vendor: ATA Product: ST31000340NS Revision: SN06 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 11 Recoverable: 18
Illegal Request: 6 Predictive Failure Analysis: 0
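
To compare the disks, the per-device summary lines of iostat -En can be sorted by transport error count. The following is a minimal sketch: the helper name and regular expression are my own, and it assumes each summary line has the "device Soft Errors: N Hard Errors: N Transport Errors: N" shape seen above.

```python
import re

# One summary line per device, e.g.
# "c5t28d0 Soft Errors: 18 Hard Errors: 182 Transport Errors: 1255"
SUMMARY_RE = re.compile(
    r"^[ \t]*(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)",
    re.MULTILINE,
)


def rank_by_transport_errors(text):
    """Parse iostat -En summary lines and sort devices by transport errors."""
    rows = [
        {"device": dev, "soft": int(soft), "hard": int(hard), "transport": int(tr)}
        for dev, soft, hard, tr in SUMMARY_RE.findall(text)
    ]
    return sorted(rows, key=lambda r: r["transport"], reverse=True)


# Three of the summary lines from the output above.
sample = """\
c5t28d0 Soft Errors: 18 Hard Errors: 182 Transport Errors: 1255
c5t31d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 115
c5t22d0 Soft Errors: 18 Hard Errors: 106 Transport Errors: 686
"""
for row in rank_by_transport_errors(sample):
    print(f'{row["device"]}  transport={row["transport"]}  hard={row["hard"]}')
```

On the numbers above, c5t28d0 (the spare now being resilvered) has the most transport errors at 1255, followed by c5t22d0 at 686, while the removed disk c5t31d0 has the fewest at 115. Every listed disk shows transport errors, not only the one that was pulled.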
_______________________________________________ zfs-discuss mailing list [email protected] http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
