https://bugzilla.kernel.org/show_bug.cgi?id=199435
Bug ID: 199435
Summary: HPSA + P420i resetting logical Direct-Access never complete
Product: IO/Storage
Version: 2.5
Kernel Version: 4.11.0-14-generic
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: SCSI
Assignee: [email protected]
Reporter: [email protected]
Regression: No
I'm using kernel 4.11.0-14-generic with the latest hpsa driver, compiled from
the latest commit in the torvalds GitHub tree:
https://github.com/torvalds/linux/commit/8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef#diff-7a84fb366ebc08b575a832f0aeee3434
I'm using a Smart Array P420i, Firmware Version 8.32.
When a logical volume reset is triggered, it never completes and the server
load climbs sharply (the load average can rise to 3000).
After the reset is triggered, some tasks begin to time out, but I think that is
just a side effect of the reset (cmaeventd is the process that checks the
controller status):
Apr 18 01:28:53 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap- En- Exp=1
Apr 18 01:29:16 kernel: INFO: task cmaeventd:3397 blocked for more than 120 seconds.
Apr 18 01:29:16 kernel: Tainted: G OE 4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 18 01:29:16 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 18 01:29:16 kernel: cmaeventd D 0 3397 1 0x00000000
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel: __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel: schedule+0x36/0x80
Apr 18 01:29:16 kernel: scsi_block_when_processing_errors+0xd5/0x110
Apr 18 01:29:16 kernel: ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel: sg_open+0x14a/0x5c0
Apr 18 01:29:16 kernel: ? lookup_fast+0xd8/0x3b0
Apr 18 01:29:16 kernel: ? refcount_inc+0x9/0x40
Apr 18 01:29:16 kernel: chrdev_open+0xbf/0x1b0
Apr 18 01:29:16 kernel: do_dentry_open+0x208/0x310
Apr 18 01:29:16 kernel: ? cdev_put+0x30/0x30
Apr 18 01:29:16 kernel: vfs_open+0x4e/0x80
Apr 18 01:29:16 kernel: path_openat+0x2ac/0x1450
Apr 18 01:29:16 kernel: do_filp_open+0x99/0x110
Apr 18 01:29:16 kernel: ? __check_object_size+0x108/0x19e
Apr 18 01:29:16 kernel: ? __alloc_fd+0x46/0x170
Apr 18 01:29:16 kernel: do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel: ? do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel: ? __put_cred+0x3d/0x50
Apr 18 01:29:16 kernel: ? SyS_access+0x1e8/0x230
Apr 18 01:29:16 kernel: SyS_open+0x1e/0x20
Apr 18 01:29:16 kernel: entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 18 01:29:16 kernel: RIP: 0033:0x7f413c901be0
Apr 18 01:29:16 kernel: RSP: 002b:00007ffc0c1cd5b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
Apr 18 01:29:16 kernel: RAX: ffffffffffffffda RBX: 00000000025f7a40 RCX: 00007f413c901be0
Apr 18 01:29:16 kernel: RDX: 0000000000000008 RSI: 0000000000000002 RDI: 00007ffc0c1cd5f0
Apr 18 01:29:16 kernel: RBP: 0000000002563b40 R08: 0000000000000001 R09: 0000000000000000
Apr 18 01:29:16 kernel: R10: 00007f413c8ea760 R11: 0000000000000246 R12: 00007ffc0c1cd7b0
Apr 18 01:29:16 kernel: R13: 0000000000000001 R14: 00007ffc0c1cd700 R15: 00007ffc0c1cd830
Apr 18 01:29:16 kernel: INFO: task cmaidad:3442 blocked for more than 120 seconds.
Apr 18 01:29:16 kernel: Tainted: G OE 4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 18 01:29:16 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 18 01:29:16 kernel: cmaidad D 0 3442 1 0x00000000
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel: __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel: schedule+0x36/0x80
Apr 18 01:29:16 kernel: scsi_block_when_processing_errors+0xd5/0x110
Apr 18 01:29:16 kernel: ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel: sg_open+0x14a/0x5c0
Apr 18 01:29:16 kernel: ? lookup_fast+0xd8/0x3b0
Apr 18 01:29:16 kernel: ? refcount_inc+0x9/0x40
Apr 18 01:29:16 kernel: chrdev_open+0xbf/0x1b0
Apr 18 01:29:16 kernel: do_dentry_open+0x208/0x310
Apr 18 01:29:16 kernel: ? cdev_put+0x30/0x30
Apr 18 01:29:16 kernel: vfs_open+0x4e/0x80
Apr 18 01:29:16 kernel: path_openat+0x2ac/0x1450
Apr 18 01:29:16 kernel: do_filp_open+0x99/0x110
Apr 18 01:29:16 kernel: ? ipcperms+0x94/0x100
Apr 18 01:29:16 kernel: ? __check_object_size+0x108/0x19e
Apr 18 01:29:16 kernel: ? __alloc_fd+0x46/0x170
Apr 18 01:29:16 kernel: do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel: ? do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel: ? __put_cred+0x3d/0x50
Apr 18 01:29:16 kernel: ? SyS_access+0x1e8/0x230
Apr 18 01:29:16 kernel: SyS_open+0x1e/0x20
Apr 18 01:29:16 kernel: entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 18 01:29:16 kernel: RIP: 0033:0x7ff5af4cdbe0
Apr 18 01:29:16 kernel: RSP: 002b:00007fff8eac8818 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
Apr 18 01:29:16 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ff5af4cdbe0
Apr 18 01:29:16 kernel: RDX: 0000000000000008 RSI: 0000000000000002 RDI: 00007fff8eac8850
Apr 18 01:29:16 kernel: RBP: 0000000002372870 R08: 0000000000000001 R09: 00007ff5af4b77b8
Apr 18 01:29:16 kernel: R10: 00007ff5af4b6760 R11: 0000000000000246 R12: 0000000002372878
Apr 18 01:29:16 kernel: R13: 0000000000000005 R14: 00007ff5b00018c0 R15: 0000000000000000
Apr 18 01:29:16 kernel: INFO: task jbd2/sdam-8:9965 blocked for more than 120 seconds.
Apr 18 01:29:16 kernel: Tainted: G OE 4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 18 01:29:16 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 18 01:29:16 kernel: jbd2/sdam-8 D 0 9965 2 0x00000000
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel: __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel: schedule+0x36/0x80
Apr 18 01:29:16 kernel: jbd2_journal_commit_transaction+0x241/0x1830
Apr 18 01:29:16 kernel: ? update_load_avg+0x84/0x560
Apr 18 01:29:16 kernel: ? update_load_avg+0x84/0x560
Apr 18 01:29:16 kernel: ? dequeue_entity+0xed/0x4c0
Apr 18 01:29:16 kernel: ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel: ? lock_timer_base+0x7d/0xa0
Apr 18 01:29:16 kernel: kjournald2+0xca/0x250
Apr 18 01:29:16 kernel: ? kjournald2+0xca/0x250
Apr 18 01:29:16 kernel: ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel: kthread+0x109/0x140
Apr 18 01:29:16 kernel: ? commit_timeout+0x10/0x10
Apr 18 01:29:16 kernel: ? kthread_create_on_node+0x70/0x70
Apr 18 01:29:16 kernel: ret_from_fork+0x25/0x30
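To see at a glance which tasks the hung-task detector reported in a log like
the one above, the "INFO: task ... blocked" lines can be tallied. This is only
a rough sketch: the function name summarize_hung_tasks is made up for
illustration, and it assumes each kernel message is on a single, unwrapped
line, as it would be in the original kernel log.

```python
import re
from collections import Counter

# Matches the hung-task detector message, e.g.
# "INFO: task cmaeventd:3397 blocked for more than 120 seconds."
# The task name may contain '/' or ':' (e.g. "jbd2/sdam-8"), so it is matched
# greedily up to the last ":<pid> blocked" on the line.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def summarize_hung_tasks(lines):
    """Return a Counter mapping (task name, pid) to the number of reports.

    `lines` is any iterable of kernel log lines (hypothetical input; for
    example the lines of a saved syslog file).
    """
    counts = Counter()
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts
```

Passing the lines of a saved log file to summarize_hung_tasks() and printing
the result of .most_common() lists the blocked tasks, most frequently reported
first.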
The only way to get back to normal is to reboot the server.
Hope this helps somebody. If there is any more info I can provide, just ask
what would be useful.
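One piece of extra information that may be worth collecting while the problem
is happening is the list of tasks in uninterruptible sleep (state D), since
Linux counts those in the load average along with runnable tasks. The
following is a minimal sketch, not a tested diagnostic tool: the function names
are made up for illustration, and it only looks at /proc/<pid>/stat, so it
sees the main thread of each process and not the other threads.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    left, right = stat_text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    return int(pid_str), comm, right.split()[0]

def uninterruptible_tasks(proc="/proc"):
    """Return a list of (pid, comm) for processes currently in state D."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as stat_file:
                pid, comm, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime, or unreadable entry
        if state == "D":
            tasks.append((pid, comm))
    return tasks
```

Calling uninterruptible_tasks() a few times during the hang and comparing the
results shows whether the set of blocked processes keeps growing.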