So far the best I could reach is get the kernel call trace but not
crashing the node yet.

Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.306845] INFO: task java:1932 
blocked for more than 120 seconds.
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.308573]       Not tainted 
4.4.0-1069-aws #79-Ubuntu
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.309944] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312015] java            D 
ffff88032dc57db0     0  1932      1 0x00000000
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312019]  ffff88032dc57db0 
0000000000000246 ffff88041077e900 ffff88032dcae900
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312022]  ffff88032dc58000 
0000000000074da7 ffff88040dbdf888 ffff88040dbdf824
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312024]  ffff88032dc57df8 
ffff88032dc57dc8 ffffffff81823825 ffff88040dbdf800
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312027] Call Trace:
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312035]  [<ffffffff81823825>] 
schedule+0x35/0x80
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312042]  [<ffffffff812f8309>] 
jbd2_log_wait_commit+0xa9/0x130
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312045]  [<ffffffff810c77b0>] ? 
wake_atomic_t_function+0x60/0x60
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312048]  [<ffffffff812fae21>] 
__jbd2_journal_force_commit+0x61/0xb0
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312049]  [<ffffffff812faeb1>] 
jbd2_journal_force_commit+0x21/0x30
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312053]  [<ffffffff812c71c9>] 
ext4_force_commit+0x29/0x30
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312057]  [<ffffffff8129a7c1>] 
ext4_sync_file+0x1a1/0x350
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312065]  [<ffffffff81247bae>] 
vfs_fsync_range+0x4e/0xb0
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312072]  [<ffffffff81247c6d>] 
do_fsync+0x3d/0x70
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312074]  [<ffffffff81247f10>] 
SyS_fsync+0x10/0x20
Oct 12 15:33:41 ip-10-16-21-10 kernel: [10919.312077]  [<ffffffff818279ce>] 
entry_SYSCALL_64_fastpath+0x22/0xc1
Oct 12 15:35:01 ip-10-16-21-10 CRON[31352]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)

using this simple script writing and reading same file in parallel from
different process and let it run for while in my case I got the first
call trace after 3h.

while true
do
dd iflag=nocache if=/dev/zero of=/tmp/myfilesource$1 bs=1k count=$2
dd iflag=nocache if=/tmp/myfilesource$1 of=/tmp/myfileout$1 bs=1k count=$2
done


root@ip-10-16-21-10:~# jobs
[1]   Running                 nohup ./mytest1.sh 1 1024000&
[2]-  Running                 nohup ./mytest1.sh 1 1024 &
[3]+  Running                 nohup ./mytest1.sh 1 102400 &

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1788035

Title:
  nvme: avoid cqe corruption

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788035/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to