Public bug reported:

SRU Justification:

[Impact]
This bug in bcache affects focal and jammy releases.

When Random Read I/O is started with a test like -

fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G
--ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128
--rw=randread --randrepeat=0

or

random read-writes with a test like,

fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB
--direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128
--name=iops-test-job --randrepeat=0

traces are seen in the kernel log,

[ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 
seconds.
[ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
[ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
[ 4474.731408] Call Trace:
[ 4474.731411]  <TASK>
[ 4474.731413]  __schedule+0x23d/0x5a0
[ 4474.731433]  schedule+0x4e/0xb0
[ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
[ 4474.731441]  down_write+0x43/0x50
[ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
[ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
[ 4474.731487]  kthread+0x12a/0x150
[ 4474.731491]  ? set_kthread_struct+0x50/0x50
[ 4474.731494]  ret_from_fork+0x22/0x30
[ 4474.731499]  </TASK>

The bug exists till kernel 5.15.50-051550-generic, and is fixed through
upstream commit

commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD)
Author: Coly Li <col...@suse.de>
Date:   Fri May 27 23:28:16 2022 +0800

    bcache: memset on stack variables in bch_btree_check() and 
bch_sectors_dirty_init()
    
    The local variables check_state (in bch_btree_check()) and state (in
    bch_sectors_dirty_init()) should be fully filled by 0, because before
    allocating them on stack, they were dynamically allocated by kzalloc().
    
    Signed-off-by: Coly Li <col...@suse.de>
    Link: https://lore.kernel.org/r/20220527152818.27545-2-col...@suse.de
    Signed-off-by: Jens Axboe <ax...@kernel.dk>


The reproducer is pasted below:

# uname -a
Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 
2022 x86_64 x86_64 x86_64 GNU/Linux


NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sdd           8:48   0 279.4G  0 disk 
└─sdd1        8:49   0    60G  0 part 
  └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
nvme0n1     259:0    0 372.6G  0 disk 
└─nvme0n1p1 259:2    0    15G  0 part 
  └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
 
fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G 
--ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread 
--group_reporting=1
read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=128
fio-3.28
Starting 1 process
read_iops: Laying out IO file (1 file / 12288MiB)
 
The test does not progress beyond a few minutes, and this trace is then seen in 
the kernel log,

[ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 
seconds.
[ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
[ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
[ 4474.731408] Call Trace:
[ 4474.731411]  <TASK>
[ 4474.731413]  __schedule+0x23d/0x5a0
[ 4474.731433]  schedule+0x4e/0xb0
[ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
[ 4474.731441]  down_write+0x43/0x50
[ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
[ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
[ 4474.731487]  kthread+0x12a/0x150
[ 4474.731491]  ? set_kthread_struct+0x50/0x50
[ 4474.731494]  ret_from_fork+0x22/0x30
[ 4474.731499]  </TASK>
[ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds.
[ 4475.035858]       Not tainted 5.15.50-051550-generic #202206251445
[ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4475.716526] task:fio             state:D stack:    0 pid:16626 ppid:  1715 
flags:0x00004002
[ 4475.716531] Call Trace:
[ 4475.716546]  <TASK>
[ 4475.716549]  __schedule+0x23d/0x5a0
[ 4475.716555]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[ 4475.716560]  schedule+0x4e/0xb0
[ 4475.716563]  rwsem_down_read_slowpath+0x32e/0x380
[ 4475.716567]  down_read+0x43/0x90
[ 4475.716571]  cached_dev_write+0x7e/0x480 [bcache]
[ 4475.716604]  cached_dev_submit_bio+0x502/0x550 [bcache]
[ 4475.716618]  __submit_bio+0x1a1/0x220
[ 4475.716623]  __submit_bio_noacct+0x85/0x1f0
[ 4475.716626]  submit_bio_noacct+0x4e/0x120
[ 4475.716628]  submit_bio+0x4a/0x130
[ 4475.716631]  iomap_submit_ioend+0x53/0x80
[ 4475.716634]  iomap_writepages+0x35/0x40
[ 4475.716636]  xfs_vm_writepages+0x84/0xb0 [xfs]
[ 4475.716730]  do_writepages+0xda/0x200
[ 4475.716736]  filemap_fdatawrite_wbc+0x81/0xd0
[ 4475.716739]  file_write_and_wait_range+0xac/0xf0
[ 4475.716742]  xfs_file_fsync+0x5b/0x250 [xfs]
[ 4475.716821]  vfs_fsync_range+0x49/0x80
[ 4475.716826]  ? __fget_light+0x32/0x80
[ 4475.716829]  __x64_sys_fsync+0x38/0x60
[ 4475.716832]  do_syscall_64+0x5c/0xc0
[ 4475.716836]  ? ksys_write+0xce/0xe0
[ 4475.716838]  ? exit_to_user_mode_prepare+0x37/0xb0
[ 4475.716843]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4475.716847]  ? __x64_sys_write+0x19/0x20
[ 4475.716849]  ? do_syscall_64+0x69/0xc0
[ 4475.716851]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4475.716855]  ? __x64_sys_write+0x19/0x20
[ 4475.716857]  ? do_syscall_64+0x69/0xc0
[ 4475.716859]  ? exit_to_user_mode_prepare+0x37/0xb0
[ 4475.716862]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4475.716865]  ? __x64_sys_write+0x19/0x20
[ 4475.716867]  ? do_syscall_64+0x69/0xc0
[ 4475.716869]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4475.716873]  ? __x64_sys_write+0x19/0x20
[ 4475.716875]  ? do_syscall_64+0x69/0xc0
[ 4475.716877]  ? do_syscall_64+0x69/0xc0
[ 4475.716879]  ? asm_common_interrupt+0x8/0x40
[ 4475.716883]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4475.716885] RIP: 0033:0x7fd3c498aa5b
[ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004a
[ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b
[ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006
[ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40
[ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000
[ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000
[ 4475.716898]  </TASK>
[ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds.
[ 4476.063391]       Not tainted 5.15.50-051550-generic #202206251445
[ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4476.743988] task:kworker/26:87   state:D stack:    0 pid:16682 ppid:     2 
flags:0x00004000
[ 4476.743994] Workqueue: events update_writeback_rate [bcache]
[ 4476.744022] Call Trace:
[ 4476.744024]  <TASK>
[ 4476.744026]  __schedule+0x23d/0x5a0
[ 4476.744032]  schedule+0x4e/0xb0
[ 4476.744035]  rwsem_down_read_slowpath+0x32e/0x380
[ 4476.744039]  down_read+0x43/0x90
[ 4476.744042]  update_writeback_rate+0x134/0x190 [bcache]
[ 4476.744057]  process_one_work+0x22b/0x3d0
[ 4476.744063]  worker_thread+0x53/0x410
[ 4476.744066]  ? process_one_work+0x3d0/0x3d0
[ 4476.744070]  kthread+0x12a/0x150
[ 4476.744074]  ? set_kthread_struct+0x50/0x50
[ 4476.744077]  ret_from_fork+0x22/0x30
[ 4476.744082]  </TASK>
[ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 
seconds.
[ 4598.978432]       Not tainted 5.15.50-051550-generic #202206251445
[ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4599.657293] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
[ 4599.657298] Call Trace:
[ 4599.657300]  <TASK>
[ 4599.657302]  __schedule+0x23d/0x5a0
[ 4599.657325]  schedule+0x4e/0xb0
[ 4599.657328]  rwsem_down_write_slowpath+0x220/0x3d0
[ 4599.657333]  down_write+0x43/0x50
[ 4599.657338]  bch_writeback_thread+0x78/0x320 [bcache]
[ 4599.657374]  ? read_dirty_submit+0x70/0x70 [bcache]
[ 4599.657390]  kthread+0x12a/0x150
[ 4599.657393]  ? set_kthread_struct+0x50/0x50
[ 4599.657396]  ret_from_fork+0x22/0x30
[ 4599.657401]  </TASK>
[ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds.
[ 4600.000498]       Not tainted 5.15.50-051550-generic #202206251445
[ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4600.681984] task:kworker/2:63    state:D stack:    0 pid:14895 ppid:     2 
flags:0x00004000
[ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache]
[ 4600.682031] Call Trace:
[ 4600.682033]  <TASK>
[ 4600.682035]  __schedule+0x23d/0x5a0
[ 4600.682040]  schedule+0x4e/0xb0
[ 4600.682043]  bch_bucket_alloc+0x21a/0x5b0 [bcache]
[ 4600.682066]  ? wait_woken+0x70/0x70
[ 4600.682071]  __bch_bucket_alloc_set+0x52/0xf0 [bcache]
[ 4600.682082]  bch_alloc_sectors+0x1c9/0x4c0 [bcache]
[ 4600.682098]  ? krealloc+0x9d/0xd0
[ 4600.682102]  ? __bch_keylist_realloc+0xb7/0x100 [bcache]
[ 4600.682113]  ? __bch_submit_bbio+0x97/0xb0 [bcache]
[ 4600.682126]  bch_data_insert_start+0x15e/0x3a0 [bcache]
[ 4600.682139]  ? closure_sub+0x94/0xb0 [bcache]
[ 4600.682152]  process_one_work+0x22b/0x3d0
[ 4600.682157]  worker_thread+0x53/0x410
[ 4600.682161]  ? process_one_work+0x3d0/0x3d0
[ 4600.682164]  kthread+0x12a/0x150
[ 4600.682168]  ? set_kthread_struct+0x50/0x50
[ 4600.682171]  ret_from_fork+0x22/0x30
[ 4600.682176]  </TASK>
[ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 
seconds.
[ 4601.033116]       Not tainted 5.15.50-051550-generic #202206251445
[ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4601.713829] task:kworker/25:123  state:D stack:    0 pid:16464 ppid:     2 
flags:0x00004000
[ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs]
[ 4601.713975] Call Trace:
[ 4601.713977]  <TASK>
[ 4601.713979]  __schedule+0x23d/0x5a0
[ 4601.713986]  schedule+0x4e/0xb0
[ 4601.713988]  rwsem_down_read_slowpath+0x32e/0x380
[ 4601.713993]  down_read+0x43/0x90
[ 4601.713996]  cached_dev_write+0x7e/0x480 [bcache]
[ 4601.714018]  cached_dev_submit_bio+0x502/0x550 [bcache]
[ 4601.714031]  __submit_bio+0x1a1/0x220
[ 4601.714035]  ? ttwu_do_activate+0x72/0xf0
[ 4601.714039]  __submit_bio_noacct+0x85/0x1f0
[ 4601.714042]  ? mutex_lock+0x13/0x40
[ 4601.714045]  submit_bio_noacct+0x4e/0x120
[ 4601.714047]  submit_bio+0x4a/0x130
[ 4601.714050]  xlog_write_iclog+0x254/0x300 [xfs]
[ 4601.714142]  xlog_sync+0x1ab/0x2c0 [xfs]
[ 4601.714225]  xlog_state_release_iclog+0x123/0x1d0 [xfs]
[ 4601.714306]  xfs_log_force+0x186/0x250 [xfs]
[ 4601.714386]  xfs_log_worker+0x39/0x90 [xfs]
[ 4601.714466]  process_one_work+0x22b/0x3d0
[ 4601.714471]  worker_thread+0x53/0x410
[ 4601.714474]  ? process_one_work+0x3d0/0x3d0
[ 4601.714477]  kthread+0x12a/0x150
[ 4601.714481]  ? set_kthread_struct+0x50/0x50
[ 4601.714484]  ret_from_fork+0x22/0x30
[ 4601.714489]  </TASK>
[ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds.
[ 4602.056844]       Not tainted 5.15.50-051550-generic #202206251445
[ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4602.733923] task:kworker/22:89   state:D stack:    0 pid:16557 ppid:     2 
flags:0x00004000
[ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache]
[ 4602.733966] Call Trace:
[ 4602.733968]  <TASK>
[ 4602.733970]  __schedule+0x23d/0x5a0
[ 4602.733988]  schedule+0x4e/0xb0
[ 4602.733991]  bch_bucket_alloc+0x21a/0x5b0 [bcache]
[ 4602.734002]  ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache]
[ 4602.734014]  ? wait_woken+0x70/0x70
[ 4602.734019]  __bch_bucket_alloc_set+0x52/0xf0 [bcache]
[ 4602.734031]  bch_alloc_sectors+0x1c9/0x4c0 [bcache]
[ 4602.734043]  ? __wake_up_common_lock+0x8a/0xc0
[ 4602.734046]  bch_data_insert_start+0x15e/0x3a0 [bcache]
[ 4602.734062]  ? closure_sub+0x94/0xb0 [bcache]
[ 4602.734083]  process_one_work+0x22b/0x3d0
[ 4602.734088]  worker_thread+0x53/0x410
[ 4602.734092]  ? process_one_work+0x3d0/0x3d0
[ 4602.734095]  kthread+0x12a/0x150
[ 4602.734099]  ? set_kthread_struct+0x50/0x50
[ 4602.734102]  ret_from_fork+0x22/0x30
[ 4602.734108]  </TASK>
[ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds.
[ 4603.034541]       Not tainted 5.15.50-051550-generic #202206251445
[ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4603.711013] task:fio             state:D stack:    0 pid:16626 ppid:  1715 
flags:0x00004002
[ 4603.711030] Call Trace:
[ 4603.711032]  <TASK>
[ 4603.711046]  __schedule+0x23d/0x5a0
[ 4603.711054]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[ 4603.711060]  schedule+0x4e/0xb0
[ 4603.711063]  rwsem_down_read_slowpath+0x32e/0x380
[ 4603.711067]  down_read+0x43/0x90
[ 4603.711072]  cached_dev_write+0x7e/0x480 [bcache]
[ 4603.711095]  cached_dev_submit_bio+0x502/0x550 [bcache]
[ 4603.711108]  __submit_bio+0x1a1/0x220
[ 4603.711113]  __submit_bio_noacct+0x85/0x1f0
[ 4603.711116]  submit_bio_noacct+0x4e/0x120
[ 4603.711118]  submit_bio+0x4a/0x130
[ 4603.711121]  iomap_submit_ioend+0x53/0x80
[ 4603.711124]  iomap_writepages+0x35/0x40
[ 4603.711126]  xfs_vm_writepages+0x84/0xb0 [xfs]
[ 4603.711223]  do_writepages+0xda/0x200
[ 4603.711229]  filemap_fdatawrite_wbc+0x81/0xd0
[ 4603.711232]  file_write_and_wait_range+0xac/0xf0
[ 4603.711235]  xfs_file_fsync+0x5b/0x250 [xfs]
[ 4603.711312]  vfs_fsync_range+0x49/0x80
[ 4603.711317]  ? __fget_light+0x32/0x80
[ 4603.711321]  __x64_sys_fsync+0x38/0x60
[ 4603.711324]  do_syscall_64+0x5c/0xc0
[ 4603.711327]  ? ksys_write+0xce/0xe0
[ 4603.711330]  ? exit_to_user_mode_prepare+0x37/0xb0
[ 4603.711335]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4603.711338]  ? __x64_sys_write+0x19/0x20
[ 4603.711340]  ? do_syscall_64+0x69/0xc0
[ 4603.711342]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4603.711345]  ? __x64_sys_write+0x19/0x20
[ 4603.711347]  ? do_syscall_64+0x69/0xc0
[ 4603.711349]  ? exit_to_user_mode_prepare+0x37/0xb0
[ 4603.711352]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4603.711355]  ? __x64_sys_write+0x19/0x20
[ 4603.711357]  ? do_syscall_64+0x69/0xc0
[ 4603.711359]  ? syscall_exit_to_user_mode+0x27/0x50
[ 4603.711362]  ? __x64_sys_write+0x19/0x20
[ 4603.711364]  ? do_syscall_64+0x69/0xc0
[ 4603.711366]  ? do_syscall_64+0x69/0xc0
[ 4603.711368]  ? asm_common_interrupt+0x8/0x40
[ 4603.711371]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4603.711374] RIP: 0033:0x7fd3c498aa5b
[ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004a
[ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b
[ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006
[ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40
[ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000
[ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000
[ 4603.711387]  </TASK>
[ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds.
[ 4604.058441]       Not tainted 5.15.50-051550-generic #202206251445
[ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4604.739950] task:kworker/26:87   state:D stack:    0 pid:16682 ppid:     2 
flags:0x00004000
[ 4604.739968] Workqueue: events update_writeback_rate [bcache]
[ 4604.739995] Call Trace:
[ 4604.739997]  <TASK>
[ 4604.740012]  __schedule+0x23d/0x5a0
[ 4604.740017]  schedule+0x4e/0xb0
[ 4604.740020]  rwsem_down_read_slowpath+0x32e/0x380
[ 4604.740024]  down_read+0x43/0x90
[ 4604.740027]  update_writeback_rate+0x134/0x190 [bcache]
[ 4604.740043]  process_one_work+0x22b/0x3d0
[ 4604.740049]  worker_thread+0x53/0x410
[ 4604.740052]  ? process_one_work+0x3d0/0x3d0
[ 4604.740055]  kthread+0x12a/0x150
[ 4604.740059]  ? set_kthread_struct+0x50/0x50
[ 4604.740062]  ret_from_fork+0x22/0x30
[ 4604.740067]  </TASK>
[ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 
seconds.
[ 4725.952827]       Not tainted 5.15.50-051550-generic #202206251445
[ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 4726.630937] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
[ 4726.630954] Call Trace:
[ 4726.630956]  <TASK>
[ 4726.630959]  __schedule+0x23d/0x5a0
[ 4726.630966]  schedule+0x4e/0xb0
[ 4726.630970]  rwsem_down_write_slowpath+0x220/0x3d0
[ 4726.630976]  down_write+0x43/0x50
[ 4726.630980]  bch_writeback_thread+0x78/0x320 [bcache]
[ 4726.631005]  ? read_dirty_submit+0x70/0x70 [bcache]
[ 4726.631021]  kthread+0x12a/0x150
[ 4726.631025]  ? set_kthread_struct+0x50/0x50
[ 4726.631028]  ret_from_fork+0x22/0x30
[ 4726.631033]  </TASK>

[Fix]

commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD)
Author: Coly Li <col...@suse.de>
Date:   Fri May 27 23:28:16 2022 +0800

    bcache: memset on stack variables in bch_btree_check() and
bch_sectors_dirty_init()

I have built this fix into kernel 5.15.50
(https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to
verify the problem is fixed.

root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log
commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD)
Author: Coly Li <col...@suse.de>
Date:   Fri May 27 23:28:16 2022 +0800

    bcache: memset on stack variables in bch_btree_check() and 
bch_sectors_dirty_init()
    
    The local variables check_state (in bch_btree_check()) and state (in
    bch_sectors_dirty_init()) should be fully filled by 0, because before
    allocating them on stack, they were dynamically allocated by kzalloc().
    
    Signed-off-by: Coly Li <col...@suse.de>
    Link: https://lore.kernel.org/r/20220527152818.27545-2-col...@suse.de
    Signed-off-by: Jens Axboe <ax...@kernel.dk>

commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50)
Author: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Date:   Sat Jun 25 15:18:40 2022 +0200

    Linux 5.15.50
    
    Link: https://lore.kernel.org/r/20220623164322.288837...@linuxfoundation.org
    Tested-by: Florian Fainelli <f.faine...@gmail.com>
    Tested-by: Shuah Khan <sk...@linuxfoundation.org>
    Tested-by: Bagas Sanjaya <bagasdo...@gmail.com>
    Tested-by: Jon Hunter <jonath...@nvidia.com>
    Tested-by: Sudip Mukherjee <sudip.mukher...@codethink.co.uk>
    Tested-by: Ron Economos <r...@w6rz.net>
    Tested-by: Guenter Roeck <li...@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>


[Regression Potential]

I have not seen any potential drawbacks or harmful effects of this fix
in my testing. In fact it is required, without which the deadlock is
easily reproduced both on focal as well as jammy GA.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Focal)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Jammy)
     Importance: Undecided
         Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1980925

Title:
  [SRU] bcache deadlock during read IO in writeback mode

Status in linux package in Ubuntu:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Jammy:
  New

Bug description:
  SRU Justification:

  [Impact]
  This bug in bcache affects focal and jammy releases.

  When Random Read I/O is started with a test like -

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G
  --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128
  --rw=randread --randrepeat=0

  or

  random read-writes with a test like,

  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB
  --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128
  --name=iops-test-job --randrepeat=0

  traces are seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 
seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>

  The bug exists till kernel 5.15.50-051550-generic, and is fixed
  through upstream commit

  commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD)
  Author: Coly Li <col...@suse.de>
  Date:   Fri May 27 23:28:16 2022 +0800

      bcache: memset on stack variables in bch_btree_check() and 
bch_sectors_dirty_init()
      
      The local variables check_state (in bch_btree_check()) and state (in
      bch_sectors_dirty_init()) should be fully filled by 0, because before
      allocating them on stack, they were dynamically allocated by kzalloc().
      
      Signed-off-by: Coly Li <col...@suse.de>
      Link: https://lore.kernel.org/r/20220527152818.27545-2-col...@suse.de
      Signed-off-by: Jens Axboe <ax...@kernel.dk>

  
  The reproducer is pasted below:

  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 
UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  
  NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd           8:48   0 279.4G  0 disk 
  └─sdd1        8:49   0    60G  0 part 
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1     259:0    0 372.6G  0 disk 
  └─nvme0n1p1 259:2    0    15G  0 part 
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
   
  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G 
--ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread 
--group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)
   
  The test does not progress beyond a few minutes, and this trace is then seen 
in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 
seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>
  [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds.
  [ 4475.035858]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4475.716526] task:fio             state:D stack:    0 pid:16626 ppid:  1715 
flags:0x00004002
  [ 4475.716531] Call Trace:
  [ 4475.716546]  <TASK>
  [ 4475.716549]  __schedule+0x23d/0x5a0
  [ 4475.716555]  ? sysvec_apic_timer_interrupt+0x4e/0x90
  [ 4475.716560]  schedule+0x4e/0xb0
  [ 4475.716563]  rwsem_down_read_slowpath+0x32e/0x380
  [ 4475.716567]  down_read+0x43/0x90
  [ 4475.716571]  cached_dev_write+0x7e/0x480 [bcache]
  [ 4475.716604]  cached_dev_submit_bio+0x502/0x550 [bcache]
  [ 4475.716618]  __submit_bio+0x1a1/0x220
  [ 4475.716623]  __submit_bio_noacct+0x85/0x1f0
  [ 4475.716626]  submit_bio_noacct+0x4e/0x120
  [ 4475.716628]  submit_bio+0x4a/0x130
  [ 4475.716631]  iomap_submit_ioend+0x53/0x80
  [ 4475.716634]  iomap_writepages+0x35/0x40
  [ 4475.716636]  xfs_vm_writepages+0x84/0xb0 [xfs]
  [ 4475.716730]  do_writepages+0xda/0x200
  [ 4475.716736]  filemap_fdatawrite_wbc+0x81/0xd0
  [ 4475.716739]  file_write_and_wait_range+0xac/0xf0
  [ 4475.716742]  xfs_file_fsync+0x5b/0x250 [xfs]
  [ 4475.716821]  vfs_fsync_range+0x49/0x80
  [ 4475.716826]  ? __fget_light+0x32/0x80
  [ 4475.716829]  __x64_sys_fsync+0x38/0x60
  [ 4475.716832]  do_syscall_64+0x5c/0xc0
  [ 4475.716836]  ? ksys_write+0xce/0xe0
  [ 4475.716838]  ? exit_to_user_mode_prepare+0x37/0xb0
  [ 4475.716843]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4475.716847]  ? __x64_sys_write+0x19/0x20
  [ 4475.716849]  ? do_syscall_64+0x69/0xc0
  [ 4475.716851]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4475.716855]  ? __x64_sys_write+0x19/0x20
  [ 4475.716857]  ? do_syscall_64+0x69/0xc0
  [ 4475.716859]  ? exit_to_user_mode_prepare+0x37/0xb0
  [ 4475.716862]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4475.716865]  ? __x64_sys_write+0x19/0x20
  [ 4475.716867]  ? do_syscall_64+0x69/0xc0
  [ 4475.716869]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4475.716873]  ? __x64_sys_write+0x19/0x20
  [ 4475.716875]  ? do_syscall_64+0x69/0xc0
  [ 4475.716877]  ? do_syscall_64+0x69/0xc0
  [ 4475.716879]  ? asm_common_interrupt+0x8/0x40
  [ 4475.716883]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [ 4475.716885] RIP: 0033:0x7fd3c498aa5b
  [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004a
  [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
00007fd3c498aa5b
  [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 
0000000000000006
  [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 
000055ededf41f40
  [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 
00007fd3ba533000
  [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 
0000000300000000
  [ 4475.716898]  </TASK>
  [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 
seconds.
  [ 4476.063391]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4476.743988] task:kworker/26:87   state:D stack:    0 pid:16682 ppid:     2 
flags:0x00004000
  [ 4476.743994] Workqueue: events update_writeback_rate [bcache]
  [ 4476.744022] Call Trace:
  [ 4476.744024]  <TASK>
  [ 4476.744026]  __schedule+0x23d/0x5a0
  [ 4476.744032]  schedule+0x4e/0xb0
  [ 4476.744035]  rwsem_down_read_slowpath+0x32e/0x380
  [ 4476.744039]  down_read+0x43/0x90
  [ 4476.744042]  update_writeback_rate+0x134/0x190 [bcache]
  [ 4476.744057]  process_one_work+0x22b/0x3d0
  [ 4476.744063]  worker_thread+0x53/0x410
  [ 4476.744066]  ? process_one_work+0x3d0/0x3d0
  [ 4476.744070]  kthread+0x12a/0x150
  [ 4476.744074]  ? set_kthread_struct+0x50/0x50
  [ 4476.744077]  ret_from_fork+0x22/0x30
  [ 4476.744082]  </TASK>
  [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 
seconds.
  [ 4598.978432]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4599.657293] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
  [ 4599.657298] Call Trace:
  [ 4599.657300]  <TASK>
  [ 4599.657302]  __schedule+0x23d/0x5a0
  [ 4599.657325]  schedule+0x4e/0xb0
  [ 4599.657328]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4599.657333]  down_write+0x43/0x50
  [ 4599.657338]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4599.657374]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4599.657390]  kthread+0x12a/0x150
  [ 4599.657393]  ? set_kthread_struct+0x50/0x50
  [ 4599.657396]  ret_from_fork+0x22/0x30
  [ 4599.657401]  </TASK>
  [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 
seconds.
  [ 4600.000498]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4600.681984] task:kworker/2:63    state:D stack:    0 pid:14895 ppid:     2 
flags:0x00004000
  [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache]
  [ 4600.682031] Call Trace:
  [ 4600.682033]  <TASK>
  [ 4600.682035]  __schedule+0x23d/0x5a0
  [ 4600.682040]  schedule+0x4e/0xb0
  [ 4600.682043]  bch_bucket_alloc+0x21a/0x5b0 [bcache]
  [ 4600.682066]  ? wait_woken+0x70/0x70
  [ 4600.682071]  __bch_bucket_alloc_set+0x52/0xf0 [bcache]
  [ 4600.682082]  bch_alloc_sectors+0x1c9/0x4c0 [bcache]
  [ 4600.682098]  ? krealloc+0x9d/0xd0
  [ 4600.682102]  ? __bch_keylist_realloc+0xb7/0x100 [bcache]
  [ 4600.682113]  ? __bch_submit_bbio+0x97/0xb0 [bcache]
  [ 4600.682126]  bch_data_insert_start+0x15e/0x3a0 [bcache]
  [ 4600.682139]  ? closure_sub+0x94/0xb0 [bcache]
  [ 4600.682152]  process_one_work+0x22b/0x3d0
  [ 4600.682157]  worker_thread+0x53/0x410
  [ 4600.682161]  ? process_one_work+0x3d0/0x3d0
  [ 4600.682164]  kthread+0x12a/0x150
  [ 4600.682168]  ? set_kthread_struct+0x50/0x50
  [ 4600.682171]  ret_from_fork+0x22/0x30
  [ 4600.682176]  </TASK>
  [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 
seconds.
  [ 4601.033116]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4601.713829] task:kworker/25:123  state:D stack:    0 pid:16464 ppid:     2 
flags:0x00004000
  [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs]
  [ 4601.713975] Call Trace:
  [ 4601.713977]  <TASK>
  [ 4601.713979]  __schedule+0x23d/0x5a0
  [ 4601.713986]  schedule+0x4e/0xb0
  [ 4601.713988]  rwsem_down_read_slowpath+0x32e/0x380
  [ 4601.713993]  down_read+0x43/0x90
  [ 4601.713996]  cached_dev_write+0x7e/0x480 [bcache]
  [ 4601.714018]  cached_dev_submit_bio+0x502/0x550 [bcache]
  [ 4601.714031]  __submit_bio+0x1a1/0x220
  [ 4601.714035]  ? ttwu_do_activate+0x72/0xf0
  [ 4601.714039]  __submit_bio_noacct+0x85/0x1f0
  [ 4601.714042]  ? mutex_lock+0x13/0x40
  [ 4601.714045]  submit_bio_noacct+0x4e/0x120
  [ 4601.714047]  submit_bio+0x4a/0x130
  [ 4601.714050]  xlog_write_iclog+0x254/0x300 [xfs]
  [ 4601.714142]  xlog_sync+0x1ab/0x2c0 [xfs]
  [ 4601.714225]  xlog_state_release_iclog+0x123/0x1d0 [xfs]
  [ 4601.714306]  xfs_log_force+0x186/0x250 [xfs]
  [ 4601.714386]  xfs_log_worker+0x39/0x90 [xfs]
  [ 4601.714466]  process_one_work+0x22b/0x3d0
  [ 4601.714471]  worker_thread+0x53/0x410
  [ 4601.714474]  ? process_one_work+0x3d0/0x3d0
  [ 4601.714477]  kthread+0x12a/0x150
  [ 4601.714481]  ? set_kthread_struct+0x50/0x50
  [ 4601.714484]  ret_from_fork+0x22/0x30
  [ 4601.714489]  </TASK>
  [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 
seconds.
  [ 4602.056844]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4602.733923] task:kworker/22:89   state:D stack:    0 pid:16557 ppid:     2 
flags:0x00004000
  [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache]
  [ 4602.733966] Call Trace:
  [ 4602.733968]  <TASK>
  [ 4602.733970]  __schedule+0x23d/0x5a0
  [ 4602.733988]  schedule+0x4e/0xb0
  [ 4602.733991]  bch_bucket_alloc+0x21a/0x5b0 [bcache]
  [ 4602.734002]  ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache]
  [ 4602.734014]  ? wait_woken+0x70/0x70
  [ 4602.734019]  __bch_bucket_alloc_set+0x52/0xf0 [bcache]
  [ 4602.734031]  bch_alloc_sectors+0x1c9/0x4c0 [bcache]
  [ 4602.734043]  ? __wake_up_common_lock+0x8a/0xc0
  [ 4602.734046]  bch_data_insert_start+0x15e/0x3a0 [bcache]
  [ 4602.734062]  ? closure_sub+0x94/0xb0 [bcache]
  [ 4602.734083]  process_one_work+0x22b/0x3d0
  [ 4602.734088]  worker_thread+0x53/0x410
  [ 4602.734092]  ? process_one_work+0x3d0/0x3d0
  [ 4602.734095]  kthread+0x12a/0x150
  [ 4602.734099]  ? set_kthread_struct+0x50/0x50
  [ 4602.734102]  ret_from_fork+0x22/0x30
  [ 4602.734108]  </TASK>
  [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds.
  [ 4603.034541]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4603.711013] task:fio             state:D stack:    0 pid:16626 ppid:  1715 
flags:0x00004002
  [ 4603.711030] Call Trace:
  [ 4603.711032]  <TASK>
  [ 4603.711046]  __schedule+0x23d/0x5a0
  [ 4603.711054]  ? sysvec_apic_timer_interrupt+0x4e/0x90
  [ 4603.711060]  schedule+0x4e/0xb0
  [ 4603.711063]  rwsem_down_read_slowpath+0x32e/0x380
  [ 4603.711067]  down_read+0x43/0x90
  [ 4603.711072]  cached_dev_write+0x7e/0x480 [bcache]
  [ 4603.711095]  cached_dev_submit_bio+0x502/0x550 [bcache]
  [ 4603.711108]  __submit_bio+0x1a1/0x220
  [ 4603.711113]  __submit_bio_noacct+0x85/0x1f0
  [ 4603.711116]  submit_bio_noacct+0x4e/0x120
  [ 4603.711118]  submit_bio+0x4a/0x130
  [ 4603.711121]  iomap_submit_ioend+0x53/0x80
  [ 4603.711124]  iomap_writepages+0x35/0x40
  [ 4603.711126]  xfs_vm_writepages+0x84/0xb0 [xfs]
  [ 4603.711223]  do_writepages+0xda/0x200
  [ 4603.711229]  filemap_fdatawrite_wbc+0x81/0xd0
  [ 4603.711232]  file_write_and_wait_range+0xac/0xf0
  [ 4603.711235]  xfs_file_fsync+0x5b/0x250 [xfs]
  [ 4603.711312]  vfs_fsync_range+0x49/0x80
  [ 4603.711317]  ? __fget_light+0x32/0x80
  [ 4603.711321]  __x64_sys_fsync+0x38/0x60
  [ 4603.711324]  do_syscall_64+0x5c/0xc0
  [ 4603.711327]  ? ksys_write+0xce/0xe0
  [ 4603.711330]  ? exit_to_user_mode_prepare+0x37/0xb0
  [ 4603.711335]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4603.711338]  ? __x64_sys_write+0x19/0x20
  [ 4603.711340]  ? do_syscall_64+0x69/0xc0
  [ 4603.711342]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4603.711345]  ? __x64_sys_write+0x19/0x20
  [ 4603.711347]  ? do_syscall_64+0x69/0xc0
  [ 4603.711349]  ? exit_to_user_mode_prepare+0x37/0xb0
  [ 4603.711352]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4603.711355]  ? __x64_sys_write+0x19/0x20
  [ 4603.711357]  ? do_syscall_64+0x69/0xc0
  [ 4603.711359]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 4603.711362]  ? __x64_sys_write+0x19/0x20
  [ 4603.711364]  ? do_syscall_64+0x69/0xc0
  [ 4603.711366]  ? do_syscall_64+0x69/0xc0
  [ 4603.711368]  ? asm_common_interrupt+0x8/0x40
  [ 4603.711371]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [ 4603.711374] RIP: 0033:0x7fd3c498aa5b
  [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004a
  [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
00007fd3c498aa5b
  [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 
0000000000000006
  [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 
000055ededf41f40
  [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 
00007fd3ba533000
  [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 
0000000300000000
  [ 4603.711387]  </TASK>
  [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 
seconds.
  [ 4604.058441]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4604.739950] task:kworker/26:87   state:D stack:    0 pid:16682 ppid:     2 
flags:0x00004000
  [ 4604.739968] Workqueue: events update_writeback_rate [bcache]
  [ 4604.739995] Call Trace:
  [ 4604.739997]  <TASK>
  [ 4604.740012]  __schedule+0x23d/0x5a0
  [ 4604.740017]  schedule+0x4e/0xb0
  [ 4604.740020]  rwsem_down_read_slowpath+0x32e/0x380
  [ 4604.740024]  down_read+0x43/0x90
  [ 4604.740027]  update_writeback_rate+0x134/0x190 [bcache]
  [ 4604.740043]  process_one_work+0x22b/0x3d0
  [ 4604.740049]  worker_thread+0x53/0x410
  [ 4604.740052]  ? process_one_work+0x3d0/0x3d0
  [ 4604.740055]  kthread+0x12a/0x150
  [ 4604.740059]  ? set_kthread_struct+0x50/0x50
  [ 4604.740062]  ret_from_fork+0x22/0x30
  [ 4604.740067]  </TASK>
  [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 
seconds.
  [ 4725.952827]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [ 4726.630937] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 
flags:0x00004000
  [ 4726.630954] Call Trace:
  [ 4726.630956]  <TASK>
  [ 4726.630959]  __schedule+0x23d/0x5a0
  [ 4726.630966]  schedule+0x4e/0xb0
  [ 4726.630970]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4726.630976]  down_write+0x43/0x50
  [ 4726.630980]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4726.631005]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4726.631021]  kthread+0x12a/0x150
  [ 4726.631025]  ? set_kthread_struct+0x50/0x50
  [ 4726.631028]  ret_from_fork+0x22/0x30
  [ 4726.631033]  </TASK>

  [Fix]

  commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD)
  Author: Coly Li <col...@suse.de>
  Date:   Fri May 27 23:28:16 2022 +0800

      bcache: memset on stack variables in bch_btree_check() and
  bch_sectors_dirty_init()

  I have built this fix into kernel 5.15.50
  (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to
  verify the problem is fixed.

  root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log
  commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD)
  Author: Coly Li <col...@suse.de>
  Date:   Fri May 27 23:28:16 2022 +0800

      bcache: memset on stack variables in bch_btree_check() and 
bch_sectors_dirty_init()
      
      The local variables check_state (in bch_btree_check()) and state (in
      bch_sectors_dirty_init()) should be fully filled by 0, because before
      allocating them on stack, they were dynamically allocated by kzalloc().
      
      Signed-off-by: Coly Li <col...@suse.de>
      Link: https://lore.kernel.org/r/20220527152818.27545-2-col...@suse.de
      Signed-off-by: Jens Axboe <ax...@kernel.dk>

  commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50)
  Author: Greg Kroah-Hartman <gre...@linuxfoundation.org>
  Date:   Sat Jun 25 15:18:40 2022 +0200

      Linux 5.15.50
      
      Link: 
https://lore.kernel.org/r/20220623164322.288837...@linuxfoundation.org
      Tested-by: Florian Fainelli <f.faine...@gmail.com>
      Tested-by: Shuah Khan <sk...@linuxfoundation.org>
      Tested-by: Bagas Sanjaya <bagasdo...@gmail.com>
      Tested-by: Jon Hunter <jonath...@nvidia.com>
      Tested-by: Sudip Mukherjee <sudip.mukher...@codethink.co.uk>
      Tested-by: Ron Economos <r...@w6rz.net>
      Tested-by: Guenter Roeck <li...@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

  
  [Regression Potential]

  I have not seen any potential drawbacks or harmful effects of this fix
  in my testing. In fact it is required, without which the deadlock is
  easily reproduced both on focal as well as jammy GA.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to