On Sep 15, 2025 at 17:42, Mikulas Patocka wrote:
[...]
> If the table has only linear targets and there is just one underlying
> device, we can optimize REQ_PREFLUSH with data - we don't have to split
> it to two bios - a flush and a write. We can pass it to the linear target
> directly.
> 
> Signed-off-by: Mikulas Patocka <[email protected]>
> Tested-by: Bart Van Assche <[email protected]>
> 
> ---
>  drivers/md/dm-core.h |    1 +
>  drivers/md/dm.c      |   31 +++++++++++++++++++++++--------
>  2 files changed, 24 insertions(+), 8 deletions(-)
> 
> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c    2025-09-15 17:30:25.000000000 +0200
> +++ linux-2.6/drivers/md/dm.c 2025-09-15 17:35:47.000000000 +0200
[...]
> @@ -976,11 +972,12 @@ static void __dm_io_complete(struct dm_i
>       if (requeued)
>               return;
>  
> -     if (bio_is_flush_with_data(bio)) {
> +     if (unlikely(io->requeue_flush_with_data)) {

Hello Mikulas,

Last night I ran my fio test for zoned block devices on a linux-next kernel
(tag "next-20250917"), with dm-crypt on a QEMU ZNS drive, and observed a KASAN
slab-use-after-free [1]. It triggered at "__dm_io_complete+0x866/0x960", which
resolves to the line added by this patch, quoted above. It looks like 'io' is
freed before io->requeue_flush_with_data is read: in __dm_io_complete(),
free_io(io) is called just before the line KASAN flags, so dereferencing io
after it has been freed looks wrong to me. Do you think this is the cause of
the KASAN splat?

  961                 dm_end_io_acct(io);
  962         }
  963         free_io(io);    <-------------------------------- free io here?
  964         smp_wmb();
  965         this_cpu_dec(*md->pending_io);
  966
  967         /* nudge anyone waiting on suspend queue */
  968         if (unlikely(wq_has_sleeper(&md->wait)))
  969                 wake_up(&md->wait);
  970
  971         /* Return early if the original bio was requeued */
  972         if (requeued)
  973                 return;
  974
  975         if (unlikely(io->requeue_flush_with_data)) {  <---- KASAN slab-use-after-free
  976                 /*
  977                  * Preflush done for flush with data, reissue
  978                  * without REQ_PREFLUSH.

I ran the fio test under the same conditions several times, but I have not
been able to reproduce the KASAN so far; it appears to be very rare.
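If that is indeed the cause, one possible fix might be to latch the flag into
a local variable before calling free_io(io). Below is a minimal userspace
sketch of that pattern (the struct and function names here are hypothetical
stand-ins for illustration, not the actual dm code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dm_io; only the field under discussion. */
struct io_sketch {
	bool requeue_flush_with_data;
};

/*
 * Mirrors the ordering problem in __dm_io_complete(): the io object is
 * freed first, so any later read of io->requeue_flush_with_data is a
 * use-after-free. Copying the flag to the stack before freeing avoids it.
 */
static bool complete_io(struct io_sketch *io)
{
	bool requeue_flush_with_data = io->requeue_flush_with_data;

	free(io);	/* corresponds to free_io(io) */

	/* From here on, only the stack copy is used. */
	return requeue_flush_with_data;
}
```

Of course, whether reordering the read before free_io() is acceptable in the
real code path is for you to judge; this only illustrates the ordering.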

[1]

[ 7042.301608][T62536] 
==================================================================
[ 7042.303303][T62536] BUG: KASAN: slab-use-after-free in __dm_io_complete+0x866/0x960
[ 7042.307002][T62536] Read of size 1 at addr ffff88813370caa1 by task kworker/u16:161/62536
[ 7042.308283][T62536] 
[ 7042.308656][T62536] CPU: 0 UID: 0 PID: 62536 Comm: kworker/u16:161 Not tainted 6.17.0-rc6-next-20250917-kts #1 PREEMPT(lazy)
[ 7042.308668][T62536] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-4.fc42 04/01/2014
[ 7042.308679][T62536] Workqueue: kcryptd-252:0-1 kcryptd_crypt [dm_crypt]
[ 7042.308712][T62536] Call Trace:
[ 7042.308732][T62536]  <TASK>
[ 7042.308739][T62536]  ? __dm_io_complete+0x866/0x960
[ 7042.308752][T62536]  dump_stack_lvl+0x6e/0xa0
[ 7042.308767][T62536]  print_address_description.constprop.0+0x88/0x320
[ 7042.308785][T62536]  ? __dm_io_complete+0x866/0x960
[ 7042.308793][T62536]  print_report+0xfc/0x1ff
[ 7042.308801][T62536]  ? __virt_addr_valid+0x25a/0x4e0
[ 7042.308812][T62536]  ? __dm_io_complete+0x866/0x960
[ 7042.308820][T62536]  kasan_report+0xe1/0x1a0
[ 7042.308832][T62536]  ? __dm_io_complete+0x866/0x960
[ 7042.308845][T62536]  __dm_io_complete+0x866/0x960
[ 7042.308857][T62536]  clone_endio+0x36f/0x7f0
[ 7042.308867][T62536]  ? __pfx_clone_endio+0x10/0x10
[ 7042.308879][T62536]  ? crypt_dec_pending+0x1a4/0x4a0 [dm_crypt]
[ 7042.308897][T62536]  process_one_work+0x86b/0x14c0
[ 7042.308914][T62536]  ? __pfx_process_one_work+0x10/0x10
[ 7042.308929][T62536]  ? lock_is_held_type+0x9a/0x110
[ 7042.308939][T62536]  ? assign_work+0x156/0x390
[ 7042.308949][T62536]  worker_thread+0x5f2/0xfd0
[ 7042.308964][T62536]  ? __pfx_worker_thread+0x10/0x10
[ 7042.308972][T62536]  kthread+0x3a4/0x760
[ 7042.308983][T62536]  ? __pfx_kthread+0x10/0x10
[ 7042.308991][T62536]  ? __lock_release.isra.0+0x59/0x170
[ 7042.309002][T62536]  ? __pfx_kthread+0x10/0x10
[ 7042.309011][T62536]  ret_from_fork+0x2d6/0x3e0
[ 7042.309018][T62536]  ? __pfx_kthread+0x10/0x10
[ 7042.309025][T62536]  ? __pfx_kthread+0x10/0x10
[ 7042.309033][T62536]  ret_from_fork_asm+0x1a/0x30
[ 7042.309052][T62536]  </TASK>
[ 7042.309055][T62536] 
[ 7042.335577][T62536] Allocated by task 70951:
[ 7042.336313][T62536]  kasan_save_stack+0x30/0x50
[ 7042.337085][T62536]  kasan_save_track+0x14/0x30
[ 7042.337859][T62536]  __kasan_slab_alloc+0x7e/0x90
[ 7042.338699][T62536]  kmem_cache_alloc_noprof+0x219/0x7c0
[ 7042.339587][T62536]  mempool_alloc_noprof+0x12b/0x300
[ 7042.340351][T62536]  bio_alloc_bioset+0x1df/0x6e0
[ 7042.341153][T62536]  bio_alloc_clone+0x52/0x100
[ 7042.341828][T62536]  alloc_io+0x51/0x490
[ 7042.342478][T62536]  dm_split_and_process_bio+0x13d/0x1cd0
[ 7042.343387][T62536]  dm_submit_bio+0x131/0x470
[ 7042.343975][T62536]  __submit_bio+0x31f/0x700
[ 7042.344688][T62536]  __submit_bio_noacct+0x15f/0x600
[ 7042.345477][T62536]  submit_bio_noacct_nocheck+0x493/0x5b0
[ 7042.346375][T62536]  __blkdev_direct_IO+0x7e0/0xfc0
[ 7042.347084][T62536]  blkdev_read_iter+0x208/0x420
[ 7042.347742][T62536]  aio_read+0x2a3/0x490
[ 7042.348338][T62536]  io_submit_one+0x26c/0x8a0
[ 7042.349038][T62536]  __x64_sys_io_submit+0x161/0x2b0
[ 7042.350031][T62536]  do_syscall_64+0x94/0x7f0
[ 7042.350677][T62536]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 7042.351519][T62536] 
[ 7042.351818][T62536] Freed by task 70953:
[ 7042.352394][T62536]  kasan_save_stack+0x30/0x50
[ 7042.352965][T62536]  kasan_save_track+0x14/0x30
[ 7042.353740][T62536]  __kasan_save_free_info+0x3b/0x70
[ 7042.354584][T62536]  __kasan_slab_free+0x6b/0x90
[ 7042.355255][T62536]  slab_free_after_rcu_debug+0xf1/0x230
[ 7042.356062][T62536]  rcu_do_batch+0x34a/0x1900
[ 7042.356911][T62536]  rcu_core+0x62f/0x9f0
[ 7042.357709][T62536]  handle_softirqs+0x1de/0x7e0
[ 7042.358422][T62536]  __irq_exit_rcu+0x181/0x1d0
[ 7042.358994][T62536]  irq_exit_rcu+0xe/0x20
[ 7042.359639][T62536]  sysvec_apic_timer_interrupt+0x71/0x90
[ 7042.360701][T62536]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 7042.361423][T62536] 
[ 7042.361701][T62536] Last potentially related work creation:
[ 7042.362422][T62536]  kasan_save_stack+0x30/0x50
[ 7042.362970][T62536]  kasan_record_aux_stack+0xb0/0xc0
[ 7042.363595][T62536]  kmem_cache_free+0x3fb/0x730
[ 7042.364165][T62536]  __dm_io_complete+0xf1/0x960
[ 7042.364742][T62536]  clone_endio+0x36f/0x7f0
[ 7042.365276][T62536]  process_one_work+0x86b/0x14c0
[ 7042.365832][T62536]  worker_thread+0x5f2/0xfd0
[ 7042.366360][T62536]  kthread+0x3a4/0x760
[ 7042.366824][T62536]  ret_from_fork+0x2d6/0x3e0
[ 7042.367386][T62536]  ret_from_fork_asm+0x1a/0x30
[ 7042.367916][T62536] 
[ 7042.368171][T62536] Second to last potentially related work creation:
[ 7042.368949][T62536]  kasan_save_stack+0x30/0x50
[ 7042.369498][T62536]  kasan_record_aux_stack+0xb0/0xc0
[ 7042.370089][T62536]  __queue_work+0x315/0x1250
[ 7042.370624][T62536]  queue_work_on+0x6d/0xd0
[ 7042.371121][T62536]  blk_update_request+0x447/0x11c0
[ 7042.371726][T62536]  blk_mq_end_request+0x57/0x380
[ 7042.372308][T62536]  nvme_poll_cq+0x6f6/0x1020 [nvme]
[ 7042.372897][T62536]  nvme_irq+0x90/0xf0 [nvme]
[ 7042.373477][T62536]  __handle_irq_event_percpu+0x1c1/0x5b0
[ 7042.374083][T62536]  handle_irq_event+0xab/0x1c0
[ 7042.374627][T62536]  handle_edge_irq+0x2d8/0x880
[ 7042.375142][T62536]  __common_interrupt+0x82/0x170
[ 7042.375726][T62536]  common_interrupt+0x80/0xa0
[ 7042.376269][T62536]  asm_common_interrupt+0x26/0x40
[ 7042.376826][T62536] 
[ 7042.377077][T62536] The buggy address belongs to the object at ffff88813370c740
[ 7042.377077][T62536]  which belongs to the cache bio-1072 of size 1072
[ 7042.378609][T62536] The buggy address is located 865 bytes inside of
[ 7042.378609][T62536]  freed 1072-byte region [ffff88813370c740, ffff88813370cb70)
[ 7042.380253][T62536] 
[ 7042.380513][T62536] The buggy address belongs to the physical page:
[ 7042.381168][T62536] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88813370b900 pfn:0x133708
[ 7042.382292][T62536] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 7042.383174][T62536] flags: 0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 7042.384094][T62536] page_type: f5(slab)
[ 7042.384554][T62536] raw: 0017ffffc0000240 ffff888134d8a640 ffffea0004cebe10 ffffea000453ee10
[ 7042.385476][T62536] raw: ffff88813370b900 00000000001a0019 00000000f5000000 0000000000000000
[ 7042.386359][T62536] head: 0017ffffc0000240 ffff888134d8a640 ffffea0004cebe10 ffffea000453ee10
[ 7042.387275][T62536] head: ffff88813370b900 00000000001a0019 00000000f5000000 0000000000000000
[ 7042.388137][T62536] head: 0017ffffc0000003 ffffea0004cdc201 00000000ffffffff 00000000ffffffff
[ 7042.389107][T62536] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[ 7042.390052][T62536] page dumped because: kasan: bad access detected
[ 7042.390761][T62536] 
[ 7042.391009][T62536] Memory state around the buggy address:
[ 7042.391635][T62536]  ffff88813370c980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 7042.392487][T62536]  ffff88813370ca00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 7042.393361][T62536] >ffff88813370ca80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 7042.394191][T62536]                                ^
[ 7042.394740][T62536]  ffff88813370cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
[ 7042.395570][T62536]  ffff88813370cb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 7042.396444][T62536] 
==================================================================
[ 7042.397632][T62536] Disabling lock debugging due to kernel taint