On 10/9/24 17:35, Alexander Atanasov wrote:
Hi,

I haven't went in deep but one thing that looks suspicious is in 6/6 patch

+    if (pio->bi_op & REQ_PREFLUSH && (pio->bi_op & REQ_OP_MASK) != REQ_OP_FLUSH) {
+        flush_pio = ploop_alloc_pio(ploop, GFP_NOIO);
+        if (!flush_pio) {
+            pio->bi_status = BLK_STS_RESOURCE;
+            ploop_pio_endio(pio);
+            return;
+        }
+
+        ploop_init_pio(ploop, REQ_OP_FLUSH, flush_pio);
+        flush_pio->endio_cb = ploop_preflush_endio;
+        flush_pio->endio_cb_data = pio;
+        flush_pio->queue_list_id = PLOOP_LIST_FLUSH;
+        flush_pio->free_on_endio = true;
+    }
+
      write_lock_irq(&ploop->bat_rwlock);

      list_for_each_entry(md, &ploop->wb_batch_list_prio, wb_link) {
          md_inflight++;
-        md->piwb->flush_pio = pio;
+        md->piwb->flush_pio = flush_pio;
      }


one pio is allocted and then assigned to multiple md->piwb

notably : flush_pio->free_on_endio = true;

That is expected: flush pio is dispatched only when all piwb has completed atomic_dec(&flush_pio->md_inflight) to zero.
free_on_endio is needed to automatically drop these extra flush pios.

Crash is introduced with patch "dm-ploop: add delay for metadata writeback". Seems like it somehow broke BLKDISCARD. mkfs.xfs uses them, and there is some suspicious things:

55296 ioctl(5, BLKDISCARD, [0, 2147483648]) = 0
55296 write(1, "Discarding blocks...", 20) = 20
55296 ioctl(5, BLKDISCARD, [2147483648, 2147483648]) = 0
55296 ioctl(5, BLKDISCARD, [4294967296, 2147483648]) = 0
55296 ioctl(5, BLKDISCARD, [6442450944, 2147483648]) = 0
55296 write(1, "Done.\n", 6)            = 6
...
55296 pwrite64(5, "XFSB\0\0\20\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512
55296 fallocate(5, FALLOC_FL_ZERO_RANGE, 4294991872, 67108864) = 0
55296 pwrite64(5, "\376\355\272\276\0\0\0\1\0\0\0\2\0\0\2\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0"..., 1024, 4294991872) = 1024 55296 pread64(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 8589934080) = 512 55296 pwrite64(5, "XFSB\0\0\20\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 55296 pwrite64(5, "XAGF\0\0\0\1\0\0\0\0\0\10\0\0\0\0\0\1\0\0\0\2\0\0\0\0\0\0\0\1"..., 512, 512) = 512
...
55296 write(2, "Metadata CRC error detected at 0"..., 70) = 70
55296 write(2, "\n", 1)                 = 1
55296 pread64(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 2147484160) = 512
55296 write(2, "Metadata CRC error detected at 0"..., 75) = 75
55296 write(2, "\n", 1)

Second run on the same disk just hangs on
ioctl(4, BLKDISCARD, [0, 2147483648]



On 9.10.24 17:35, Denis V. Lunev wrote:
On 10/9/24 16:33, Denis V. Lunev wrote:
On 10/9/24 14:17, Andrey Zhadchenko wrote:
With this patchset, we start to check if the upcoming requests for ploop
have REQ_FUA flag.
If the flag is present:
  - IO submits immediately, completion waits for BAT update
  - BAT updates submit immediately
  - set IOCB_DSYNC for writes and BAT updates
  - BAT update does not trigger flush
If the flag is absent:
  - IO completion does not wait for BAT update
  - all BAT updates are periodically submitted in a batch
  - BAT update does not trigger flush

New parameter with metadata submit delay can be specified when setting
up ploop table

v2 changes:
- drop flush after BAT updates even for FUA requests
- set IOCB_DSYNC flag on writes triggered by FUA
- complete all accumulated metadata on FLUSH requests
- support REQ_PREFLUSH flag

Andrey Zhadchenko (3):
   dm-ploop: do not flush BAT writes if request was not FUA
   dm-ploop: do not wait for BAT update for non-FUA requests
   dm-ploop: add delay for metadata writeback


*** BLURB HERE ***

Andrey Zhadchenko (6):
   dm-ploop: do not flush after metadata writes
   dm-ploop: set IOCB_DSYNC on all FUA requests
   dm-ploop: do not wait for BAT update for non-FUA requests
   dm-ploop: add delay for metadata writeback
   dm-ploop: submit all postponed metadata on REQ_OP_FLUSH
   dm-ploop: handle REQ_PREFLUSH

  drivers/md/dm-ploop-bat.c    |   1 +
  drivers/md/dm-ploop-map.c    | 154 +++++++++++++++++++++++++++--------
  drivers/md/dm-ploop-target.c |  13 +++
  drivers/md/dm-ploop.h        |   9 ++
  4 files changed, 143 insertions(+), 34 deletions(-)

these patches do not work at all over 32 GB ramdisk

dev007 /mnt/ramdisk # ploop init -s 8G /mnt/ramdisk/root.hds
Creating delta /mnt/ramdisk/root.hds bs=2048 size=16777216 sectors v2
Adding snapshot {5fbaabe3-6958-40ff-92a7-860e329aab41}
Storing /mnt/ramdisk/DiskDescriptor.xml
Opening delta /mnt/ramdisk/root.hds
Opening delta /mnt/ramdisk/root.hds
Adding delta dev=/dev/mapper/ploop30904 img=/mnt/ramdisk/root.hds (rw)
Update inuse state /mnt/ramdisk/root.hds state=1953459801
Opening delta /mnt/ramdisk/root.hds
Running: partprobe /dev/mapper/ploop30904
Set untrusted /dev/mapper/ploop30904
Running: parted -s /dev/mapper/ploop30904 mklabel gpt
Running: sgdisk -n 1:2048:16775167 /dev/mapper/ploop30904
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
Running: partprobe /dev/mapper/ploop30904
Running: mkfs.xfs -b size=4096 /dev/mapper/ploop30904p1
meta-data=/dev/mapper/ploop30904p1 isize=512    agcount=4, agsize=524160 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=2096640, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
Metadata CRC error detected at 0x561dffbd10f0, xfs_agf block 0x1/0x200
Metadata CRC error detected at 0x561dffbd10f0, xfs_agf block 0x3ffc01/0x200 Metadata CRC error detected at 0x561dffbd10f0, xfs_agf block 0x7ff801/0x200 Metadata CRC error detected at 0x561dffbd10f0, xfs_agf block 0xbff401/0x200
Metadata CRC error detected at 0x561dffbee040, xfs_agi block 0x2/0x200
Metadata CRC error detected at 0x561dffbee040, xfs_agi block 0x3ffc02/0x200 Metadata CRC error detected at 0x561dffbee040, xfs_agi block 0x7ff802/0x200 Metadata CRC error detected at 0x561dffbee040, xfs_agi block 0xbff402/0x200
mkfs.xfs: Inode allocation failed [28 - No space left on device]
Error in run_prg_rc (util.c:309): Command mkfs.xfs -b size=4096 /dev/mapper/ploop30904p1  exited with code 1
Umount /dev/mapper/ploop30904
DM command: remove /dev/mapper/ploop30904p1
DM message: /dev/mapper/ploop30904 get_img_name 0
Opening delta /mnt/ramdisk/root.hds
DM command: suspend /dev/mapper/ploop30904
DM command: resume /dev/mapper/ploop30904
Unmounting device /dev/mapper/ploop30904
DM command: resume /dev/mapper/ploop30904
DM command: remove /dev/mapper/ploop30904
Clear inuse state
Dropping image /mnt/ramdisk/root.hds
dev007 /mnt/ramdisk #

rmmod after that has been hanged.

Den
[  351.210027] ============================================================================= [  351.210348] BUG ploop-pio (Tainted: G            E  X  ------- --- ): Objects remaining in ploop-pio on __kmem_cache_shutdown() [  351.210620] -----------------------------------------------------------------------------

[  351.210886] Slab 0x00000000afe69346 objects=36 used=3 fp=0x0000000028f0e0af flags=0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) [  351.211192] CPU: 2 PID: 15519 Comm: rmmod ve: / Kdump: loaded Tainted: G            E  X  -------  ---  5.14.0-427.37.1.ovz9.78.1 #6 ovz9.78.1 [  351.211564] Hardware name: Supermicro SYS-5019C-L/X11SCL-IF, BIOS 1.5 10/05/2020
[  351.211878] Call Trace:
[  351.212146]  <TASK>
[  351.212452]  dump_stack_lvl+0x34/0x48
[  351.212732]  slab_err+0xa2/0x110
[  351.213006]  __kmem_cache_shutdown+0x16a/0x310
[  351.213305]  kmem_cache_destroy+0x51/0x160
[  351.213583]  dm_ploop_exit+0x24/0x20d [ploop]
[  351.213867]  __do_sys_delete_module.constprop.0+0x175/0x280
[  351.214148]  ? syscall_trace_enter.constprop.0+0x126/0x1a0
[  351.214428]  do_syscall_64+0x59/0x90
[  351.214703]  ? __rseq_handle_notify_resume+0x26/0xb0
[  351.214980]  ? exit_to_user_mode_loop+0xd0/0x130
[  351.215257]  ? exit_to_user_mode_prepare+0xb6/0x100
[  351.215553]  ? syscall_exit_to_user_mode+0x22/0x40
[  351.215830]  ? do_syscall_64+0x69/0x90
[  351.216105]  ? exc_page_fault+0x62/0x150
[  351.216414]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  351.216693] RIP: 0033:0x7f72bc83f5ab
[  351.216988] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 [  351.217391] RSP: 002b:00007fff49e1e798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [  351.217707] RAX: ffffffffffffffda RBX: 00005580d76568f0 RCX: 00007f72bc83f5ab [  351.217991] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005580d7656958 [  351.218313] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [  351.218596] R10: 00007f72bc99eac0 R11: 0000000000000206 R12: 00007fff49e1e9f0 [  351.218881] R13: 00007fff49e1f070 R14: 00005580d76562a0 R15: 00005580d76568f0
[  351.219166]  </TASK>
[  351.219453] Disabling lock debugging due to kernel taint
[  351.219453] Disabling lock debugging due to kernel taint
[  351.219732] Kernel panic - not syncing: panic_on_warn set ...
[  351.220012] CPU: 2 PID: 15519 Comm: rmmod ve: / Kdump: loaded Tainted: G    B       E  X  -------  ---  5.14.0-427.37.1.ovz9.78.1 #6 ovz9.78.1 [  351.220345] Hardware name: Supermicro SYS-5019C-L/X11SCL-IF, BIOS 1.5 10/05/2020
[  351.220677] Call Trace:
[  351.220945]  <TASK>
[  351.221230]  dump_stack_lvl+0x34/0x48
[  351.221523]  panic+0x107/0x2f7
[  351.221796]  slab_err.cold+0xc/0xc
[  351.222073]  __kmem_cache_shutdown+0x16a/0x310
[  351.222368]  kmem_cache_destroy+0x51/0x160
[  351.222643]  dm_ploop_exit+0x24/0x20d [ploop]
[  351.222921]  __do_sys_delete_module.constprop.0+0x175/0x280
[  351.223216]  ? syscall_trace_enter.constprop.0+0x126/0x1a0
[  351.223519]  do_syscall_64+0x59/0x90
[  351.223793]  ? __rseq_handle_notify_resume+0x26/0xb0
[  351.224070]  ? exit_to_user_mode_loop+0xd0/0x130
[  351.224345]  ? exit_to_user_mode_prepare+0xb6/0x100
[  351.224622]  ? syscall_exit_to_user_mode+0x22/0x40
[  351.224897]  ? do_syscall_64+0x69/0x90
[  351.225170]  ? exc_page_fault+0x62/0x150
[  351.225460]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  351.225737] RIP: 0033:0x7f72bc83f5ab
[  351.226011] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 [  351.226394] RSP: 002b:00007fff49e1e798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [  351.226709] RAX: ffffffffffffffda RBX: 00005580d76568f0 RCX: 00007f72bc83f5ab [  351.226995] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005580d7656958 [  351.227316] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [  351.227601] R10: 00007f72bc99eac0 R11: 0000000000000206 R12: 00007fff49e1e9f0 [  351.227887] R13: 00007fff49e1f070 R14: 00005580d76562a0 R15: 00005580d76568f0
[  351.228171]  </TASK>


_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to