On 10/9/20 7:44 am, Jay Vosburgh wrote:
> wgrant, you said:
> 
> That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
> working kernel shows some trouble there:
> 
>   $ uname -a
>   Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>   $ ls -l /sys/kernel/slab | grep a-0000152
>   lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-0000152
> 
> Are you saying that the symlink is "some trouble" here?  Because that
> part isn't an error, that's the effect of slab merge (that the kernel
> normally treats all slabs of the same size as one big slab with multiple
> references, more or less).

The symlink itself is indeed not a bug. But here there's only one
reference, and the cache it points to doesn't exist. I don't think that
symlink should be dangling.
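
(For anyone following along, this is easy to inspect; the cache name will
differ between systems, but something like:

  $ # which names point at the merged cache, and does the target exist?
  $ ls -l /sys/kernel/slab | grep ':a-0000152$'
  $ test -e /sys/kernel/slab/:a-0000152 && echo present || echo dangling

shows both. Here only dm_bufio_buffer references :a-0000152, and the
target is missing even on the working -42 kernel.)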

> Slab merge can be disabled via "slab_nomerge" on the command line.

Thanks for the slab_nomerge hint. That gets 5.4.0-47 to boot, but
interestingly dm_bufio_buffer then doesn't show up in /proc/slabinfo or
/sys/kernel/slab at all, unlike on earlier kernels. There's no 152-byte
slab anywhere:

  $ sudo cat /sys/kernel/slab/*/slab_size | grep ^152$
  $
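
(Aside for anyone else needing the workaround in the meantime: I believe
the usual way to make slab_nomerge persistent on Ubuntu is just to add it
to GRUB_CMDLINE_LINUX_DEFAULT, roughly:

  $ # assumes the stock /etc/default/grub layout; adjust to taste
  $ sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&slab_nomerge /' /etc/default/grub
  $ sudo update-grub
  $ sudo reboot

That's only a workaround, of course, not a fix.)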

I've also just reproduced this on a second host by rebooting it into the
same updated kernel -- identical hardware except for a couple of things
like SSDs, and fairly similar software configuration.

... some digging later ...

The trigger on boot is the parallel pvscans launched by
lvm2-pvscan@.service in the presence of several PVs. If I mask that
service, the system boots fine on the updated kernel (without
slab_nomerge). And then this crashes it:

  for i in 259:1 259:2 259:3 8:32 8:48 8:64 8:80; do sudo /sbin/lvm pvscan --cache --activate ay $i & done

I think the key is to have no active VGs with snapshots, then
simultaneously activate two VGs with snapshots.

Armed with that hypothesis, I set up a boring local bionic qemu-kvm
instance, installed linux-generic-hwe-18.04, and reproduced the problem
with a couple of loop devices:

  $ sudo dd if=/dev/zero of=pv1.img bs=1M count=1 seek=1024
  $ sudo dd if=/dev/zero of=pv2.img bs=1M count=1 seek=1024
  $ sudo losetup -f pv1.img
  $ sudo losetup -f pv2.img
  $ sudo vgcreate vg1 /dev/loop0
  $ sudo vgcreate vg2 /dev/loop1
  $ sudo lvcreate --type snapshot -L4M -V10G -n test vg1
  $ sudo lvcreate --type snapshot -L4M -V10G -n test vg2
  $ sudo systemctl mask lvm2-pvscan@.service
  $ sudo reboot

  $ sudo losetup -f pv1.img
  $ sudo losetup -f pv2.img
  $ for i in 7:0 7:1; do sudo /sbin/lvm pvscan --cache --activate ay $i & done
  $ # Be glad if you can still type by this point.

The oops is not 100% reproducible in this configuration, but it seems
fairly reliable with four vCPUs. If it doesn't trigger the first time, a
few cycles of rebooting and rerunning those last three commands always
worked for me.

The console sometimes remains responsive after the oops, allowing me to
capture good and bad `dmsetup table -v` output. Not sure how helpful
that is, but I've attached an example (from a slightly different
configuration, where each VG has a linear LV with a snapshot,
rather than a snapshot-backed thin LV).
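
(For reference, capturing those was nothing fancier than something like

  $ sudo dmsetup table -v > dmsetup.good    # before triggering the race
  $ sudo dmsetup table -v > dmsetup.bad     # after the oops, console permitting

on the guest.)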


I've also been able to reproduce the fault on a pure focal system, but
there it doesn't always happen on boot, because lvm2-pvscan@.service (or
a manual pvscan afterwards) fails to activate the VGs: something is
creating /run/lvm/vgs_online/$VG too early, so pvscan thinks it's
already done and I end up needing to activate them manually later (a
quick check for this is shown below). That seems unrelated, and only
affects a subset of my VMs, but when it does happen it actually makes
reproduction easier, since the system boots without the unit having to
be masked. You can then crash it with just:

  $ for VG in vg1 vg2; do sudo vgchange -ay $VG & done
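
(The quick check for the premature activation, by the way: right after
boot, before anything has activated the VGs,

  $ ls /run/lvm/vgs_online/

and if the VG names are already listed there, pvscan considers them
handled and skips them, as far as I can tell.)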

While debugging locally I also found that groovy with 5.8.0-18 is
affected: when I stopped a VM with PVs on real block devices, the host
(my desktop, on which I nearly lost this email, oops) dutifully ran
pvscan over them, got very sad, and needed to be rebooted with
slab_nomerge to recover:

  [ DO NOT BLINDLY RUN THIS, it may well crash the host. ]
  $ lxc launch --vm ubuntu:focal bug-1894780-focal-2
  $ lxc storage volume create default lvm-1 --type=block size=10GB
  $ lxc storage volume create default lvm-2 --type=block size=10GB
  $ lxc stop bug-1894780-focal-2
  $ lxc storage volume attach default lvm-1 bug-1894780-focal-2 lvm-1
  $ lxc storage volume attach default lvm-2 bug-1894780-focal-2 lvm-2
  $ lxc start bug-1894780-focal-2
  $ lxc exec bug-1894780-focal-2 bash
  # vgcreate vg1 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_lvm-1
  # vgcreate vg2 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_lvm-2
  # lvcreate --type snapshot -L4M -V10G -n test vg1
  # lvcreate --type snapshot -L4M -V10G -n test vg2
  # poweroff
  $ # Host sadness here, unless you're somehow immune.
  $ lxc start bug-1894780-focal-2
  $ lxc exec bug-1894780-focal-2 bash
  # for VG in vg1 vg2; do sudo vgchange -ay $VG & done
  # # Guest sadness here.

So that's reproduced on bare metal and in a VM, on 5.4.0-47 and
5.8.0-18, on two different hosts (one an EPYC 7501 server, the other a
Ryzen 7 1700X desktop; both Zen 1, but I doubt that's relevant).
Hopefully one of the recipes works for you too.


** Attachment added: "dmsetup.bad"
   https://bugs.launchpad.net/bugs/1894780/+attachment/5409274/+files/dmsetup.bad

** Attachment added: "dmsetup.good"
   https://bugs.launchpad.net/bugs/1894780/+attachment/5409275/+files/dmsetup.good

** Attachment added: "vm.dmesg"
   https://bugs.launchpad.net/bugs/1894780/+attachment/5409276/+files/vm.dmesg

** Attachment added: "oops.desktop"
   https://bugs.launchpad.net/bugs/1894780/+attachment/5409277/+files/oops.desktop

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  New

Bug description:
  One of my bionic servers with HWE 5.4.0 hangs on boot (apparently
  while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to
  5.4.0-47, with the following trace:

    [   29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
    [   29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
    [   29.145977] #PF: supervisor read access in kernel mode
    [   29.145979] #PF: error_code(0x0000) - not-present page
    [   29.145981] PGD 0 P4D 0
    [   29.158800] Oops: 0000 [#1] SMP NOPTI
    [   29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
    [   29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
    [   29.178038] RIP: 0010:free_percpu+0x120/0x1f0
    [   29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
    [   29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
    [   29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000
    [   29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000
    [   29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300
    [   29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0
    [   29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008
    [   29.244878] FS:  00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000
    [   29.252961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0
    [   29.265883] Call Trace:
    [   29.268346]  __kmem_cache_release+0x1a/0x30
    [   29.273913]  __kmem_cache_create+0x4f9/0x550
    [   29.278192]  ? __kmalloc_node+0x1eb/0x320
    [   29.282205]  ? kvmalloc_node+0x31/0x80
    [   29.285962]  create_cache+0x120/0x1f0
    [   29.291003]  kmem_cache_create_usercopy+0x17d/0x270
    [   29.295882]  kmem_cache_create+0x16/0x20
    [   29.300152]  dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
    [   29.305644]  ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
    [   29.310693]  persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
    [   29.316627]  ? _cond_resched+0x19/0x40
    [   29.320384]  snapshot_ctr+0x79e/0x910 [dm_snapshot]
    [   29.325276]  dm_table_add_target+0x18d/0x370
    [   29.329552]  table_load+0x12a/0x370
    [   29.333045]  ctl_ioctl+0x1e2/0x590
    [   29.336450]  ? retrieve_status+0x1c0/0x1c0
    [   29.340551]  dm_ctl_ioctl+0xe/0x20
    [   29.343958]  do_vfs_ioctl+0xa9/0x640
    [   29.347547]  ? ksys_semctl.constprop.19+0xf7/0x190
    [   29.352337]  ksys_ioctl+0x75/0x80
    [   29.355663]  __x64_sys_ioctl+0x1a/0x20
    [   29.359421]  do_syscall_64+0x57/0x190
    [   29.363094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   29.368144] RIP: 0033:0x7f939f0286d7
    [   29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
    [   29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
    [   29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7
    [   29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009
    [   29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0
    [   29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6
    [   29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6
    [   29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
    [   29.507853] CR2: 0000000000000020
    [   29.511174] ---[ end trace 43bd923f80cbdf52 ]---

  That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
  working kernel shows some trouble there:

    $ uname -a
    Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
    $ ls -l /sys/kernel/slab | grep a-0000152
    lrwxrwxrwx 1 root root 0 Sep  8 03:20 dm_bufio_buffer -> :a-0000152

  So on 5.4.0-42 the named node doesn't get created, but at least it
  doesn't crash. The same thing is visible on my 5.8.0-18 desktop, but I
  can't reproduce the crash on other machines with snapshot thin volumes
  despite it happening every time (even with maxcpus=1) on the affected
  system.

  It should be noted that LVM was not in use on this system until just
  before it was rebooted into the new kernel, but since downgrading to
  -42 makes the problem go away, that timing seems to be a coincidence.
  Before I realised this was a recent regression I dug through
  mm/slub.c's history and found dde3c6b7
  ("mm/slub: fix a memory leak in sysfs_slab_add()") kind of suspicious
  -- it ostensibly fixes a leak from 80da026a ("mm/slub: fix slab
  double-free in case of duplicate sysfs filename"), exactly the
  codepath that seems to crash here.
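
  (The digging was nothing more sophisticated than walking mm/slub.c's
  history between a known-good and a known-bad kernel, along the lines of

    $ git log --oneline <last-good-tag>..<first-bad-tag> -- mm/slub.c

  and reading the handful of commits that turn up.)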

  There's clearly some existing bug causing the slab sysfs node to not
  be added, and I guess dde3c6b7 turns that into a crash on some
  systems. This is a test system, so I can do whatever debugging is
  required to narrow down the trigger.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+subscriptions
