** Attachment added: "version_signature from the last good kernel" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+attachment/5408514/+files/version.log
** Summary changed:

- Oops when starting LVM snapshots on 5.4.0-47
+ Oops and hang when starting LVM snapshots on 5.4.0-47

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47

Status in linux package in Ubuntu:
  New

Bug description:
  One of my bionic servers with HWE 5.4.0 hangs on boot (apparently
  while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to
  5.4.0-47, with the following trace:

  [ 29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
  [ 29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
  [ 29.145977] #PF: supervisor read access in kernel mode
  [ 29.145979] #PF: error_code(0x0000) - not-present page
  [ 29.145981] PGD 0 P4D 0
  [ 29.158800] Oops: 0000 [#1] SMP NOPTI
  [ 29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
  [ 29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
  [ 29.178038] RIP: 0010:free_percpu+0x120/0x1f0
  [ 29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
  [ 29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
  [ 29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000
  [ 29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000
  [ 29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300
  [ 29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0
  [ 29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008
  [ 29.244878] FS: 00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000
  [ 29.252961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0
  [ 29.265883] Call Trace:
  [ 29.268346] __kmem_cache_release+0x1a/0x30
  [ 29.273913] __kmem_cache_create+0x4f9/0x550
  [ 29.278192] ? __kmalloc_node+0x1eb/0x320
  [ 29.282205] ? kvmalloc_node+0x31/0x80
  [ 29.285962] create_cache+0x120/0x1f0
  [ 29.291003] kmem_cache_create_usercopy+0x17d/0x270
  [ 29.295882] kmem_cache_create+0x16/0x20
  [ 29.300152] dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
  [ 29.305644] ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
  [ 29.310693] persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
  [ 29.316627] ? _cond_resched+0x19/0x40
  [ 29.320384] snapshot_ctr+0x79e/0x910 [dm_snapshot]
  [ 29.325276] dm_table_add_target+0x18d/0x370
  [ 29.329552] table_load+0x12a/0x370
  [ 29.333045] ctl_ioctl+0x1e2/0x590
  [ 29.336450] ? retrieve_status+0x1c0/0x1c0
  [ 29.340551] dm_ctl_ioctl+0xe/0x20
  [ 29.343958] do_vfs_ioctl+0xa9/0x640
  [ 29.347547] ? ksys_semctl.constprop.19+0xf7/0x190
  [ 29.352337] ksys_ioctl+0x75/0x80
  [ 29.355663] __x64_sys_ioctl+0x1a/0x20
  [ 29.359421] do_syscall_64+0x57/0x190
  [ 29.363094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 29.368144] RIP: 0033:0x7f939f0286d7
  [ 29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
  [ 29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [ 29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7
  [ 29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009
  [ 29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0
  [ 29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6
  [ 29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6
  [ 29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
  [ 29.507853] CR2: 0000000000000020
  [ 29.511174] ---[ end trace 43bd923f80cbdf52 ]---

  That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
  working kernel shows some trouble there:

  $ uname -a
  Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  $ ls -l /sys/kernel/slab | grep a-0000152
  lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-0000152

  So on 5.4.0-42 the named node doesn't get created, but at least it
  doesn't crash. The same thing is visible on my 5.8.0-18 desktop, but I
  can't reproduce the crash on other machines with snapshot thin volumes
  despite it happening every time (even with maxcpus=1) on the affected
  system.

  It should be noted that LVM was not in use on this system until just
  before it was rebooted into the new kernel, but downgrading to -42
  does work so it seems like a coincidence.

  Before I realised it was a recent regression I dug through mm/slub.c's
  history and found dde3c6b7 ("mm/slub: fix a memory leak in
  sysfs_slab_add()") kind of suspicious -- it ostensibly fixes a leak
  from 80da026a ("mm/slub: fix slab double-free in case of duplicate
  sysfs filename"), exactly the codepath that seems to crash here.
  There's clearly some existing bug causing the slab sysfs node to not
  be added, and I guess dde3c6b7 turns that into a crash on some
  systems.

  This is a test system, so I can do whatever debugging is required to
  narrow down the trigger.
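
The `ls -l /sys/kernel/slab | grep a-0000152` check in the description can be repeated on other machines with a small script. What follows is only a sketch under assumptions: the helper names `slab_aliases` and `has_real_node` are hypothetical, and it assumes `/sys/kernel/slab` has the layout shown in the report (alias symlinks that point at a merged-cache name such as `:a-0000152`). It reports which aliases point at a given name and whether that name exists as a real directory.

```python
import os

SLAB_DIR = "/sys/kernel/slab"  # path taken from the report; assumes SLUB sysfs is present


def slab_aliases(target, slab_dir=SLAB_DIR):
    """Return the sorted names of symlinks in slab_dir that point at `target`."""
    aliases = []
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        # islink() is also true for dangling links, which is the case of interest.
        if os.path.islink(path) and os.readlink(path) == target:
            aliases.append(name)
    return aliases


def has_real_node(target, slab_dir=SLAB_DIR):
    """Return True if `target` exists in slab_dir as a real directory, not a symlink."""
    path = os.path.join(slab_dir, target)
    return os.path.isdir(path) and not os.path.islink(path)


if __name__ == "__main__":
    node = ":a-0000152"  # merged-cache name from the trace in the report
    if os.path.isdir(SLAB_DIR):
        print("aliases:", slab_aliases(node))
        print("node present:", has_real_node(node))
```

An alias list that is non-empty while `has_real_node` returns False would match the situation described for 5.4.0-42, where the symlink exists but the named node does not.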