A similar issue was seen upstream here: https://github.com/openzfs/zfs/issues/12543. However, setting spl_kmem_cache_slab_limit=0 caused the instances to crash, so that does not appear to be a tenable workaround in this case.
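For reference, the workaround attempted above is set through the SPL kernel module's parameter interface. A minimal sketch of how that is done (note the crash reported above means this is not a recommended fix; the file name under /etc/modprobe.d is illustrative):

```shell
# Change spl_kmem_cache_slab_limit for the running kernel by writing the
# sysfs parameter (requires the spl module to be loaded):
echo 0 | sudo tee /sys/module/spl/parameters/spl_kmem_cache_slab_limit

# Or persist the setting across reboots via modprobe configuration
# (file name is illustrative):
echo "options spl spl_kmem_cache_slab_limit=0" | sudo tee /etc/modprobe.d/spl.conf
```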
** Bug watch added: github.com/openzfs/zfs/issues #12543
   https://github.com/openzfs/zfs/issues/12543

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2078300

Title:
  ubuntu_zfs_stress triggers kernel BUG at mm/usercopy.c:99 on
  J-generic/lowlatency-64k

Status in ubuntu-kernel-tests:
  New

Bug description:
  Issue found with the Jammy 5.15.0-121.131 generic-64k and lowlatency-64k
  kernels on an OpenStack ARM64 instance. (They were good with
  5.15.0-118.128 last cycle.)

  The test failed because some processes could not be terminated properly,
  which in turn prevented the test from finishing (so the sut-test failure
  was raised).

  stress-ng: info: [571640] aio stressor will be skipped, it is not implemented on this system: aarch64 Linux 5.15.0-121-lowlatency-64k gcc 11.4.0 (built without aio.h)
  stress-ng: info: [571640] setting to a 5 secs run per stressor
  stress-ng: info: [571640] dispatching hogs: 4 hdd, 4 link, 4 symlink, 4 lockf, 4 seek, 4 dentry, 4 dir, 4 fallocate, 4 fstat, 1 io, 4 lease, 2 mmap, 4 open, 4 rename, 4 chdir, 4 chmod, 4 filename, 4 rename
  stress-ng: info: [571640] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
  stress-ng: info: [571681] io: this is a legacy I/O sync stressor, consider using iomix instead
  stress-ng: info: [571688] open: using a maximum of 1048576 file descriptors
  stress-ng: info: [571698] chdir: removing 8192 directories
  stress-ng: warn: [571687] cannot terminate process 571939, gave up after 120 seconds
  stress-ng: warn: [571686] cannot terminate process 571938, gave up after 120 seconds
  sut-test TEST SYSTEM FAILURE DETECTED
  Test results file '/home/openstack/workspace/jammy-linux-lowlatency-lowlatency-64k-arm64-5.15.0-cpu2-ram4-disk20-ubuntu_zfs_stress/kernel-results.xml' not found.
But if you look into the console output, there is something wrong here. Below is the output from 5.15.0-121-lowlatency-64k:

[ 840.682657] usercopy: Kernel memory exposure attempt detected from SLUB object 'zio_buf_comb_4096' (offset 1, size 8191)!
[ 840.685084] usercopy: Kernel memory exposure attempt detected from SLUB object 'zio_buf_comb_4096' (offset 1, size 8191)!
[ 840.687391] kernel BUG at mm/usercopy.c:99!
[ 840.688377] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 840.689705] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) binfmt_misc nls_iso8859_1 qemu_fw_cfg dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_net net_failover virtio_scsi failover aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 840.699464] CPU: 0 PID: 628890 Comm: stress-ng Tainted: P O 5.15.0-121-lowlatency-64k #131-Ubuntu
[ 840.701319] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 840.702626] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 840.704008] pc : usercopy_abort+0x98/0x9c
[ 840.704843] lr : usercopy_abort+0x98/0x9c
[ 840.705597] sp : ffff8000253cf910
[ 840.706216] x29: ffff8000253cf920 x28: 0000000000000000 x27: 0000000000010000
[ 840.707562] x26: ffff00002c282400 x25: ffff8000253cfbe0 x24: 0000000000000001
[ 840.708804] x23: 0000000000000000 x22: 0000000000001000 x21: 0000000000001fff
[ 840.710167] x20: ffff0000c001af00 x19: 0000000000000001 x18: 0000000000000000
[ 840.711505] x17: 656a626f2042554c x16: 53206d6f72662064 x15: 6574636574656420
[ 840.712845] x14: 74706d6574746120 x13: 2129313931382065 x12: 7a6973202c312074
[ 840.714286] x11: 657366666f282027 x10: 363930345f626d6f x9 : ffff8000082b5c48
[ 840.715615] x8 : 2079726f6d656d20 x7 : 0000000000000001 x6 : 0000000000000001
[ 840.716849] x5 : 0000000000000000 x4 : ffff0000ff9c2ac8 x3 : 0000000000000000
[ 840.718224] x2 : 0000000000000000 x1 : ffff0000db54fc00 x0 : 000000000000006d
[ 840.719546] Call trace:
[ 840.719976]  usercopy_abort+0x98/0x9c
[ 840.720656]  __check_heap_object+0x194/0x1d0
[ 840.721476]  __check_object_size.part.0+0x160/0x1e0
[ 840.722414]  __check_object_size+0x28/0x40
[ 840.723197]  zfs_uiomove_iter+0x68/0x110 [zfs]
[ 840.724147]  zfs_uiomove+0x40/0x60 [zfs]
[ 840.725136]  dmu_read_uio_dnode+0xc8/0x120 [zfs]
[ 840.726141]  dmu_read_uio_dbuf+0x58/0x80 [zfs]
[ 840.727096]  mappedread+0xe8/0x150 [zfs]
[ 840.727955]  zfs_read+0x164/0x350 [zfs]
[ 840.728783]  zpl_iter_read+0xa4/0x12c [zfs]
[ 840.729640]  new_sync_read+0xf0/0x184
[ 840.730320]  vfs_read+0x15c/0x1f4
[ 840.730938]  ksys_read+0x70/0x100
[ 840.731561]  __arm64_sys_read+0x24/0x30
[ 840.732295]  invoke_syscall+0x78/0x100
[ 840.732994]  el0_svc_common.constprop.0+0x54/0x184
[ 840.733906]  do_el0_svc+0x30/0xac
[ 840.734514]  el0_svc+0x48/0x160
[ 840.735113]  el0t_64_sync_handler+0xa4/0x12c
[ 840.735910]  el0t_64_sync+0x1a4/0x1a8
[ 840.736617] Code: aa0003e3 90003020 910ce000 97fff353 (d4210000)
[ 840.737783] ---[ end trace d4861bf0f486b2ad ]---
[ 840.876737] ------------[ cut here ]------------
[ 840.880650] kernel BUG at mm/usercopy.c:99!
[ 840.982155] note: stress-ng[628890] exited with preempt_count 1
[ 840.982155] Internal error: Oops - BUG: 00000000f2000800 [#2] PREEMPT SMP
[ 840.982160] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) binfmt_misc nls_iso8859_1 qemu_fw_cfg dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_net net_failover virtio_scsi failover aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 840.995156] CPU: 1 PID: 628887 Comm: stress-ng Tainted: P D O 5.15.0-121-lowlatency-64k #131-Ubuntu
[ 840.997028] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 840.998545] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 840.999821] pc : usercopy_abort+0x98/0x9c
[ 841.000637] lr : usercopy_abort+0x98/0x9c
[ 841.001401] sp : ffff80002536f880
[ 841.002030] x29: ffff80002536f890 x28: 0000000000000000 x27: 0000000000010000
[ 841.003403] x26: ffff00002c280480 x25: ffff80002536fb50 x24: 0000000000000001
[ 841.004743] x23: 0000000000000000 x22: 0000000000001000 x21: 0000000000001fff
[ 841.006269] x20: ffff0000c001af00 x19: 0000000000000001 x18: 00000000a8cb9176
[ 841.007610] x17: ffff8000f5990000 x16: ffff800008020000 x15: ffff0000551aaf40
[ 841.008958] x14: ffff80000a968040 x13: ffff80000a967b28 x12: 0000000000000001
[ 841.010218] x11: 0000000000000004 x10: 0000000000001b30 x9 : ffff8000082b5c48
[ 841.011556] x8 : ffff0000d01fb510 x7 : ffff0000d3c78200 x6 : ffff0000d1d88ac8
[ 841.012906] x5 : 0000000000000000 x4 : ffff0000ffa62ac8 x3 : 0000000000000000
[ 841.014285] x2 : 0000000000000000 x1 : ffff0000d01f9980 x0 : 000000000000006d
[ 841.015609] Call trace:
[ 841.016085]  usercopy_abort+0x98/0x9c
[ 841.016787]  __check_heap_object+0x194/0x1d0
[ 841.017603]  __check_object_size.part.0+0x160/0x1e0
[ 841.018535]  __check_object_size+0x28/0x40
[ 841.019321]  zfs_uiomove_iter+0x68/0x110 [zfs]
[ 841.020297]  zfs_uiomove+0x40/0x60 [zfs]
[ 841.021196]  dmu_read_uio_dnode+0xc8/0x120 [zfs]
[ 841.022182]  dmu_read_uio_dbuf+0x58/0x80 [zfs]
[ 841.023054]  mappedread+0xe8/0x150 [zfs]
[ 841.023893]  zfs_read+0x164/0x350 [zfs]
[ 841.024747]  zpl_iter_read+0xa4/0x12c [zfs]
[ 841.025637]  new_sync_read+0xf0/0x184
[ 841.026329]  vfs_read+0x15c/0x1f4
[ 841.026950]  ksys_read+0x70/0x100
[ 841.027572]  __arm64_sys_read+0x24/0x30
[ 841.028301]  invoke_syscall+0x78/0x100
[ 841.029010]  el0_svc_common.constprop.0+0x54/0x184
[ 841.029930]  do_el0_svc+0x30/0xac
[ 841.030556]  el0_svc+0x48/0x160
[ 841.031256]  el0t_64_sync_handler+0xa4/0x12c
[ 841.032079]  el0t_64_sync+0x1a4/0x1a8
[ 841.032833] Code: aa0003e3 90003020 910ce000 97fff353 (d4210000)
[ 841.033995] ---[ end trace d4861bf0f486b2ae ]---
[ 841.035385] note: stress-ng[628887] exited with preempt_count 1
[ 841.040774] ------------[ cut here ]------------
[ 841.041656] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:614 rcu_eqs_enter.constprop.0+0xa4/0xb0
[ 841.043373] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) binfmt_misc nls_iso8859_1 qemu_fw_cfg dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_net net_failover virtio_scsi failover aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 841.053419] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P D O 5.15.0-121-lowlatency-64k #131-Ubuntu
[ 841.055253] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 841.056585] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 841.057965] pc : rcu_eqs_enter.constprop.0+0xa4/0xb0
[ 841.058899] lr : rcu_idle_enter+0x18/0x24
[ 841.059659] sp : ffff80000a90fd30
[ 841.060308] x29: ffff80000a90fd30 x28: 000000012f320018 x27: 000000013b530aa0
[ 841.061618] x26: 000000013b530a20 x25: 000000013b530960 x24: 000000013f998528
[ 841.062927] x23: 0000000000060000 x22: ffff80000a933900 x21: ffff80000a933900
[ 841.064316] x20: 0000000000000000 x19: ffff0000ffa4d700 x18: 00000000fd3f2d21
[ 841.065793] x17: ffffffffffffffff x16: ffffffffffffffff x15: 7b1f040030771f04
[ 841.067147] x14: ffff80000a968040 x13: ffff80000a967b28 x12: 0000000000000000
[ 841.068507] x11: 000000000000000c x10: 0000000000001b30 x9 : ffff8000090b2740
[ 841.069872] x8 : ffff80000a935490 x7 : 0000000000014000 x6 : ffff0000ff9c3628
[ 841.071205] x5 : 00000000410fd0c0 x4 : ffff8000f58f0000 x3 : 0000000000000000
[ 841.072532] x2 : 0000000000000000 x1 : 4000000000000002 x0 : 4000000000000000
[ 841.073946] Call trace:
[ 841.074415]  rcu_eqs_enter.constprop.0+0xa4/0xb0
[ 841.075292]  rcu_idle_enter+0x18/0x24
[ 841.075991]  default_idle_call+0x40/0x1ac
[ 841.076729]  cpuidle_idle_call+0x174/0x200
[ 841.077478]  do_idle+0xac/0x100
[ 841.078072]  cpu_startup_entry+0x30/0x6c
[ 841.078822]  rest_init+0x104/0x130
[ 841.079468]  arch_call_rest_init+0x18/0x24
[ 841.080256]  start_kernel+0x4b4/0x4ec
[ 841.080957]  __primary_switched+0xbc/0xc4
[ 841.081773] ---[ end trace d4861bf0f486b2af ]---

We only caught this issue now because:
 * The test results for the generic-64k kernel were lost during the infrastructure transition; they were found after rebuilding the test on the new Jenkins.
 * This kernel BUG error message didn't come up on the first attempt with the lowlatency-64k kernel.
 * The tool in CKCT is not scanning for the "kernel BUG" pattern.

Please find attached the complete console log retrieved for J-generic-64k.
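On the last point, a simple pattern scan over the saved console log would have flagged this failure. A hedged sketch (the log path and sample contents here are illustrative, not from the CKCT tool itself):

```shell
# Write a tiny sample console log containing the oops signatures,
# then count matching lines; a non-zero count should fail the test run.
cat > /tmp/sample-console.log <<'EOF'
[ 840.682657] usercopy: Kernel memory exposure attempt detected from SLUB object 'zio_buf_comb_4096' (offset 1, size 8191)!
[ 840.687391] kernel BUG at mm/usercopy.c:99!
[ 840.737783] ---[ end trace d4861bf0f486b2ad ]---
EOF
grep -cE 'kernel BUG at|usercopy: Kernel memory exposure' /tmp/sample-console.log
```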