Public bug reported:

Problem: erratic host behaviour after "kernel NULL pointer dereference" appears

Config:
  host  = Ryzen 3600 w/ 2x16GB, ZFS RAIDZ1 5 vdevs
  guest = Windows 7 w/ 16GB, zvol 64GB+512GB, QXL 16MB (planning for vfio-GPU-pt)

Behavior: per dmesg, this seems related to THP compaction + ARC memory allocation. Once the dereference issue happens, the host becomes erratic. Host reboot time lengthens from 60s to >500s, and sometimes bash dies as well.
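The module options and THP settings quoted in this report are easier to compare once the byte values are converted and the active THP mode is extracted. The following is only a minimal sketch: the helper names (`active_choice`, `parse_module_options`, `bytes_to_mib`) are made up for illustration, and the sample strings are copied from this report rather than read from a live host.

```python
import re

# Sample values copied from this report. On a live host they would come from
# /etc/modprobe.d/*.conf and /sys/kernel/mm/transparent_hugepage/.
MODPROBE_TEXT = """\
options zfs zfs_arc_max=4294967296
options zfs zfs_arc_min=268435456
options zfs zfs_arc_sys_free=268435456
"""
THP_ENABLED = "always [madvise] never"
THP_DEFRAG = "always defer defer+madvise [madvise] never"


def active_choice(sysfs_text):
    """Return the bracketed (currently active) value of a sysfs choice string."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def parse_module_options(text, module):
    """Collect key=value pairs from 'options <module> ...' lines."""
    options = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            for pair in fields[2:]:
                key, _, value = pair.partition("=")
                options[key] = value
    return options


def bytes_to_mib(value):
    """Convert a byte count (string or int) to whole MiB."""
    return int(value) // (1024 * 1024)


if __name__ == "__main__":
    for name, raw in parse_module_options(MODPROBE_TEXT, "zfs").items():
        print(f"{name} = {bytes_to_mib(raw)} MiB")
    print("THP enabled:", active_choice(THP_ENABLED))
    print("THP defrag:", active_choice(THP_DEFRAG))
```

With the values from this report, zfs_arc_max works out to 4096 MiB (4 GiB), zfs_arc_min and zfs_arc_sys_free to 256 MiB each, and both THP knobs are set to madvise.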
============ /etc/modprobe.d/*.conf ==================
options zfs zfs_arc_max=4294967296
options zfs zfs_arc_min=268435456
options zfs zfs_arc_sys_free=268435456
options vfio_iommu_type1 allow_unsafe_interrupts=1 disable_hugepages=1
options kvm ignore_msrs=1 report_ignored_msrs=0

============ free -h ==================
              total        used        free      shared  buff/cache   available
Mem:           31Gi        28Gi       2.4Gi       1.0Mi       617Mi       2.7Gi
Swap:          84Gi       0.0Ki        84Gi

============ THP info ==================
/sys/kernel/mm/transparent_hugepage/enabled = always [madvise] never
/sys/kernel/mm/transparent_hugepage/defrag = always defer defer+madvise [madvise] never
AnonHugePages: 16322560 kB
Hugepagesize: 2048 kB
nr_anon_transparent_hugepages 7970
thp_fault_alloc 23907
thp_fault_fallback 0
thp_collapse_alloc 1
thp_collapse_alloc_failed 0
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 9
thp_split_page_failed 164
thp_deferred_split_page 15938
thp_split_pmd 9
thp_split_pud 0
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0

============ dmesg trace log ==================
[ 2516.858188] BUG: kernel NULL pointer dereference, address: 000000000000006c
[ 2516.858194] #PF: supervisor read access in kernel mode
[ 2516.858196] #PF: error_code(0x0000) - not-present page
[ 2516.858198] PGD 0 P4D 0
[ 2516.858201] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2516.858204] CPU: 5 PID: 491 Comm: systemd-udevd Tainted: P O 5.3.0-46-lowlatency #38-Ubuntu
[ 2516.858207] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P3.90 12/09/2019
[ 2516.858246] RIP: 0010:arc_kmem_reap_soon+0x52/0xe0 [zfs]
[ 2516.858249] Code: 05 5f 28 17 00 85 c0 0f 85 95 00 00 00 45 31 f6 45 31 e4 31 db eb 03 4d 89 ee 4c 89 e0 4c 8b 24 dd 80 79 d3 c0 49 39 c4 74 0d <41> 8b 74 24 6c 4c 89 e7 e8 91 f5 8f ff 4c 8b 2c dd 80 79 cf c0 4d
[ 2516.858253] RSP: 0000:ffffbab7c0757958 EFLAGS: 00010207
[ 2516.858256] RAX: ffff8e3d63751800 RBX: 0000000000007000 RCX: 61c8864680b583eb
[ 2516.858258] RDX: ffffffffae605d38 RSI: ffff8e3d63750c50 RDI: ffff8e3d63750c70
[ 2516.858260] RBP: ffffbab7c0757978 R08: ffff8e3d63750c50 R09: 000000000002840a
[ 2516.858262] R10: ffff8e3d58ca0098 R11: ffff8e3d6e86a8b0 R12: 0000000000000000
[ 2516.858264] R13: ffff8e3d63750c00 R14: ffff8e3d63750c00 R15: 0000000000000000
[ 2516.858267] FS: 00007fb3bd2a7880(0000) GS:ffff8e3d6e740000(0000) knlGS:0000000000000000
[ 2516.858269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2516.858272] CR2: 000000000000006c CR3: 00000007e86fc000 CR4: 0000000000340ee0
[ 2516.858274] Call Trace:
[ 2516.858304] __arc_shrinker_func.isra.0+0xf4/0x190 [zfs]
[ 2516.858334] arc_shrinker_func_scan_objects+0x15/0x30 [zfs]
[ 2516.858339] do_shrink_slab+0x14f/0x2e0
[ 2516.858343] shrink_slab+0xac/0x260
[ 2516.858345] shrink_node+0xf4/0x490
[ 2516.858349] node_reclaim+0x1f6/0x340
[ 2516.858353] get_page_from_freelist+0xb8/0x390
[ 2516.858357] __alloc_pages_nodemask+0x166/0x320
[ 2516.858362] alloc_pages_vma+0xda/0x190
[ 2516.858366] wp_page_copy+0x88/0x850
[ 2516.858368] ? reuse_swap_page+0x70/0x370
[ 2516.858371] do_wp_page+0x90/0x620
[ 2516.858374] __handle_mm_fault+0x76f/0x7a0
[ 2516.858377] handle_mm_fault+0xd4/0x1f0
[ 2516.858381] do_user_addr_fault+0x201/0x450
[ 2516.858384] __do_page_fault+0x58/0x90
[ 2516.858386] do_page_fault+0x2c/0x100
[ 2516.858390] page_fault+0x34/0x40
[ 2516.858392] RIP: 0033:0x5618000f015e
[ 2516.858395] Code: 28 00 00 00 75 06 48 83 c4 20 5d c3 e8 5b 2a ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 41 54 31 c0 ba ff ff ff ff 53 48 83 ec 08 <f0> 0f b1 15 c6 8f 08 00 83 f8 ff 74 35 41 89 c4 85 c0 75 21 31 c0
[ 2516.858400] RSP: 002b:00007fffeccd0da0 EFLAGS: 00010206
[ 2516.858402] RAX: 0000000000000000 RBX: 00005618013d4420 RCX: 0000000000000000
[ 2516.858404] RDX: 00000000ffffffff RSI: 00007fffeccd0de0 RDI: 00005618013d4420
[ 2516.858407] RBP: 00007fffeccd0de0 R08: 000056180015ea20 R09: 0000561801430600
[ 2516.858409] R10: 0000561801430628 R11: 0000000000000005 R12: 00000000000001eb
[ 2516.858411] R13: 00005618013f87b0 R14: 00005618013d6fc0 R15: 0000000000000001
[ 2516.858415] Modules linked in: vhost_net vhost tap ebtable_filter ebtables ip6_tables iptable_filter bpfilter bridge stp llc tcp_westwood hid_logitech_hidpp joydev input_leds zfs(PO) edac_mce_amd kvm_amd zunicode(PO) zavl(PO) kvm icp(PO) zlua(PO) nls_iso8859_1 hid_logitech_dj crct10dif_pclmul ghash_clmulni_intel hid_generic zcommon(PO) aesni_intel znvpair(PO) aes_x86_64 crypto_simd cryptd ccp glue_helper usbhid k10temp spl(O) hid mac_hid sch_fq_codel vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass ip_tables x_tables nbd f2fs crc32_pclmul i2c_piix4 nvme r8169 ahci realtek nvme_core libahci gpio_amdpt gpio_generic
[ 2516.858448] CR2: 000000000000006c
[ 2516.858451] ---[ end trace 89ea7eb88d9d005a ]---
[ 2516.858480] RIP: 0010:arc_kmem_reap_soon+0x52/0xe0 [zfs]
[ 2516.858483] Code: 05 5f 28 17 00 85 c0 0f 85 95 00 00 00 45 31 f6 45 31 e4 31 db eb 03 4d 89 ee 4c 89 e0 4c 8b 24 dd 80 79 d3 c0 49 39 c4 74 0d <41> 8b 74 24 6c 4c 89 e7 e8 91 f5 8f ff 4c 8b 2c dd 80 79 cf c0 4d
[ 2516.858488] RSP: 0000:ffffbab7c0757958 EFLAGS: 00010207
[ 2516.858490] RAX: ffff8e3d63751800 RBX: 0000000000007000 RCX: 61c8864680b583eb
[ 2516.858492] RDX: ffffffffae605d38 RSI: ffff8e3d63750c50 RDI: ffff8e3d63750c70
[ 2516.858495] RBP: ffffbab7c0757978 R08: ffff8e3d63750c50 R09: 000000000002840a
[ 2516.858497] R10: ffff8e3d58ca0098 R11: ffff8e3d6e86a8b0 R12: 0000000000000000
[ 2516.858499] R13: ffff8e3d63750c00 R14: ffff8e3d63750c00 R15: 0000000000000000
[ 2516.858502] FS: 00007fb3bd2a7880(0000) GS:ffff8e3d6e740000(0000) knlGS:0000000000000000
[ 2516.858505] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2516.858507] CR2: 000000000000006c CR3: 00000007e86fc000 CR4: 0000000000340ee0

** Affects: zfsutils (Ubuntu)
     Importance: Undecided
         Status: New

** Package changed: libvirt (Ubuntu) => zfsutils (Ubuntu)

https://bugs.launchpad.net/bugs/1872567

Title:
  ZFS ARC has memory issue with THP enabled
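
The register dump in the trace can be cross-checked against the reported fault address. On x86-64, the marked bytes `<41> 8b 74 24 6c` in the kernel `Code:` line decode to a 32-bit load from `0x6c(%r12)` into `%esi`, so the faulting address should equal R12 + 0x6c. The sketch below is a hypothetical helper, not part of any ZFS or kernel tooling; it embeds three lines from the trace and the function names are assumptions made for illustration.

```python
import re

# Three lines copied from the oops in this report (timestamps removed).
OOPS_EXCERPT = """\
RIP: 0010:arc_kmem_reap_soon+0x52/0xe0 [zfs]
R10: ffff8e3d58ca0098 R11: ffff8e3d6e86a8b0 R12: 0000000000000000
CR2: 000000000000006c CR3: 00000007e86fc000 CR4: 0000000000340ee0
"""

# Displacement taken from the decoded instruction 41 8b 74 24 6c,
# i.e. mov 0x6c(%r12),%esi.
DISPLACEMENT = 0x6C


def parse_registers(text):
    """Map register names to integers for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def fault_address(registers, base="R12", displacement=DISPLACEMENT):
    """Effective address of a base+displacement memory operand."""
    return registers[base] + displacement


if __name__ == "__main__":
    regs = parse_registers(OOPS_EXCERPT)
    computed = fault_address(regs)
    print(f"R12 + 0x6c = {computed:#x}, CR2 = {regs['CR2']:#x}")
    print("match" if computed == regs["CR2"] else "mismatch")
```

Since R12 is 0 in the dump, the computed address is 0x6c, which matches CR2. That is consistent with arc_kmem_reap_soon reading a field at offset 0x6c through a NULL pointer; why that pointer was NULL is not something the trace alone settles.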