Public bug reported:

Problem: erratic host behaviour after "kernel NULL pointer dereference" appears
Config:
  host = Ryzen 3600 w/2x16GB, ZFS RAIDZ1 5 vdevs
  guest = Windows 7 w/16GB, zvol 64GB+512GB, QXL 16MB (planning for vfio-GPU-pt)
Behavior: per dmesg, the fault seems related to THP compaction + ARC memory
allocation. Once the NULL pointer dereference happens, the host becomes erratic:
a host reboot lengthens from 60s to >500s, and sometimes bash dies as well.

============ /etc/modprobe.d/*.conf ==================
options zfs                     zfs_arc_max=4294967296
options zfs                     zfs_arc_min=268435456
options zfs                     zfs_arc_sys_free=268435456
options vfio_iommu_type1        allow_unsafe_interrupts=1 disable_hugepages=1
options kvm                     ignore_msrs=1 report_ignored_msrs=0
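
For readers who do not think in raw byte counts, here is a minimal sketch that converts the three zfs module values above into binary units. The helper name to_binary_units is made up for illustration; it is not part of ZFS or of any tool used in this report.

```python
# Byte values copied from the modprobe options above.
ZFS_OPTIONS = {
    "zfs_arc_max": 4294967296,
    "zfs_arc_min": 268435456,
    "zfs_arc_sys_free": 268435456,
}

def to_binary_units(num_bytes):
    """Format a byte count using 1024-based units (hypothetical helper)."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit where the value drops below 1024.
        if value < 1024 or unit == "TiB":
            return f"{value:g} {unit}"
        value /= 1024

for name, raw in ZFS_OPTIONS.items():
    print(f"{name} = {raw} bytes = {to_binary_units(raw)}")
```

So the ARC is capped at 4 GiB, with a 256 MiB floor and a 256 MiB zfs_arc_sys_free target.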

============ free -h ==================
              total        used        free      shared  buff/cache   available
Mem:           31Gi        28Gi       2.4Gi       1.0Mi       617Mi       2.7Gi
Swap:          84Gi       0.0Ki        84Gi

============ THP info ==================
/sys/kernel/mm/transparent_hugepage/enabled = always [madvise] never
/sys/kernel/mm/transparent_hugepage/defrag = always defer defer+madvise [madvise] never
AnonHugePages:  16322560 kB
Hugepagesize:       2048 kB
nr_anon_transparent_hugepages 7970
thp_fault_alloc 23907
thp_fault_fallback 0
thp_collapse_alloc 1
thp_collapse_alloc_failed 0
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 9
thp_split_page_failed 164
thp_deferred_split_page 15938
thp_split_pmd 9
thp_split_pud 0
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0
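
A small sketch for reading the THP dump above: the bracketed word in each sysfs selector is the active mode, and AnonHugePages divided by Hugepagesize should equal nr_anon_transparent_hugepages. The values are copied from this report; the helper name active_choice is hypothetical.

```python
import re

def active_choice(sysfs_line):
    """Return the bracketed (active) option of a sysfs selector string."""
    match = re.search(r"\[([^\]]+)\]", sysfs_line)
    if match is None:
        raise ValueError(f"no active option in {sysfs_line!r}")
    return match.group(1)

# Values copied from the THP info section of this report.
enabled = "always [madvise] never"
defrag = "always defer defer+madvise [madvise] never"
anon_huge_kb = 16322560   # AnonHugePages
hugepage_kb = 2048        # Hugepagesize
nr_anon_thp = 7970        # nr_anon_transparent_hugepages

print("enabled:", active_choice(enabled))
print("defrag: ", active_choice(defrag))
print("pages:  ", anon_huge_kb // hugepage_kb,
      "matches counter:", anon_huge_kb // hugepage_kb == nr_anon_thp)
```

Both selectors are in madvise mode, and the counters agree at 7970 huge pages (about 15.6 GiB). That this is the 16GB guest's RAM is an inference from the size, not something the dump states.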

============ dmesg trace log ==================
[ 2516.858188] BUG: kernel NULL pointer dereference, address: 000000000000006c
[ 2516.858194] #PF: supervisor read access in kernel mode
[ 2516.858196] #PF: error_code(0x0000) - not-present page
[ 2516.858198] PGD 0 P4D 0 
[ 2516.858201] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2516.858204] CPU: 5 PID: 491 Comm: systemd-udevd Tainted: P           O      5.3.0-46-lowlatency #38-Ubuntu
[ 2516.858207] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P3.90 12/09/2019
[ 2516.858246] RIP: 0010:arc_kmem_reap_soon+0x52/0xe0 [zfs]
[ 2516.858249] Code: 05 5f 28 17 00 85 c0 0f 85 95 00 00 00 45 31 f6 45 31 e4 31 db eb 03 4d 89 ee 4c 89 e0 4c 8b 24 dd 80 79 d3 c0 49 39 c4 74 0d <41> 8b 74 24 6c 4c 89 e7 e8 91 f5 8f ff 4c 8b 2c dd 80 79 cf c0 4d
[ 2516.858253] RSP: 0000:ffffbab7c0757958 EFLAGS: 00010207
[ 2516.858256] RAX: ffff8e3d63751800 RBX: 0000000000007000 RCX: 61c8864680b583eb
[ 2516.858258] RDX: ffffffffae605d38 RSI: ffff8e3d63750c50 RDI: ffff8e3d63750c70
[ 2516.858260] RBP: ffffbab7c0757978 R08: ffff8e3d63750c50 R09: 000000000002840a
[ 2516.858262] R10: ffff8e3d58ca0098 R11: ffff8e3d6e86a8b0 R12: 0000000000000000
[ 2516.858264] R13: ffff8e3d63750c00 R14: ffff8e3d63750c00 R15: 0000000000000000
[ 2516.858267] FS:  00007fb3bd2a7880(0000) GS:ffff8e3d6e740000(0000) knlGS:0000000000000000
[ 2516.858269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2516.858272] CR2: 000000000000006c CR3: 00000007e86fc000 CR4: 0000000000340ee0
[ 2516.858274] Call Trace:
[ 2516.858304]  __arc_shrinker_func.isra.0+0xf4/0x190 [zfs]
[ 2516.858334]  arc_shrinker_func_scan_objects+0x15/0x30 [zfs]
[ 2516.858339]  do_shrink_slab+0x14f/0x2e0
[ 2516.858343]  shrink_slab+0xac/0x260
[ 2516.858345]  shrink_node+0xf4/0x490
[ 2516.858349]  node_reclaim+0x1f6/0x340
[ 2516.858353]  get_page_from_freelist+0xb8/0x390
[ 2516.858357]  __alloc_pages_nodemask+0x166/0x320
[ 2516.858362]  alloc_pages_vma+0xda/0x190
[ 2516.858366]  wp_page_copy+0x88/0x850
[ 2516.858368]  ? reuse_swap_page+0x70/0x370
[ 2516.858371]  do_wp_page+0x90/0x620
[ 2516.858374]  __handle_mm_fault+0x76f/0x7a0
[ 2516.858377]  handle_mm_fault+0xd4/0x1f0
[ 2516.858381]  do_user_addr_fault+0x201/0x450
[ 2516.858384]  __do_page_fault+0x58/0x90
[ 2516.858386]  do_page_fault+0x2c/0x100
[ 2516.858390]  page_fault+0x34/0x40
[ 2516.858392] RIP: 0033:0x5618000f015e
[ 2516.858395] Code: 28 00 00 00 75 06 48 83 c4 20 5d c3 e8 5b 2a ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 41 54 31 c0 ba ff ff ff ff 53 48 83 ec 08 <f0> 0f b1 15 c6 8f 08 00 83 f8 ff 74 35 41 89 c4 85 c0 75 21 31 c0
[ 2516.858400] RSP: 002b:00007fffeccd0da0 EFLAGS: 00010206
[ 2516.858402] RAX: 0000000000000000 RBX: 00005618013d4420 RCX: 0000000000000000
[ 2516.858404] RDX: 00000000ffffffff RSI: 00007fffeccd0de0 RDI: 00005618013d4420
[ 2516.858407] RBP: 00007fffeccd0de0 R08: 000056180015ea20 R09: 0000561801430600
[ 2516.858409] R10: 0000561801430628 R11: 0000000000000005 R12: 00000000000001eb
[ 2516.858411] R13: 00005618013f87b0 R14: 00005618013d6fc0 R15: 0000000000000001
[ 2516.858415] Modules linked in: vhost_net vhost tap ebtable_filter ebtables ip6_tables iptable_filter bpfilter bridge stp llc tcp_westwood hid_logitech_hidpp joydev input_leds zfs(PO) edac_mce_amd kvm_amd zunicode(PO) zavl(PO) kvm icp(PO) zlua(PO) nls_iso8859_1 hid_logitech_dj crct10dif_pclmul ghash_clmulni_intel hid_generic zcommon(PO) aesni_intel znvpair(PO) aes_x86_64 crypto_simd cryptd ccp glue_helper usbhid k10temp spl(O) hid mac_hid sch_fq_codel vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass ip_tables x_tables nbd f2fs crc32_pclmul i2c_piix4 nvme r8169 ahci realtek nvme_core libahci gpio_amdpt gpio_generic
[ 2516.858448] CR2: 000000000000006c
[ 2516.858451] ---[ end trace 89ea7eb88d9d005a ]---
[ 2516.858480] RIP: 0010:arc_kmem_reap_soon+0x52/0xe0 [zfs]
[ 2516.858483] Code: 05 5f 28 17 00 85 c0 0f 85 95 00 00 00 45 31 f6 45 31 e4 31 db eb 03 4d 89 ee 4c 89 e0 4c 8b 24 dd 80 79 d3 c0 49 39 c4 74 0d <41> 8b 74 24 6c 4c 89 e7 e8 91 f5 8f ff 4c 8b 2c dd 80 79 cf c0 4d
[ 2516.858488] RSP: 0000:ffffbab7c0757958 EFLAGS: 00010207
[ 2516.858490] RAX: ffff8e3d63751800 RBX: 0000000000007000 RCX: 61c8864680b583eb
[ 2516.858492] RDX: ffffffffae605d38 RSI: ffff8e3d63750c50 RDI: ffff8e3d63750c70
[ 2516.858495] RBP: ffffbab7c0757978 R08: ffff8e3d63750c50 R09: 000000000002840a
[ 2516.858497] R10: ffff8e3d58ca0098 R11: ffff8e3d6e86a8b0 R12: 0000000000000000
[ 2516.858499] R13: ffff8e3d63750c00 R14: ffff8e3d63750c00 R15: 0000000000000000
[ 2516.858502] FS:  00007fb3bd2a7880(0000) GS:ffff8e3d6e740000(0000) knlGS:0000000000000000
[ 2516.858505] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2516.858507] CR2: 000000000000006c CR3: 00000007e86fc000 CR4: 0000000000340ee0
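
As a reading aid for the trace above, the following sketch cross-checks the faulting address against the register dump. The faulting bytes "<41> 8b 74 24 6c" decode on x86-64 as a 32-bit load from 0x6c(%r12), so the fault address should be R12 + 0x6c. The register values are copied from this report; parse_registers is a hypothetical helper, and treating R12 as a NULL structure pointer inside arc_kmem_reap_soon is an interpretation of the dump, not something confirmed against the ZFS source.

```python
import re

# Register lines copied from the oops above (timestamps dropped).
OOPS_EXCERPT = """\
RAX: ffff8e3d63751800 RBX: 0000000000007000 RCX: 61c8864680b583eb
RDX: ffffffffae605d38 RSI: ffff8e3d63750c50 RDI: ffff8e3d63750c70
RBP: ffffbab7c0757978 R08: ffff8e3d63750c50 R09: 000000000002840a
R10: ffff8e3d58ca0098 R11: ffff8e3d6e86a8b0 R12: 0000000000000000
R13: ffff8e3d63750c00 R14: ffff8e3d63750c00 R15: 0000000000000000
CR2: 000000000000006c CR3: 00000007e86fc000 CR4: 0000000000340ee0
"""

def parse_registers(text):
    """Map register names to integer values (hypothetical helper)."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

regs = parse_registers(OOPS_EXCERPT)

# disp8 of the faulting instruction "<41> 8b 74 24 6c": load from 0x6c(%r12).
DISPLACEMENT = 0x6C
fault = regs["R12"] + DISPLACEMENT

print("R12 =", hex(regs["R12"]))
print("R12 + 0x6c =", hex(fault), "equals CR2:", fault == regs["CR2"])
```

R12 is zero and CR2 is 0x6c, which fits a read of a field at offset 0x6c through a NULL pointer.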

** Affects: zfsutils (Ubuntu)
     Importance: Undecided
         Status: New

** Package changed: libvirt (Ubuntu) => zfsutils (Ubuntu)

https://bugs.launchpad.net/bugs/1872567

Title:
  ZFS ARC has memory issue with THP enabled
