We are hitting what I believe is the same bug after upgrading from linux-image-4.4.0-178-generic to linux-image-4.4.0-184-generic.
We have around 100 VMs that are affected. For now, we have rolled back to the previous kernel. I am not sure why, but not all VMs are affected; from what I have found, it looks like unbound (DNS server) is triggering the kernel oops in our clients' environment. I can help test a new kernel if that would be useful. I also have a kernel dump from linux-crashdump, but I am not currently sure whether I am allowed to share it; I will try to find out if needed.

### Our kernel crash

[ 128.503474] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 128.503608] IP: [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[ 128.503673] PGD 80000004275f2067 PUD 427495067 PMD 0
[ 128.503736] Oops: 0000 [#1] SMP
[ 128.503800] Modules linked in: vmw_vsock_vmci_transport vsock zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) vmw_balloon input_leds joydev serio_raw shpchp vmw_vmci i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd vmwgfx ttm drm_kms_helper psmouse syscopyarea sysfillrect vmxnet3 sysimgblt vmw_pvscsi fb_sys_fops pata_acpi drm ahci libahci fjes
[ 128.504798] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 4.4.0-184-generic #214-Ubuntu
[ 128.504990] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 128.505401] task: ffffffff81e13500 ti: ffffffff81e00000 task.ti: ffffffff81e00000
[ 128.505637] RIP: 0010:[<ffffffff818288ab>] [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[ 128.505892] RSP: 0018:ffff88042d603d00 EFLAGS: 00010246
[ 128.506143] RAX: 0000000000000000 RBX: ffff880423804a00 RCX: 0000000000000020
[ 128.506409] RDX: 0000000000000001 RSI: 0000000000000200 RDI: ffff880427ce1856
[ 128.506686] RBP: ffff88042d603e20 R08: 0000000000000000 R09: ffff880427ce1866
[ 128.506962] R10: 0000000000000080 R11: 0000000000000000 R12: ffff880427ce184e
[ 128.507246] R13: ffffffff81efb6c0 R14: 0000000000000001 R15: 0000000000000003
[ 128.507539] FS: 0000000000000000(0000) GS:ffff88042d600000(0000) knlGS:0000000000000000
[ 128.507842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 128.508176] CR2: 0000000000000018 CR3: 0000000427782000 CR4: 0000000000360670
[ 128.508530] Stack:
[ 128.508859] 0000000000000001 0000000000000000 0000000000000000 4a7e338b0c959fd7
[ 128.509212] ffff88042b139a38 ffff88042b139a80 000000002b139a20 ffff880427ce1856
[ 128.509577] ffff880400000001 ffffffff00000000 ffff880427ce1866 0000000000000000
[ 128.509945] Call Trace:
[ 128.510314] <IRQ>
[ 128.510324] [<ffffffff81868280>] ? _raw_spin_unlock_bh+0x20/0x50
[ 128.511089] [<ffffffff81841ed1>] icmpv6_send+0x21/0x30
[ 128.511483] [<ffffffff8182fe95>] ip6_expire_frag_queue+0x115/0x1b0
[ 128.511892] [<ffffffff8182ff30>] ? ip6_expire_frag_queue+0x1b0/0x1b0
[ 128.512301] [<ffffffff8182ff4f>] ip6_frag_expire+0x1f/0x30
[ 128.512723] [<ffffffff810f57c7>] call_timer_fn+0x37/0x140
[ 128.513116] [<ffffffff8182ff30>] ? ip6_expire_frag_queue+0x1b0/0x1b0
[ 128.513509] [<ffffffff810f70d4>] run_timer_softirq+0x234/0x330
[ 128.513902] [<ffffffff8108b509>] __do_softirq+0x109/0x2b0
[ 128.514291] [<ffffffff8108b825>] irq_exit+0xa5/0xb0
[ 128.514673] [<ffffffff8186c250>] smp_apic_timer_interrupt+0x50/0x70
[ 128.515045] [<ffffffff81869994>] apic_timer_interrupt+0xd4/0xe0
[ 128.515414] <EOI>
[ 128.515423] [<ffffffff81039130>] ? speculation_ctrl_update_tif+0x80/0x80
[ 128.516123] [<ffffffff81067af2>] ? native_safe_halt+0x12/0x20
[ 128.516466] [<ffffffff8103914e>] default_idle+0x1e/0xe0
[ 128.516802] [<ffffffff81039ff5>] arch_cpu_idle+0x15/0x20
[ 128.517124] [<ffffffff810cc03a>] default_idle_call+0x2a/0x40
[ 128.517441] [<ffffffff810cc3b3>] cpu_startup_entry+0x303/0x360
[ 128.517757] [<ffffffff8185bc2c>] rest_init+0x7c/0x80
[ 128.518055] [<ffffffff81f68fb7>] start_kernel+0x483/0x4a4
[ 128.518367] [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120
[ 128.518665] [<ffffffff81f682da>] x86_64_start_reservations+0x2a/0x2c
[ 128.518952] [<ffffffff81f68426>] x86_64_start_kernel+0x14a/0x16d
[ 128.519234] Code: 8b 5c 24 40 75 46 f6 c2 02 74 05 f6 c2 30 75 3c 48 8b 43 58 44 89 5c 24 34 89 54 24 40 44 89 44 24 48 4c 89 4c 24 60 48 83 e0 fe <48> 8b 78 18 e8 4c 0b 03 00 41 89 c2 4c 8b 4c 24 60 44 8b 44 24
[ 128.520205] RIP [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[ 128.520511] RSP <ffff88042d603d00>
[ 128.520818] CR2: 0000000000000018

--
https://bugs.launchpad.net/bugs/1883498

Title: Frequent Panic in ip6_expire_frag_queue->icmpv6_send on 4.4.0-184-generic

Status in linux package in Ubuntu: Confirmed

Bug description:
I happened to upgrade a number of servers last week. Some of them got 4.4.0-179-generic, and the ones upgraded a bit later in the week got 4.4.0-184-generic, as it had just been released. The ones with 4.4.0-184-generic started getting stuck.
With linux-crashdump installed, I obtained the dmesg logs and crash dumps. The backtrace appears somewhat similar to #202669, but that one only happened on bare hardware for us; this one is on KVM virtual instances. #202669 panicked in icmpv6_route_lookup, whereas this one dies earlier, already in icmpv6_send.

Between 2020-06-11 and 2020-06-15, on a set of 12 VMs running 4.4.0-184-generic, there were 85 crashes like this, on servers with noticeable IPv6 traffic. All 12 of the VMs with 4.4.0-184-generic crashed at least once. (More than 12 VMs are experiencing this; this is just the set I had linux-crashdump on.)

[57063.487084] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[57063.487184] IP: [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[57063.487218] PGD 0
[57063.487231] Oops: 0000 [#1] SMP
[57063.488665] Call Trace:
[57063.488679] <IRQ>
[57063.488705] [<ffffffff81756ee8>] ? __netif_receive_skb+0x18/0x60
[57063.488739] [<ffffffff810c3758>] ? task_tick_fair+0x4c8/0x8e0
[57063.488771] [<ffffffff81868280>] ? _raw_spin_unlock_bh+0x20/0x50
[57063.488802] [<ffffffff81841ed1>] icmpv6_send+0x21/0x30
[57063.488829] [<ffffffff8182fe95>] ip6_expire_frag_queue+0x115/0x1b0
[57063.488862] [<ffffffffc0366260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[57063.488897] [<ffffffffc036627f>] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6]
[57063.488937] [<ffffffff810f57c7>] call_timer_fn+0x37/0x140
[57063.488965] [<ffffffffc0366260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
[57063.489002] [<ffffffff810f70d4>] run_timer_softirq+0x234/0x330
...
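Both traces fault at the same instruction, icmp6_send+0x1fb, with CR2 = 0x18. As a reading aid (not part of either original report): the byte marked `<48>` in the "Code:" line starts the faulting instruction, `48 8b 78 18`, which encodes `mov rdi, [rax+0x18]`. With RAX = 0 in the register dump, the load address is exactly 0x18. The bytes shortly before it (`48 8b 43 58`, `48 83 e0 fe`) appear to load RAX from [rbx+0x58] and clear its low bit, which suggests the value read from there was 0 or 1. The minimal Python sketch below decodes only that one instruction form. The function name `decode_faulting_mov` is hypothetical, and the sketch handles only REX.W MOV r64, r/m64 with a [reg+disp8] operand and no SIB byte, so it is not a general x86 disassembler.

```python
from __future__ import annotations

# 64-bit register names indexed by the 3-bit ModRM reg/rm fields
# (valid only because the decoder requires REX.R and REX.B to be 0).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_faulting_mov(code_line: str, regs: dict[str, int]) -> tuple[str, int]:
    """Decode the faulting instruction in an x86-64 oops "Code:" line.

    Hypothetical helper: only REX.W + 8B /r (MOV r64, r/m64) with a
    [reg+disp8] operand and no SIB byte is handled; anything else raises
    ValueError. Returns the instruction text and the address it loads from.
    """
    tokens = code_line.split()
    # The kernel wraps the first byte of the faulting instruction in <..>.
    start = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
    if start is None:
        raise ValueError("no <..> marker in Code: line")
    rex, opcode, modrm, disp8 = (int(t.strip("<>"), 16) for t in tokens[start:start + 4])
    if rex != 0x48 or opcode != 0x8B:  # REX.W only, MOV r64, r/m64
        raise ValueError("not REX.W MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0x7, modrm & 0x7
    if mod != 0b01 or rm == 0b100:  # need disp8 form; rm=100 means a SIB byte follows
        raise ValueError("only [reg+disp8] operands without SIB are handled")
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8  # disp8 is sign-extended
    base = REGS[rm]
    addr = (regs[base] + disp) & 0xFFFF_FFFF_FFFF_FFFF  # wrap to 64 bits
    return f"mov {REGS[reg]}, [{base}{disp:+#x}]", addr


# "Code:" line and registers copied from the 4.4.0-184-generic oops above.
code = (
    "Code: 8b 5c 24 40 75 46 f6 c2 02 74 05 f6 c2 30 75 3c 48 8b 43 58 "
    "44 89 5c 24 34 89 54 24 40 44 89 44 24 48 4c 89 4c 24 60 48 83 e0 fe "
    "<48> 8b 78 18 e8 4c 0b 03 00 41 89 c2 4c 8b 4c 24 60 44 8b 44 24"
)
regs = {
    "rax": 0x0000000000000000, "rbx": 0xFFFF880423804A00,
    "rcx": 0x0000000000000020, "rdx": 0x0000000000000001,
    "rsi": 0x0000000000000200, "rdi": 0xFFFF880427CE1856,
    "rbp": 0xFFFF88042D603E20, "rsp": 0xFFFF88042D603D00,
}

insn, addr = decode_faulting_mov(code, regs)
print(insn, hex(addr))  # mov rdi, [rax+0x18] 0x18
```

Restricting the decoder to one encoding keeps the sketch small and makes it fail loudly on anything else. For real analysis, the kernel source tree's scripts/decodecode, or objdump on the matching vmlinux, decodes the whole "Code:" line.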