We are seeing what I believe is the same bug after upgrading the kernel
from linux-image-4.4.0-178-generic to linux-image-4.4.0-184-generic.

We have around 100 VMs that are affected. For now, we have rolled back
to the previous kernel. Not all VMs are affected, and I am not sure why;
from what I have found, unbound (the DNS server) appears to be what
triggers the kernel oops in our clients' environment.

I can help test a new kernel if that would be useful. I also have a
kernel dump from linux-crashdump; I am not yet sure whether I am
allowed to share it, but I will find out if needed.

### Our kernel crash
[  128.503474] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  128.503608] IP: [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[  128.503673] PGD 80000004275f2067 PUD 427495067 PMD 0
[  128.503736] Oops: 0000 [#1] SMP
[  128.503800] Modules linked in: vmw_vsock_vmci_transport vsock zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) vmw_balloon input_leds joydev serio_raw shpchp vmw_vmci i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd vmwgfx ttm drm_kms_helper psmouse syscopyarea sysfillrect vmxnet3 sysimgblt vmw_pvscsi fb_sys_fops pata_acpi drm ahci libahci fjes
[  128.504798] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O    4.4.0-184-generic #214-Ubuntu
[  128.504990] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[  128.505401] task: ffffffff81e13500 ti: ffffffff81e00000 task.ti: ffffffff81e00000
[  128.505637] RIP: 0010:[<ffffffff818288ab>]  [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[  128.505892] RSP: 0018:ffff88042d603d00  EFLAGS: 00010246
[  128.506143] RAX: 0000000000000000 RBX: ffff880423804a00 RCX: 0000000000000020
[  128.506409] RDX: 0000000000000001 RSI: 0000000000000200 RDI: ffff880427ce1856
[  128.506686] RBP: ffff88042d603e20 R08: 0000000000000000 R09: ffff880427ce1866
[  128.506962] R10: 0000000000000080 R11: 0000000000000000 R12: ffff880427ce184e
[  128.507246] R13: ffffffff81efb6c0 R14: 0000000000000001 R15: 0000000000000003
[  128.507539] FS:  0000000000000000(0000) GS:ffff88042d600000(0000) knlGS:0000000000000000
[  128.507842] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  128.508176] CR2: 0000000000000018 CR3: 0000000427782000 CR4: 0000000000360670
[  128.508530] Stack:
[  128.508859]  0000000000000001 0000000000000000 0000000000000000 4a7e338b0c959fd7
[  128.509212]  ffff88042b139a38 ffff88042b139a80 000000002b139a20 ffff880427ce1856
[  128.509577]  ffff880400000001 ffffffff00000000 ffff880427ce1866 0000000000000000
[  128.509945] Call Trace:
[  128.510314]  <IRQ>
[  128.510324]  [<ffffffff81868280>] ? _raw_spin_unlock_bh+0x20/0x50
[  128.511089]  [<ffffffff81841ed1>] icmpv6_send+0x21/0x30
[  128.511483]  [<ffffffff8182fe95>] ip6_expire_frag_queue+0x115/0x1b0
[  128.511892]  [<ffffffff8182ff30>] ? ip6_expire_frag_queue+0x1b0/0x1b0
[  128.512301]  [<ffffffff8182ff4f>] ip6_frag_expire+0x1f/0x30
[  128.512723]  [<ffffffff810f57c7>] call_timer_fn+0x37/0x140
[  128.513116]  [<ffffffff8182ff30>] ? ip6_expire_frag_queue+0x1b0/0x1b0
[  128.513509]  [<ffffffff810f70d4>] run_timer_softirq+0x234/0x330
[  128.513902]  [<ffffffff8108b509>] __do_softirq+0x109/0x2b0
[  128.514291]  [<ffffffff8108b825>] irq_exit+0xa5/0xb0
[  128.514673]  [<ffffffff8186c250>] smp_apic_timer_interrupt+0x50/0x70
[  128.515045]  [<ffffffff81869994>] apic_timer_interrupt+0xd4/0xe0
[  128.515414]  <EOI>
[  128.515423]  [<ffffffff81039130>] ? speculation_ctrl_update_tif+0x80/0x80
[  128.516123]  [<ffffffff81067af2>] ? native_safe_halt+0x12/0x20
[  128.516466]  [<ffffffff8103914e>] default_idle+0x1e/0xe0
[  128.516802]  [<ffffffff81039ff5>] arch_cpu_idle+0x15/0x20
[  128.517124]  [<ffffffff810cc03a>] default_idle_call+0x2a/0x40
[  128.517441]  [<ffffffff810cc3b3>] cpu_startup_entry+0x303/0x360
[  128.517757]  [<ffffffff8185bc2c>] rest_init+0x7c/0x80
[  128.518055]  [<ffffffff81f68fb7>] start_kernel+0x483/0x4a4
[  128.518367]  [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120
[  128.518665]  [<ffffffff81f682da>] x86_64_start_reservations+0x2a/0x2c
[  128.518952]  [<ffffffff81f68426>] x86_64_start_kernel+0x14a/0x16d
[  128.519234] Code: 8b 5c 24 40 75 46 f6 c2 02 74 05 f6 c2 30 75 3c 48 8b 43 58 44 89 5c 24 34 89 54 24 40 44 89 44 24 48 4c 89 4c 24 60 48 83 e0 fe <48> 8b 78 18 e8 4c 0b 03 00 41 89 c2 4c 8b 4c 24 60 44 8b 44 24
[  128.520205] RIP  [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[  128.520511]  RSP <ffff88042d603d00>
[  128.520818] CR2: 0000000000000018
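Whether a given VM hit "the same bug" comes down to comparing crash signatures. Below is a minimal Python sketch, assuming the dmesg text from linux-crashdump has been saved to plain files. It pulls the faulting function from the `IP:` line and every `symbol+offset/size` frame from the trace. The signature it checks is our own assumption based on the traces in this report, not an official definition: a fault in icmp6_send, with icmpv6_send and ip6_expire_frag_queue as reliable (non-`?`) frames. The script name and file names are hypothetical.

```python
import re
import sys
from typing import List, NamedTuple, Optional

# One backtrace frame as printed by this 4.4 kernel, e.g.
#   [<ffffffff81841ed1>] icmpv6_send+0x21/0x30
# "symbol+offset/size": the address lies 0x21 bytes into icmpv6_send,
# a 0x30-byte function. A leading "? " marks an unreliable frame.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)
# The "IP:" line names the function that faulted ("RIP:" is not matched).
IP_RE = re.compile(r"\bIP: \[<[0-9a-f]+>\] (?P<sym>[A-Za-z0-9_.]+)\+")


class Frame(NamedTuple):
    symbol: str
    offset: int
    size: int
    reliable: bool


def parse_frames(oops: str) -> List[Frame]:
    """Return every symbol+offset/size frame found in an oops text."""
    return [
        Frame(m["sym"], int(m["off"], 16), int(m["size"], 16),
              m["unreliable"] is None)
        for m in FRAME_RE.finditer(oops)
    ]


def faulting_function(oops: str) -> Optional[str]:
    """Function named on the first "IP:" line, or None if there is none."""
    m = IP_RE.search(oops)
    return m["sym"] if m else None


def matches_signature(oops: str) -> bool:
    """Assumed signature: fault in icmp6_send, reached via icmpv6_send
    from ip6_expire_frag_queue (reliable frames only)."""
    reliable = {f.symbol for f in parse_frames(oops) if f.reliable}
    return (faulting_function(oops) == "icmp6_send"
            and {"icmpv6_send", "ip6_expire_frag_queue"} <= reliable)


# Excerpt of the oops quoted in the bug description.
SAMPLE = """\
[57063.487184] IP: [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
[57063.488679]  <IRQ>
[57063.488771]  [<ffffffff81868280>] ? _raw_spin_unlock_bh+0x20/0x50
[57063.488802]  [<ffffffff81841ed1>] icmpv6_send+0x21/0x30
[57063.488829]  [<ffffffff8182fe95>] ip6_expire_frag_queue+0x115/0x1b0
"""

if __name__ == "__main__":
    # Usage (hypothetical file names): python oops_sig.py vm1-dmesg.txt ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        print(f"{path}: fault in {faulting_function(text)}, "
              f"matches signature: {matches_signature(text)}")
```

Only the first `IP:` line in a file is considered, so a dmesg containing several oopses should be split first. By this check, the #202669 crash (faulting in icmpv6_route_lookup) would not match.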

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1883498

Title:
  Frequent Panic in ip6_expire_frag_queue->icmpv6_send on
  4.4.0-184-generic

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I happened to do an upgrade on a number of servers last week. Some of
  them got 4.4.0-179-generic and the ones upgraded a bit later during
  the week got 4.4.0-184-generic as it was just released. The ones with
  4.4.0-184-generic started getting stuck. With linux-crashdump
  installed I obtained the dmesgs and crash dumps. The backtrace appears
  somewhat similar to #202669 but that one only happened on bare
  hardware for us - this one is on KVM virtual instances. #202669
  panicked in icmpv6_route_lookup and this one dies already in
  icmpv6_send.

  Between 2020-06-11 and 2020-06-15, on a set of 12 VMs running
  4.4.0-184-generic, there were 85 crashes like this, on servers with
  noticeable IPv6 traffic. All of the 12 VMs with 4.4.0-184-generic
  crashed at least once. (There are more than 12 VMs experiencing this,
  this is just the set I had linux-crashdump on.)

  [57063.487084] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
  [57063.487184] IP: [<ffffffff818288ab>] icmp6_send+0x1fb/0x970
  [57063.487218] PGD 0 
  [57063.487231] Oops: 0000 [#1] SMP 
  [57063.488665] Call Trace:
  [57063.488679]  <IRQ> 
  [57063.488705]  [<ffffffff81756ee8>] ? __netif_receive_skb+0x18/0x60
  [57063.488739]  [<ffffffff810c3758>] ? task_tick_fair+0x4c8/0x8e0
  [57063.488771]  [<ffffffff81868280>] ? _raw_spin_unlock_bh+0x20/0x50
  [57063.488802]  [<ffffffff81841ed1>] icmpv6_send+0x21/0x30
  [57063.488829]  [<ffffffff8182fe95>] ip6_expire_frag_queue+0x115/0x1b0
  [57063.488862]  [<ffffffffc0366260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
  [57063.488897]  [<ffffffffc036627f>] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6]
  [57063.488937]  [<ffffffff810f57c7>] call_timer_fn+0x37/0x140
  [57063.488965]  [<ffffffffc0366260>] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6]
  [57063.489002]  [<ffffffff810f70d4>] run_timer_softirq+0x234/0x330
  ...
