James Page asked me to post some findings here. Here’s the trace I’m getting (same as the one in comment #10):

[ 5152.142936] device s1 left promiscuous mode
[ 5152.427823] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5152.428422] IP: rtmsg_ifa+0x30/0xd0
[ 5152.428816] *pdpt = 0000000033f65001 *pde = 0000000000000000
[ 5152.428820]
[ 5152.429682] Oops: 0000 [#1] SMP
[ 5152.430046] Modules linked in: veth netconsole openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec joydev snd_hda_core snd_hwdep snd_pcm input_leds serio_raw snd_timer snd pvpanic parport_pc i2c_piix4 soundcore mac_hid parport sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear qxl ttm crc32_pclmul drm_kms_helper pcbc aesni_intel syscopyarea aes_i586 sysfillrect crypto_simd sysimgblt fb_sys_fops psmouse cryptd virtio_net virtio_blk drm pata_acpi floppy
[ 5152.433348] CPU: 1 PID: 90 Comm: kworker/u4:3 Tainted: G W 4.13.0-16-generic #19-Ubuntu
[ 5152.433852] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 5152.434346] Workqueue: netns cleanup_net
[ 5152.434816] task: f17aa100 task.stack: f4ef0000
[ 5152.435302] EIP: rtmsg_ifa+0x30/0xd0
[ 5152.435780] EFLAGS: 00010246 CPU: 1
[ 5152.436254] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 014000c0
[ 5152.436764] ESI: 00000000 EDI: f063a6c0 EBP: f4ef1dcc ESP: f4ef1db4
[ 5152.437267] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 5152.437780] CR0: 80050033 CR2: 00000000 CR3: 33c3f4a0 CR4: 001406f0
[ 5152.438311] Call Trace:
[ 5152.438816] __inet_del_ifa+0xbb/0x260
[ 5152.439344] ? igmpv3_clear_delrec+0x28/0xa0
[ 5152.439868] inetdev_event+0x22f/0x4e0
[ 5152.440401] ? skb_dequeue+0x5b/0x70
[ 5152.440934] ? wireless_nlevent_flush+0x4c/0x90
[ 5152.441487] notifier_call_chain+0x4e/0x70
[ 5152.442016] raw_notifier_call_chain+0x11/0x20
[ 5152.442554] call_netdevice_notifiers_info+0x2a/0x60
[ 5152.443097] rollback_registered_many+0x21c/0x380
[ 5152.443646] unregister_netdevice_many.part.102+0x10/0x80
[ 5152.444180] default_device_exit_batch+0x134/0x160
[ 5152.444709] ? do_wait_intr_irq+0x80/0x80
[ 5152.445223] ops_exit_list.isra.8+0x4d/0x60
[ 5152.445744] cleanup_net+0x18e/0x260
[ 5152.446264] process_one_work+0x1a0/0x390
[ 5152.446790] worker_thread+0x37/0x440
[ 5152.447321] kthread+0xf3/0x110
[ 5152.447843] ? process_one_work+0x390/0x390
[ 5152.448380] ? kthread_create_on_node+0x20/0x20
[ 5152.448919] ret_from_fork+0x19/0x24
[ 5152.449462] Code: 55 89 e5 57 56 53 89 d7 89 ce 83 ec 0c 85 c9 89 45 e8 c7 45 f0 00 00 00 00 74 06 8b 41 08 89 45 f0 8b 47 0c 31 c9 ba c0 00 40 01 <8b> 00 8b 80 20 03 00 00 6a ff 89 45 ec b8 60 00 00 00 e8 19 46
[ 5152.450719] EIP: rtmsg_ifa+0x30/0xd0 SS:ESP: 0068:f4ef1db4
[ 5152.451308] CR2: 0000000000000000
[ 5152.451885] ---[ end trace 5cdfc95a5b343f5c ]---

Looks to me like 4.13 is missing this set of patches from Cong Wang:

commit 623859ae06b85cabba79ce78f0d49e67783d4c34
Merge: 8f56246 35c55fc
Author: David S. Miller <da...@davemloft.net>
Date: Thu Nov 9 10:03:10 2017 +0900

    Merge branch 'net-sched-race-fix'

    Cong Wang says:

    ====================
    net_sched: close the race between call_rcu() and cleanup_net()

    This patchset tries to fix the race between call_rcu() and
    cleanup_net() again. Without holding the netns refcnt the
    tc_action_net_exit() in netns workqueue could be called before
    filter destroy works in tc filter workqueue. This patchset
    moves the netns refcnt from tc actions to tcf_exts, without
    breaking per-netns tc actions.

    Patch 1 reverts the previous fix, patch 2 introduces two new
    API's to help to address the bug and the rest patches switch
    to the new API's. Please see each patch for details.

    I was not able to reproduce this bug, but now after adding
    some delay in filter destroy work I manage to trigger the
    crash. After this patchset, the crash is not reproducible
    any more and the debugging printk's show the order is expected
    too.
    ====================

    Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc action
    Reported-by: Lucas Bates <luc...@mojatatu.com>
    Cc: Lucas Bates <luc...@mojatatu.com>
    Cc: Jamal Hadi Salim <j...@mojatatu.com>
    Cc: Jiri Pirko <j...@resnulli.us>
    Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>

Note the comment he makes about “filter destroy work” and how the final function in the trace is __inet_del_ifa(). As you can see from the trace the machine is executing the netns cleanup_net() function when the panic occurs.

This series of patches has not been backported to the 4.13.16 kernel.

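To compare the oopses quoted in this report, it can help to reduce each one to its faulting symbol, its workqueue, and its reliable call-trace frames. The following is only a minimal Python sketch: `summarize_oops` and the two regular expressions are my own illustrative helpers, not part of any kernel or Launchpad tooling, and the sample input is a shortened excerpt of the 4.13.0-16 trace. Frames prefixed with "?" are skipped because the kernel marks them as unreliable.

```python
import re

# Shortened excerpt of the 4.13.0-16 oops quoted in this report.
OOPS = """\
[ 5152.428422] IP: rtmsg_ifa+0x30/0xd0
[ 5152.434346] Workqueue: netns cleanup_net
[ 5152.438311] Call Trace:
[ 5152.438816] __inet_del_ifa+0xbb/0x260
[ 5152.439344] ? igmpv3_clear_delrec+0x28/0xa0
[ 5152.439868] inetdev_event+0x22f/0x4e0
[ 5152.445744] cleanup_net+0x18e/0x260
[ 5152.449462] Code: 55 89 e5 57 56 53
"""

# Leading "[ seconds.micros]" printk timestamp.
STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# A frame such as "symbol+0x4e/0x70", optionally prefixed by "? ".
FRAME = re.compile(r"^(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def summarize_oops(text):
    """Return the faulting symbol, workqueue and reliable call-trace symbols."""
    info = {"ip": None, "workqueue": None, "frames": []}
    in_trace = False
    for raw in text.splitlines():
        line = STAMP.sub("", raw).strip()
        if line.startswith("IP:"):
            # "IP: rtmsg_ifa+0x30/0xd0" -> "rtmsg_ifa"
            info["ip"] = line.split()[1].split("+")[0]
        elif line.startswith("Workqueue:"):
            info["workqueue"] = line.split(":", 1)[1].strip()
        elif line == "Call Trace:":
            in_trace = True
        elif in_trace:
            match = FRAME.match(line)
            if match is None:
                in_trace = False  # first non-frame line ends the trace
            elif match.group(1) is None:
                info["frames"].append(match.group(2))  # keep reliable frames only
    return info


if __name__ == "__main__":
    print(summarize_oops(OOPS))
```

Running the same helper over the oops in the bug description should make it easy to check whether both crashes happen on the `netns cleanup_net` workqueue and pass through `inetdev_event`.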
https://bugs.launchpad.net/bugs/1736390

Title:
  openvswitch: kernel opps destroying interfaces on i386

Status in linux package in Ubuntu:
  Triaged
Status in openvswitch package in Ubuntu:
  New
Status in linux source package in Bionic:
  Triaged
Status in openvswitch source package in Bionic:
  New

Bug description:
  Reproducible on bionic using the autopkgtests from openvswitch on
  i386:

  [ 41.420568] BUG: unable to handle kernel NULL pointer dereference at (null)
  [ 41.421000] IP: igmp_group_dropped+0x21/0x220
  [ 41.421246] *pdpt = 000000001d62c001 *pde = 0000000000000000
  [ 41.421659] Oops: 0000 [#1] SMP
  [ 41.421852] Modules linked in: veth openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c 9p fscache ppdev kvm_intel kvm 9pnet_virtio irqbypass input_leds joydev 9pnet parport_pc serio_raw parport i2c_piix4 qemu_fw_cfg mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs xor raid6_pq psmouse virtio_blk virtio_net pata_acpi floppy
  [ 41.423855] CPU: 0 PID: 5 Comm: kworker/u2:0 Tainted: G W 4.13.0-18-generic #21-Ubuntu
  [ 41.424355] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  [ 41.424849] Workqueue: netns cleanup_net
  [ 41.425071] task: db8fba80 task.stack: dba10000
  [ 41.425346] EIP: igmp_group_dropped+0x21/0x220
  [ 41.425656] EFLAGS: 00010202 CPU: 0
  [ 41.425864] EAX: 00000000 EBX: dd726360 ECX: dba11e6c EDX: 00000002
  [ 41.426335] ESI: 00000000 EDI: dd4db500 EBP: dba11dcc ESP: dba11d94
  [ 41.426687] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
  [ 41.426990] CR0: 80050033 CR2: 00000000 CR3: 1e6d6d60 CR4: 000006f0
  [ 41.427340] Call Trace:
  [ 41.427485] ? __wake_up+0x36/0x40
  [ 41.427680] ip_mc_down+0x27/0x90
  [ 41.427869] inetdev_event+0x398/0x4e0
  [ 41.428082] ? skb_dequeue+0x5b/0x70
  [ 41.428286] ? wireless_nlevent_flush+0x4c/0x90
  [ 41.428541] notifier_call_chain+0x4e/0x70
  [ 41.428772] raw_notifier_call_chain+0x11/0x20
  [ 41.429023] call_netdevice_notifiers_info+0x2a/0x60
  [ 41.429301] dev_close_many+0x9d/0xe0
  [ 41.429509] rollback_registered_many+0xd7/0x380
  [ 41.429768] unregister_netdevice_many.part.102+0x10/0x80
  [ 41.430075] default_device_exit_batch+0x134/0x160
  [ 41.430344] ? do_wait_intr_irq+0x80/0x80
  [ 41.430650] ops_exit_list.isra.8+0x4d/0x60
  [ 41.430886] cleanup_net+0x18e/0x260
  [ 41.431090] process_one_work+0x1a0/0x390
  [ 41.431317] worker_thread+0x37/0x450
  [ 41.431525] kthread+0xf3/0x110
  [ 41.431714] ? process_one_work+0x390/0x390
  [ 41.431941] ? kthread_create_on_node+0x20/0x20
  [ 41.432187] ret_from_fork+0x19/0x24
  [ 41.432382] Code: 90 90 90 90 90 90 90 90 90 90 3e 8d 74 26 00 55 89 e5 57 56 53 89 c3 83 ec 2c 8b 33 65 a1 14 00 00 00 89 45 f0 31 c0 80 7b 4b 00 <8b> 06 8b b8 20 03 00 00 8b 43 04 0f 85 5e 01 00 00 3d e0 00 00
  [ 41.433405] EIP: igmp_group_dropped+0x21/0x220 SS:ESP: 0068:dba11d94
  [ 41.433750] CR2: 0000000000000000
  [ 41.433961] ---[ end trace 595db54cab84070c ]---

  The system then becomes unresponsive; no further interfaces can be
  created.