OK thanks Joseph. We'll run a test against the latest 3.19.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1484919

Title:
  Kernel oops associated with BIRD/netlink

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  In Progress
Status in linux source package in Wily:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  Scale testing our product, which uses the BIRD BGP daemon, on Google's
  GCE cloud, we see frequent (40% of hosts) Kernel Oopses and reboots on
  kernel 3.19.0-25-generic #26~14.04.1-Ubuntu with BIRD running. This is
  the standard GCE-provided Ubuntu image.

  If we replace the image with a stock Ubuntu one (kernel
  3.13.0-61-generic #100-Ubuntu), installed from ISO, then we do not see
  the issue.

  If we stop BIRD then we no longer see the issue.

  I suspect that this is an issue with the way BIRD is using netlink,
  triggering a kernel bug. It seems to happen more at scale, when BIRD
  is doing more with netlink and we have thousands of routes in place.

  Here's a sample kernel oops:

  [ 266.033276] BUG: unable to handle kernel paging request at 000000190000003c
  [ 266.035142] IP: [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
  [ 266.036009] PGD b9e5e067 PUD 0
  [ 266.036009] Oops: 0000 [#1] SMP
  [ 266.036009] Modules linked in: bridge stp llc dummy xt_mac xt_mark nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set nfnetlink ebtable_nat ebtables xt_nat ipip tunnel4 ip_tunnel ipt_REJECT nf_reject_ipv4 xt_conntrack xt_CHECKSUM xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt ppdev dm_multipath scsi_dh 8250_fintek parport_pc i2c_piix4 mac_hid serio_raw parport crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [ 266.036009] CPU: 2 PID: 3456 Comm: bird Tainted: G C 3.19.0-25-generic #26~14.04.1-Ubuntu
  [ 266.036009] Hardware name: Google Google, BIOS Google 01/01/2011
  [ 266.036009] task: ffff8801210775c0 ti: ffff880036a08000 task.ti: ffff880036a08000
  [ 266.036009] RIP: 0010:[<ffffffff811d1f0b>] [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
  [ 266.036009] RSP: 0018:ffff880036a0b7f8 EFLAGS: 00010246
  [ 266.036009] RAX: 0000000000000000 RBX: 00000000000102d0 RCX: 000000000008b0f6
  [ 266.036009] RDX: 000000000008b0f5 RSI: 0000000000000000 RDI: 00000000000171c0
  [ 266.036009] RBP: ffff880036a0b848 R08: ffff8801263171c0 R09: ffff880121c01600
  [ 266.036009] R10: 0000000000000000 R11: ffff880121c01600 R12: 00000000000102d0
  [ 266.036009] R13: 0000000000000180 R14: 00000000ffffffff R15: 000000190000003c
  [ 266.036009] FS: 00007fd470753740(0000) GS:ffff880126300000(0000) knlGS:0000000000000000
  [ 266.036009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 266.036009] CR2: 000000190000003c CR3: 00000000bb207000 CR4: 00000000001406e0
  [ 266.036009] Stack:
  [ 266.036009] 0000000100000180 00000000000000c3 ffff880121c01600 ffffffff816992ea
  [ 266.036009] 0000000000000001 ffff8800b1486a00 0000000000000000 00000000000000d0
  [ 266.036009] 0000000000000180 00000000ffffffff ffff880036a0b898 ffffffff81697261
  [ 266.036009] Call Trace:
  [ 266.036009] [<ffffffff816992ea>] ? pskb_expand_head+0x6a/0x260
  [ 266.036009] [<ffffffff81697261>] __kmalloc_reserve.isra.27+0x31/0x90
  [ 266.036009] [<ffffffff816992ea>] pskb_expand_head+0x6a/0x260
  [ 266.036009] [<ffffffff816d6d13>] netlink_trim+0xa3/0xe0
  [ 266.036009] [<ffffffff816d984e>] netlink_unicast+0x3e/0x200
  [ 266.036009] [<ffffffff816da323>] nlmsg_notify+0x93/0xb0
  [ 266.036009] [<ffffffff816b8d3e>] rtnl_notify+0x2e/0x40
  [ 266.036009] [<ffffffff81727525>] rtmsg_fib+0x115/0x160
  [ 266.036009] [<ffffffff8172a09d>] ? trie_rebalance+0x10d/0x130
  [ 266.036009] [<ffffffff8172a34a>] fib_table_insert+0x1da/0x8e0
  [ 266.036009] [<ffffffff817242a8>] inet_rtm_newroute+0x48/0x60
  [ 266.036009] [<ffffffff816b97c5>] rtnetlink_rcv_msg+0x95/0x250
  [ 266.036009] [<ffffffff813bb4a6>] ? rhashtable_lookup_compare+0x36/0x70
  [ 266.036009] [<ffffffff816d631e>] ? __netlink_lookup+0x3e/0x50
  [ 266.036009] [<ffffffff816b9730>] ? rtnetlink_rcv+0x40/0x40
  [ 266.036009] [<ffffffff816da271>] netlink_rcv_skb+0xc1/0xe0
  [ 266.036009] [<ffffffff816b971c>] rtnetlink_rcv+0x2c/0x40
  [ 266.036009] [<ffffffff816d9906>] netlink_unicast+0xf6/0x200
  [ 266.036009] [<ffffffff816d9d1c>] netlink_sendmsg+0x30c/0x680
  [ 266.036009] [<ffffffff81351610>] ? aa_sk_perm.isra.4+0x70/0x150
  [ 266.036009] [<ffffffff8168f2ec>] do_sock_sendmsg+0x8c/0x100
  [ 266.036009] [<ffffffff81209a13>] ? __fdget+0x13/0x20
  [ 266.036009] [<ffffffff8168f547>] SYSC_sendto+0x157/0x200
  [ 266.036009] [<ffffffff81690252>] ? __sys_recvmsg+0x42/0x80
  [ 266.036009] [<ffffffff8168fd2e>] SyS_sendto+0xe/0x10
  [ 266.036009] [<ffffffff817b668d>] system_call_fastpath+0x16/0x1b
  [ 266.036009] Code: fb 41 8b 53 18 0f 1f 44 00 00 48 83 c4 28 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 40 00 49 63 41 20 48 8d 4a 01 49 8b 39 <49> 8b 1c 07 4c 89 f8 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 53 ff
  [ 266.036009] RIP [<ffffffff811d1f0b>] __kmalloc_node_track_caller+0xfb/0x2c0
  [ 266.036009] RSP <ffff880036a0b7f8>
  [ 266.036009] CR2: 000000190000003c
  [ 266.131166] ---[ end trace 246ae06038901786 ]---

  Running our product on CoreOS, we see similar, but less frequent
  crashes. Their kernel is 4.1-based:
  https://github.com/coreos/bugs/issues/435

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1484919/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp