Hello, I'm using openvswitch 2.3.1 on kernel 3.12.28 and recently got the following warnings from the kernel:

[8003509.804409] ------------[ cut here ]------------
[8003509.804653] WARNING: CPU: 28 PID: 12584 at mm/slub.c:3318 ksize+0xbd/0xc0()
[8003509.804880] Modules linked in: xt_REDIRECT tcp_diag inet_diag act_police cls_basic sch_ingress xt_iprange xt_multiport xt_pkttype xt_state veth openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log i2c_i801 lpc_ich mfd_core ioapic ioatdma igb dca ipmi_devintf ipmi_si ipmi_msghandler megaraid_sas
[8003509.806801] CPU: 28 PID: 12584 Comm: handler42 Not tainted 3.12.28-clouder7 #1
[8003509.807026] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
[8003509.807465] 0000000000000cf6 ffff883fcfc75858 ffffffff815717b9 0000000000000cf6
[8003509.807699] 0000000000000000 ffff883fcfc75898 ffffffff810786c2 ffff883fcfc75960
[8003509.807932] ffffea005264d5c0 ffff883fcfc759b0 0000000000000000 0000000000000008
[8003509.808163] Call Trace:
[8003509.808385] [<ffffffff815717b9>] dump_stack+0x49/0x60
[8003509.808611] [<ffffffff810786c2>] warn_slowpath_common+0x82/0xb0
[8003509.808836] [<ffffffff81078705>] warn_slowpath_null+0x15/0x20
[8003509.809062] [<ffffffff8113a0dd>] ksize+0xbd/0xc0
[8003509.809290] [<ffffffffa0290390>] reserve_sfa_size+0x30/0xe0 [openvswitch]
[8003509.809518] [<ffffffffa02907a3>] validate_and_copy_actions+0x1e3/0x530 [openvswitch]
[8003509.809748] [<ffffffff8113c1f1>] ? __kmalloc+0x31/0x190
[8003509.809973] [<ffffffffa0290cc3>] ovs_packet_cmd_execute+0x1a3/0x230 [openvswitch]
[8003509.810202] [<ffffffff814e8d61>] genl_family_rcv_msg+0x221/0x390
[8003509.810432] [<ffffffff814e8ed0>] ? genl_family_rcv_msg+0x390/0x390
[8003509.810660] [<ffffffff814e8f2b>] genl_rcv_msg+0x5b/0xa0
[8003509.810885] [<ffffffff814e72b9>] netlink_rcv_skb+0x99/0xc0
[8003509.811110] [<ffffffff814e87d7>] genl_rcv+0x27/0x40
[8003509.811335] [<ffffffff814e634f>] netlink_unicast+0x10f/0x190
[8003509.811563] [<ffffffff814e7cf3>] netlink_sendmsg+0x2c3/0x750
[8003509.811788] [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
[8003509.812014] [<ffffffff814a620b>] sock_sendmsg+0x8b/0xb0
[8003509.812239] [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
[8003509.812465] [<ffffffff8157583b>] ? _raw_spin_lock_bh+0x1b/0x40
[8003509.812689] [<ffffffff81575700>] ? _raw_spin_unlock_bh+0x10/0x20
[8003509.812916] [<ffffffff814b3091>] ? verify_iovec+0x61/0xe0
[8003509.813143] [<ffffffff814a70d1>] ___sys_sendmsg+0x411/0x430
[8003509.813373] [<ffffffff81193ca9>] ? ep_scan_ready_list+0x189/0x1b0
[8003509.813600] [<ffffffff81193e2f>] ? ep_poll+0x13f/0x370
[8003509.813825] [<ffffffff81089c4b>] ? k_getrusage+0x13b/0x380
[8003509.814047] [<ffffffff814a72f4>] __sys_sendmsg+0x44/0x80
[8003509.814268] [<ffffffff814a7344>] SyS_sendmsg+0x14/0x20
[8003509.819684] [<ffffffff815763a2>] system_call_fastpath+0x16/0x1b
[8003509.819910] ---[ end trace af1b9000d8cc8f35 ]---

And almost instantly after that:

[8003509.820257] ------------[ cut here ]------------
[8003509.820480] kernel BUG at mm/slub.c:3338!
[8003509.820699] invalid opcode: 0000 [#1] SMP
[8003509.820922] Modules linked in: xt_REDIRECT tcp_diag inet_diag act_police cls_basic sch_ingress xt_iprange xt_multiport xt_pkttype xt_state veth openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log i2c_i801 lpc_ich mfd_core ioapic ioatdma igb dca ipmi_devintf ipmi_si ipmi_msghandler megaraid_sas
[8003509.822840] CPU: 28 PID: 12584 Comm: handler42 Tainted: G W 3.12.28-clouder7 #1
[8003509.823068] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
[8003509.823514] task: ffff883fce76dac0 ti: ffff883fcfc74000 task.ti: ffff883fcfc74000
[8003509.823739] RIP: 0010:[<ffffffff8113a9b2>] [<ffffffff8113a9b2>] kfree+0xe2/0xf0
[8003509.823970] RSP: 0018:ffff883fcfc75938 EFLAGS: 00010246
[8003509.824194] RAX: 02fc000000000824 RBX: ffff880261eb4cf0 RCX: ffff883fce76dac0
[8003509.824423] RDX: 0000000000402140 RSI: 0000000000000000 RDI: ffff8814993577e0
[8003509.824647] RBP: ffff883fcfc75948 R08: ffff881fffc91640 R09: ffffea005264d5c0
[8003509.824874] R10: ffffffff814b04ca R11: 0000000000000000 R12: 0000000000000000
[8003509.825098] R13: ffff880261eb4cf0 R14: ffff880261eb4d28 R15: ffff8818a080ec00
[8003509.825327] FS: 00007fa9c9ffb700(0000) GS:ffff881fffc80000(0000) knlGS:0000000000000000
[8003509.825556] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[8003509.825780] CR2: ffffffffff600400 CR3: 0000003fce560000 CR4: 00000000001407e0
[8003509.826007] Stack:
[8003509.826225] ffff883fcfc75978 ffff880261eb4cf0 ffff883fcfc75968 ffffffffa0296c98
[8003509.826459] ffff883fcfc75978 ffff880261eb4cf0 ffff883fcfc75988 ffffffffa0296ce9
[8003509.826693] ffff8818a080ff00 ffff883fce2f3300 ffff883fcfc759e8 ffffffffa0290d2c
[8003509.826926] Call Trace:
[8003509.827150] [<ffffffffa0296c98>] __flow_free+0x18/0x30 [openvswitch]
[8003509.827377] [<ffffffffa0296ce9>] ovs_flow_free+0x39/0x70 [openvswitch]
[8003509.827609] [<ffffffffa0290d2c>] ovs_packet_cmd_execute+0x20c/0x230 [openvswitch]
[8003509.827842] [<ffffffff814e8d61>] genl_family_rcv_msg+0x221/0x390
[8003509.828067] [<ffffffff814e8ed0>] ? genl_family_rcv_msg+0x390/0x390
[8003509.828288] [<ffffffff814e8f2b>] genl_rcv_msg+0x5b/0xa0
[8003509.828508] [<ffffffff814e72b9>] netlink_rcv_skb+0x99/0xc0
[8003509.828729] [<ffffffff814e87d7>] genl_rcv+0x27/0x40
[8003509.828949] [<ffffffff814e634f>] netlink_unicast+0x10f/0x190
[8003509.829170] [<ffffffff814e7cf3>] netlink_sendmsg+0x2c3/0x750
[8003509.829393] [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
[8003509.829615] [<ffffffff814a620b>] sock_sendmsg+0x8b/0xb0
[8003509.829837] [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
[8003509.830063] [<ffffffff8157583b>] ? _raw_spin_lock_bh+0x1b/0x40
[8003509.830289] [<ffffffff81575700>] ? _raw_spin_unlock_bh+0x10/0x20
[8003509.830517] [<ffffffff814b3091>] ? verify_iovec+0x61/0xe0
[8003509.830742] [<ffffffff814a70d1>] ___sys_sendmsg+0x411/0x430
[8003509.830971] [<ffffffff81193ca9>] ? ep_scan_ready_list+0x189/0x1b0
[8003509.831199] [<ffffffff81193e2f>] ? ep_poll+0x13f/0x370
[8003509.831426] [<ffffffff81089c4b>] ? k_getrusage+0x13b/0x380
[8003509.831649] [<ffffffff814a72f4>] __sys_sendmsg+0x44/0x80
[8003509.831876] [<ffffffff814a7344>] SyS_sendmsg+0x14/0x20
[8003509.832102] [<ffffffff815763a2>] system_call_fastpath+0x16/0x1b
[8003509.832327] Code: ce 4c 89 d7 e8 f0 fc ff ff eb d6 66 41 f7 01 00 c0 74 18 49 8b 01 31 f6 f6 c4 40 74 04 41 8b 71 68 4c 89 cf e8 f0 3e fc ff eb b6 <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 8b 05 79 33 ab 00
[8003509.833151] RIP [<ffffffff8113a9b2>] kfree+0xe2/0xf0
[8003509.833380] RSP <ffff883fcfc75938>
[8003509.833972] ---[ end trace af1b9000d8cc8f36 ]---

Needless to say, when this occurs the machine loses connectivity on all interfaces attached to an openvswitch bridge, and ultimately the machine crashes.

I reviewed the surrounding code, particularly the ovs_packet_cmd_execute -> validate_and_copy_actions -> copy_action -> reserve_sfa_size -> ksize call chain. It is pretty clear that the sfa pointer is indeed allocated from the heap (it is the acts in ovs_packet_cmd_execute, which is allocated in ovs_flow_actions_alloc), yet the kernel's sanity check in ksize fails to recognize it as slab-allocated memory. The same applies to the second bug splat: the flow is allocated with ovs_flow_alloc(), and by the time it has to be freed it is corrupted.

I can see a lot of calculations being done on the netlink headers and packets, and frankly this makes me a bit uneasy, as it seems error prone.

Has anyone experienced similar corruption?

Regards,
Nikolay
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss