So far I've only seen it once, but I will keep monitoring the
situation. Unfortunately, even if I observe this crash again, it will most
likely be in a live environment, so enabling the SLUB debugging aid
would be out of the question. Do you have any ideas how to stress this
particular code path artificially?

On Sat, Sep 26, 2015 at 2:26 AM, Pravin Shelar <pshe...@nicira.com> wrote:
> On Fri, Sep 25, 2015 at 5:00 AM, Nikolay Borisov
> <n.bori...@siteground.com> wrote:
>> Hello,
>>
>> I'm using openvswitch on kernel 3.12.28 with openvswitch 2.3.1, and I recently
>> got the following warnings from the kernel:
>>
>> [8003509.804409] ------------[ cut here ]------------
>> [8003509.804653] WARNING: CPU: 28 PID: 12584 at mm/slub.c:3318 ksize+0xbd/0xc0()
>> [8003509.804880] Modules linked in: xt_REDIRECT tcp_diag inet_diag 
>> act_police cls_basic sch_ingress xt_iprange xt_multiport xt_pkttype xt_state 
>> veth openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle 
>> xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT 
>> nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm 
>> ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core ext2 dm_thin_pool 
>> dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log 
>> i2c_i801 lpc_ich mfd_core ioapic ioatdma igb dca ipmi_devintf ipmi_si 
>> ipmi_msghandler megaraid_sas
>> [8003509.806801] CPU: 28 PID: 12584 Comm: handler42 Not tainted 
>> 3.12.28-clouder7 #1
>> [8003509.807026] Hardware name: Supermicro 
>> X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
>> [8003509.807465]  0000000000000cf6 ffff883fcfc75858 ffffffff815717b9 
>> 0000000000000cf6
>> [8003509.807699]  0000000000000000 ffff883fcfc75898 ffffffff810786c2 
>> ffff883fcfc75960
>> [8003509.807932]  ffffea005264d5c0 ffff883fcfc759b0 0000000000000000 
>> 0000000000000008
>> [8003509.808163] Call Trace:
>> [8003509.808385]  [<ffffffff815717b9>] dump_stack+0x49/0x60
>> [8003509.808611]  [<ffffffff810786c2>] warn_slowpath_common+0x82/0xb0
>> [8003509.808836]  [<ffffffff81078705>] warn_slowpath_null+0x15/0x20
>> [8003509.809062]  [<ffffffff8113a0dd>] ksize+0xbd/0xc0
>> [8003509.809290]  [<ffffffffa0290390>] reserve_sfa_size+0x30/0xe0 [openvswitch]
>> [8003509.809518]  [<ffffffffa02907a3>] validate_and_copy_actions+0x1e3/0x530 [openvswitch]
>> [8003509.809748]  [<ffffffff8113c1f1>] ? __kmalloc+0x31/0x190
>> [8003509.809973]  [<ffffffffa0290cc3>] ovs_packet_cmd_execute+0x1a3/0x230 [openvswitch]
>> [8003509.810202]  [<ffffffff814e8d61>] genl_family_rcv_msg+0x221/0x390
>> [8003509.810432]  [<ffffffff814e8ed0>] ? genl_family_rcv_msg+0x390/0x390
>> [8003509.810660]  [<ffffffff814e8f2b>] genl_rcv_msg+0x5b/0xa0
>> [8003509.810885]  [<ffffffff814e72b9>] netlink_rcv_skb+0x99/0xc0
>> [8003509.811110]  [<ffffffff814e87d7>] genl_rcv+0x27/0x40
>> [8003509.811335]  [<ffffffff814e634f>] netlink_unicast+0x10f/0x190
>> [8003509.811563]  [<ffffffff814e7cf3>] netlink_sendmsg+0x2c3/0x750
>> [8003509.811788]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
>> [8003509.812014]  [<ffffffff814a620b>] sock_sendmsg+0x8b/0xb0
>> [8003509.812239]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
>> [8003509.812465]  [<ffffffff8157583b>] ? _raw_spin_lock_bh+0x1b/0x40
>> [8003509.812689]  [<ffffffff81575700>] ? _raw_spin_unlock_bh+0x10/0x20
>> [8003509.812916]  [<ffffffff814b3091>] ? verify_iovec+0x61/0xe0
>> [8003509.813143]  [<ffffffff814a70d1>] ___sys_sendmsg+0x411/0x430
>> [8003509.813373]  [<ffffffff81193ca9>] ? ep_scan_ready_list+0x189/0x1b0
>> [8003509.813600]  [<ffffffff81193e2f>] ? ep_poll+0x13f/0x370
>> [8003509.813825]  [<ffffffff81089c4b>] ? k_getrusage+0x13b/0x380
>> [8003509.814047]  [<ffffffff814a72f4>] __sys_sendmsg+0x44/0x80
>> [8003509.814268]  [<ffffffff814a7344>] SyS_sendmsg+0x14/0x20
>> [8003509.819684]  [<ffffffff815763a2>] system_call_fastpath+0x16/0x1b
>> [8003509.819910] ---[ end trace af1b9000d8cc8f35 ]---
>>
>> And almost instantly after that:
>>
>> [8003509.820257] ------------[ cut here ]------------
>> [8003509.820480] kernel BUG at mm/slub.c:3338!
>> [8003509.820699] invalid opcode: 0000 [#1] SMP
>> [8003509.820922] Modules linked in: xt_REDIRECT tcp_diag inet_diag 
>> act_police cls_basic sch_ingress xt_iprange xt_multiport xt_pkttype xt_state 
>> veth openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle 
>> xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT 
>> nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm 
>> ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core ext2 dm_thin_pool 
>> dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log 
>> i2c_i801 lpc_ich mfd_core ioapic ioatdma igb dca ipmi_devintf ipmi_si 
>> ipmi_msghandler megaraid_sas
>> [8003509.822840] CPU: 28 PID: 12584 Comm: handler42 Tainted: G        W    
>> 3.12.28-clouder7 #1
>> [8003509.823068] Hardware name: Supermicro 
>> X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
>> [8003509.823514] task: ffff883fce76dac0 ti: ffff883fcfc74000 task.ti: 
>> ffff883fcfc74000
>> [8003509.823739] RIP: 0010:[<ffffffff8113a9b2>]  [<ffffffff8113a9b2>] kfree+0xe2/0xf0
>> [8003509.823970] RSP: 0018:ffff883fcfc75938  EFLAGS: 00010246
>> [8003509.824194] RAX: 02fc000000000824 RBX: ffff880261eb4cf0 RCX: 
>> ffff883fce76dac0
>> [8003509.824423] RDX: 0000000000402140 RSI: 0000000000000000 RDI: 
>> ffff8814993577e0
>> [8003509.824647] RBP: ffff883fcfc75948 R08: ffff881fffc91640 R09: 
>> ffffea005264d5c0
>> [8003509.824874] R10: ffffffff814b04ca R11: 0000000000000000 R12: 
>> 0000000000000000
>> [8003509.825098] R13: ffff880261eb4cf0 R14: ffff880261eb4d28 R15: 
>> ffff8818a080ec00
>> [8003509.825327] FS:  00007fa9c9ffb700(0000) GS:ffff881fffc80000(0000) 
>> knlGS:0000000000000000
>> [8003509.825556] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [8003509.825780] CR2: ffffffffff600400 CR3: 0000003fce560000 CR4: 
>> 00000000001407e0
>> [8003509.826007] Stack:
>> [8003509.826225]  ffff883fcfc75978 ffff880261eb4cf0 ffff883fcfc75968 
>> ffffffffa0296c98
>> [8003509.826459]  ffff883fcfc75978 ffff880261eb4cf0 ffff883fcfc75988 
>> ffffffffa0296ce9
>> [8003509.826693]  ffff8818a080ff00 ffff883fce2f3300 ffff883fcfc759e8 
>> ffffffffa0290d2c
>> [8003509.826926] Call Trace:
>> [8003509.827150]  [<ffffffffa0296c98>] __flow_free+0x18/0x30 [openvswitch]
>> [8003509.827377]  [<ffffffffa0296ce9>] ovs_flow_free+0x39/0x70 [openvswitch]
>> [8003509.827609]  [<ffffffffa0290d2c>] ovs_packet_cmd_execute+0x20c/0x230 [openvswitch]
>> [8003509.827842]  [<ffffffff814e8d61>] genl_family_rcv_msg+0x221/0x390
>> [8003509.828067]  [<ffffffff814e8ed0>] ? genl_family_rcv_msg+0x390/0x390
>> [8003509.828288]  [<ffffffff814e8f2b>] genl_rcv_msg+0x5b/0xa0
>> [8003509.828508]  [<ffffffff814e72b9>] netlink_rcv_skb+0x99/0xc0
>> [8003509.828729]  [<ffffffff814e87d7>] genl_rcv+0x27/0x40
>> [8003509.828949]  [<ffffffff814e634f>] netlink_unicast+0x10f/0x190
>> [8003509.829170]  [<ffffffff814e7cf3>] netlink_sendmsg+0x2c3/0x750
>> [8003509.829393]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
>> [8003509.829615]  [<ffffffff814a620b>] sock_sendmsg+0x8b/0xb0
>> [8003509.829837]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
>> [8003509.830063]  [<ffffffff8157583b>] ? _raw_spin_lock_bh+0x1b/0x40
>> [8003509.830289]  [<ffffffff81575700>] ? _raw_spin_unlock_bh+0x10/0x20
>> [8003509.830517]  [<ffffffff814b3091>] ? verify_iovec+0x61/0xe0
>> [8003509.830742]  [<ffffffff814a70d1>] ___sys_sendmsg+0x411/0x430
>> [8003509.830971]  [<ffffffff81193ca9>] ? ep_scan_ready_list+0x189/0x1b0
>> [8003509.831199]  [<ffffffff81193e2f>] ? ep_poll+0x13f/0x370
>> [8003509.831426]  [<ffffffff81089c4b>] ? k_getrusage+0x13b/0x380
>> [8003509.831649]  [<ffffffff814a72f4>] __sys_sendmsg+0x44/0x80
>> [8003509.831876]  [<ffffffff814a7344>] SyS_sendmsg+0x14/0x20
>> [8003509.832102]  [<ffffffff815763a2>] system_call_fastpath+0x16/0x1b
>> [8003509.832327] Code: ce 4c 89 d7 e8 f0 fc ff ff eb d6 66 41 f7 01 00 c0 74 
>> 18 49 8b 01 31 f6 f6 c4 40 74 04 41 8b 71 68 4c 89 cf e8 f0 3e fc ff eb b6 
>> <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 8b 05 79 33 ab 00
>> [8003509.833151] RIP  [<ffffffff8113a9b2>] kfree+0xe2/0xf0
>> [8003509.833380]  RSP <ffff883fcfc75938>
>> [8003509.833972] ---[ end trace af1b9000d8cc8f36 ]---
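Both splats appear to come from the same kind of sanity check. As far as I can tell from mm/slub.c in 3.12, ksize() and kfree() look up the head page behind the pointer: a slab page means an ordinary slab object, a compound page means a large kmalloc, and a page that is neither makes ksize() fire a WARN_ON and kfree() a BUG_ON, which would match the WARNING at mm/slub.c:3318 and the BUG at mm/slub.c:3338 above. The sketch below models only that decision. struct page_model, classify(), ksize_would_warn() and kfree_check() are hypothetical names, and no kernel code is called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two flags SLUB consults on the head page that
 * backs a pointer (PageSlab() and PageCompound() in the kernel). */
struct page_model {
    bool slab;      /* page belongs to a slab cache */
    bool compound;  /* multi-page allocation, as used for large kmallocs */
};

enum obj_kind {
    OBJ_SLAB,   /* ordinary slab object: size comes from its cache */
    OBJ_LARGE,  /* large kmalloc backed by a compound page */
    OBJ_BOGUS   /* neither: not memory that kmalloc() handed out */
};

/* Order of the checks as described above: slab first, then compound. */
static enum obj_kind classify(const struct page_model *head)
{
    if (head->slab)
        return OBJ_SLAB;
    if (head->compound)
        return OBJ_LARGE;
    return OBJ_BOGUS;
}

/* ksize() only warns on a bogus page (WARN_ON) and carries on. */
static bool ksize_would_warn(const struct page_model *head)
{
    return classify(head) == OBJ_BOGUS;
}

/* kfree() treats a bogus page as fatal (BUG_ON); an assert stands in here. */
static void kfree_check(const struct page_model *head)
{
    assert(classify(head) != OBJ_BOGUS && "kernel BUG in kfree()");
}
```

Under this reading, the WARNING in ksize() and the BUG in kfree() are the non-fatal and fatal outcomes of the same test, and both suggest that the pointer did not refer to a live kmalloc allocation at that moment.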
>>
>> It goes without saying that when this occurs the machine loses connectivity
>> on all interfaces attached to an openvswitch bridge, and ultimately the
>> machine crashes. I reviewed the surrounding code, particularly the
>> ovs_packet_cmd_execute -> validate_and_copy_actions -> copy_action ->
>> reserve_sfa_size -> ksize call chain, and it is pretty clear that the sfa
>> pointer is indeed allocated from the heap (it is the acts pointer in
>> ovs_packet_cmd_execute, which is allocated in ovs_flow_actions_alloc), yet
>> the kernel's sanity check does not recognize it as slab-allocated memory.
>> The same goes for the second bug splat: the flow is allocated with
>> ovs_flow_alloc(), and by the time it has to be freed it is corrupted. There
>> are a lot of calculations on the netlink headers and packets, and frankly
>> this makes me a bit uneasy, as it seems error-prone.
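As I read the datapath code, reserve_sfa_size() in the call chain above grows the action buffer on demand: it compares the request against ksize() of the current buffer and, if it does not fit, allocates a larger buffer, copies the existing actions over and kfree()s the old one. The following is a minimal user-space sketch of that grow-and-copy pattern, not the kernel code. struct actions_buf, actions_alloc() and actions_reserve() are hypothetical names; the capacity is tracked in a field instead of being obtained from ksize(), the 32 KiB cap is an assumption, and the buffer keeps doubling until the request fits, which simplifies the kernel logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* 4-byte alignment as in netlink attributes; the cap is an assumption. */
#define SKETCH_ALIGNTO 4
#define SKETCH_ALIGN(len) \
    (((len) + SKETCH_ALIGNTO - 1) & ~(size_t)(SKETCH_ALIGNTO - 1))
#define SKETCH_MAX_BUFSIZE ((size_t)32 * 1024)

/* Hypothetical stand-in for struct sw_flow_actions. The kernel asks ksize()
 * for the usable size of the allocation; this sketch records it instead. */
struct actions_buf {
    size_t capacity;     /* bytes allocated, header included */
    size_t actions_len;  /* bytes of actions stored so far */
    unsigned char actions[];
};

static struct actions_buf *actions_alloc(size_t size)
{
    struct actions_buf *b;

    if (size < sizeof(*b))
        size = sizeof(*b);
    b = malloc(size);
    if (!b)
        return NULL;
    b->capacity = size;
    b->actions_len = 0;
    return b;
}

/* Reserve attr_len bytes at the end of *bufp, doubling the buffer until the
 * request fits. Returns the reserved area, or NULL if the cap would be
 * exceeded or malloc fails; the current buffer is left intact on failure. */
static void *actions_reserve(struct actions_buf **bufp, size_t attr_len)
{
    struct actions_buf *cur = *bufp, *grown;
    size_t req = SKETCH_ALIGN(attr_len);
    size_t next = offsetof(struct actions_buf, actions) + cur->actions_len;
    size_t new_size;

    assert(next <= cur->capacity);  /* invariant: used bytes fit */

    if (req > cur->capacity - next) {
        if (next > SKETCH_MAX_BUFSIZE || req > SKETCH_MAX_BUFSIZE - next)
            return NULL;            /* would exceed the cap */

        new_size = cur->capacity;
        while (new_size - next < req)
            new_size *= 2;          /* grow geometrically */
        if (new_size > SKETCH_MAX_BUFSIZE)
            new_size = SKETCH_MAX_BUFSIZE;

        grown = actions_alloc(new_size);
        if (!grown)
            return NULL;
        memcpy(grown->actions, cur->actions, cur->actions_len);
        grown->actions_len = cur->actions_len;
        free(cur);                  /* the kernel kfree()s the old buffer here */
        *bufp = cur = grown;
    }

    cur->actions_len += req;
    return (unsigned char *)cur + next;
}
```

One property worth keeping in mind when auditing this path: the old buffer is freed as soon as it is replaced, so any code still holding a copy of the old pointer (rather than reading it back through *bufp) would later hand freed memory to ksize() or kfree().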
>>
>> So has any of you experienced similar corruption?
>>
>
> I have not seen such a bug report before. Is this reproducible?
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss