On Fri, Oct 21, 2016 at 11:39 AM, Joe Stringer <j...@ovn.org> wrote:
> On 21 October 2016 at 11:37, Joe Stringer <j...@ovn.org> wrote:
>> On 21 October 2016 at 10:55, Pravin B Shelar <pshe...@ovn.org> wrote:
>>> On upstream kernel datapath OVS make use of networking devices
>>> where networking stack does check if device is UP. following
>>> patch adds same check in case of compat tunneling implementation.
>>> This check also fixes kernel crash in case vxlan device was
>>> brought down by user.
>>>
>>> CPU: 4 PID: 12988 Comm: handler903 Tainted:
>>> RIP: 0010:[<ffffffffa05e5407>] vxlan_xmit_one.constprop.50+0x47/0x1210 
>>> [openvswitch]
>>> Call Trace:
>>>  [<ffffffffa05e6625>] rpl_vxlan_xmit+0x55/0x80 [openvswitch]
>>>  [<ffffffffa05d5ad4>] ovs_vport_send+0x44/0xb0 [openvswitch]
>>>  [<ffffffffa05c62a5>] do_output+0x65/0x180 [openvswitch]
>>>  [<ffffffffa05c70dc>] do_execute_actions+0x10c/0x860 [openvswitch]
>>>  [<ffffffffa05c7870>] ovs_execute_actions+0x40/0x130 [openvswitch]
>>>  [<ffffffffa05cbb59>] ovs_packet_cmd_execute+0x2c9/0x2f0 [openvswitch]
>>>  [<ffffffff8155f31d>] genl_family_rcv_msg+0x1cd/0x400
>>>  [<ffffffff8155f5e1>] genl_rcv_msg+0x91/0xd0
>>>  [<ffffffff8155d549>] netlink_rcv_skb+0xa9/0xc0
>>>  [<ffffffff8155da78>] genl_rcv+0x28/0x40
>>>  [<ffffffff8155ceba>] netlink_unicast+0x16a/0x210
>>>  [<ffffffff8155d277>] netlink_sendmsg+0x317/0x430
>>>  [<ffffffff81514fd0>] sock_sendmsg+0xb0/0xf0
>>>  [<ffffffff81515409>] ___sys_sendmsg+0x3a9/0x3c0
>>>  [<ffffffff815162f1>] __sys_sendmsg+0x51/0x90
>>>  [<ffffffff81516342>] SyS_sendmsg+0x12/0x20
>>>  [<ffffffff81649609>] system_call_fastpath+0x16/0x1b
>>>
>>> Reported-by: Huanglili (lee) <huanglili.hu...@huawei.com>
>>> Signed-off-by: Pravin B Shelar <pshe...@ovn.org>
>>
>> I'm not 100% convinced that this solves the issue (or whether this
>> affects upstream as well), but it seems like this at least makes it
>> significantly less likely to crash - If I understand correctly,
>> setting the vxlan port down currently guarantees a crash, whereas this
>> patch would mean it either doesn't crash, or it only crashes based on
>> a race condition. Specifically, one where the reconfiguration gets
>> past the synchronize_net() into ndo_stop() (before clearing IFF_UP)
>> while OVS manages to execute all of a flow up until output and
>> proceeds past this check before the reconfiguration thread clears it.
>>
>> Probably the type of test to check this would be running heavy traffic
>> through a vxlan port, then toggle the port up and down.
>>
>> Will you also check upstream?
>>
>> Anyway, this LGTM so:
>> Acked-by: Joe Stringer <j...@ovn.org>
>
> Hmm, the original reporter said that the issue is not fixed. Perhaps
> we should hold off on this?

Yes, I would like to see another solution before merging this patch.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to