+ Damjan, since Dave might not be available

From: Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco)
Sent: Tuesday, 17 January, 2017 11:51
To: Dave Barach (dbarach) <dbar...@cisco.com>; John Lo (loj) <l...@cisco.com>; vpp-dev@lists.fd.io
Cc: Pierre Pfister (ppfister) <ppfis...@cisco.com>
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack
Hi Dave, John,

I've tried building the latest 17.01 vpp (using "make V=0 PLATFORM=vpp TAG=vpp_debug install-rpm" - I understand that's what TAG=vpp_debug refers to) and the issue is no longer present there, but there is something else - now vpp crashes when I delete a vhost-user port. I've looked at patches submitted for master that could solve this and I've found https://gerrit.fd.io/r/#/c/4619/, but that didn't help. I've attached post-mortem api traces and a backtrace. Pierre, could you please look at it?

I also have two other questions:
* What's the difference between a regular image and a TAG=vpp_debug image?
* I've tried configuring core files in a number of different ways, but nothing seems to be working - the core files are just not being created. Is there a guide on how to set this up for CentOS 7? For reference, here's one of the guides that I used: https://www.unixmen.com/how-to-enable-core-dumps-in-rhel6/

And the last thing: Honeycomb should now work with vpp 17.04, so I'm going to try that one as well.

Thanks,
Juraj

From: Dave Barach (dbarach)
Sent: Wednesday, 11 January, 2017 23:43
To: John Lo (loj) <l...@cisco.com>; Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; vpp-dev@lists.fd.io
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

+1... Hey John, thanks a lot for the detailed analysis...
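For the core-file question, the usual CentOS 7 setup looks roughly like the following. This is a sketch only (paths and the dump directory are illustrative, and vpp started by systemd additionally needs LimitCORE=infinity in its unit file):

```shell
# Sketch: enable core dumps on CentOS 7 (run as root; values illustrative).

# 1. Raise the core-file size limit for the current shell session.
ulimit -c unlimited

# 2. Make the limit persistent for login sessions.
echo '*  soft  core  unlimited' >> /etc/security/limits.conf

# 3. Tell the kernel where to write cores; the directory must exist
#    and be writable by the crashing process.
mkdir -p /tmp/dumps
echo '/tmp/dumps/core-%e-%p' > /proc/sys/kernel/core_pattern

# 4. abrt, if installed, intercepts cores via core_pattern; stopping it
#    keeps the plain file-based behavior configured above.
systemctl stop abrtd.service 2>/dev/null || true
```

Note that the limit must be in effect in the environment that launches vpp itself, not just in an interactive shell, which is a common reason cores "just are not being created."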
Dave

From: John Lo (loj)
Sent: Wednesday, January 11, 2017 5:40 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>; Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; vpp-dev@lists.fd.io
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

Hi Juraj,

I looked at the custom-dump of the API trace and noticed this "interesting" sequence:

SCRIPT: vxlan_add_del_tunnel src 192.168.11.22 dst 192.168.11.20 decap-next -1 vni 1
SCRIPT: sw_interface_set_flags sw_if_index 4 admin-up link-up
SCRIPT: sw_interface_set_l2_bridge sw_if_index 4 bd_id 1 shg 1 enable
SCRIPT: sw_interface_set_l2_bridge sw_if_index 2 disable
SCRIPT: bridge_domain_add_del bd_id 1 del

Any idea why BD 1 is deleted while the VXLAN tunnel with sw_if_index 4 is still in the BD? Maybe this is what is causing the crash.

From your vppctl output capture in "compute_that_crashed.txt", I do see BD 1 present with vxlan_tunnel0 on it:

[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain
  ID  Index  Learning  U-Forwrd  UU-Flood  Flooding  ARP-Term  BVI-Intf
  0   0      off       off       off       off       off       local0
  1   1      on        on        on        on        off       N/A
[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain 1 detail
  ID  Index  Learning  U-Forwrd  UU-Flood  Flooding  ARP-Term  BVI-Intf
  1   1      on        on        on        on        off       N/A
  Interface      Index  SHG  BVI  TxFlood  VLAN-Tag-Rewrite
  vxlan_tunnel0  3      1    -    *        none

I did install a vpp 1701 image on my server and performed an api trace replay of your api_post_mortem.
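The custom-dump and replay steps mentioned above can be reproduced from the debug CLI. A sketch, assuming the 17.01-era "api trace" CLI; the trace filename is illustrative (vpp writes the real one, with its pid appended, to /tmp on a crash):

```
# Print a saved post-mortem API trace in the human-readable SCRIPT: form:
vppctl api trace custom-dump /tmp/api_post_mortem.12345

# Replay the trace against a running (ideally debug) vpp instance:
vppctl api trace replay /tmp/api_post_mortem.12345
```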
Thereafter, I do not see BD 1 present while vxlan_tunnel1 is still configured as in BD 1:

DBGvpp# show bridge
  ID  Index  Learning  U-Forwrd  UU-Flood  Flooding  ARP-Term  BVI-Intf
  0   0      off       off       off       off       off       local0
DBGvpp# sho vxlan tunnel
[1] src 192.168.11.22 dst 192.168.11.20 vni 1 sw_if_index 4 encap_fib_index 0 fib_entry_index 12 decap_next l2
DBGvpp# sho int addr
GigabitEthernet2/3/0 (dn):
VirtualEthernet0/0/0 (up):
local0 (dn):
vxlan_tunnel0 (dn):
vxlan_tunnel1 (up):
  l2 bridge bd_id 1 shg 1
DBGvpp# show int
  Name                  Idx  State  Counter  Count
  GigabitEthernet2/3/0  1    down
  VirtualEthernet0/0/0  2    up
  local0                0    down
  vxlan_tunnel0         3    down
  vxlan_tunnel1         4    up
DBGvpp#

With the system in this state, I can easily imagine a packet received by vxlan_tunnel1 and forwarded in a non-existent BD causing a VPP crash. I will look into the VPP code from this angle.

In general, however, there is really no need to create and delete BDs on VPP. Adding an interface/tunnel to a BD will cause the BD to be created. Deleting a BD without removing all the ports in it can cause problems, which may well be the cause here. If a BD is not to be used, all the ports on it should be removed. If a BD is to be reused, just add ports to it.

As mentioned by Dave, please test using a known good image like 1701, preferably built with debug enabled (with TAG=vpp_debug) so it is easier to find any issues.

Regards,
John

From: Dave Barach (dbarach)
Sent: Wednesday, January 11, 2017 9:01 AM
To: Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; vpp-dev@lists.fd.io; John Lo (loj) <l...@cisco.com>
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

Dear Juraj,

I took a look. It appears that the last operation in the post-mortem API trace was to kill a vxlan tunnel. Is there a reasonable chance that other interfaces in the bridge group containing the tunnel were still admin-up?
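The safe teardown order John describes (ports out first, BD left alone) can be sketched with vppctl, assuming the 17.01-era CLI; the interface name, addresses, and vni are taken from the outputs above:

```
# 1. Detach the port from the bridge domain first
#    (returns the interface to L3 mode, so it no longer floods into the BD).
vppctl set interface l3 vxlan_tunnel1

# 2. Only then delete the tunnel itself.
vppctl create vxlan tunnel src 192.168.11.22 dst 192.168.11.20 vni 1 del

# The BD itself need not be deleted at all: adding a port with
# "set interface l2 bridge <intf> 1" later re-creates bd_id 1 implicitly.
```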
Was the tunnel interface removed from the bridge group prior to killing it?

The image involved is not stable/1701/LATEST. It's missing at least 20 fixes considered critical enough to justify merging them into the release throttle:

[root@overcloud-novacompute-1 ~]# vppctl show version verbose
Version: v17.01-rc0~242-gabd98b2~b1576
Compiled by: jenkins
Compile host: centos-7-a8b
Compile date: Mon Dec 12 18:55:56 UTC 2016

Please re-test with stable/1701/LATEST. Please use a TAG=vpp_debug image. If the problem is reproducible, we'll need a core file to make further progress.

Copying John Lo ("Dr. Vxlan") for any further thoughts he might have...

Thanks... Dave

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco)
Sent: Wednesday, January 11, 2017 3:47 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

Hi vpp-dev,

I just wanted to ask whether anyone has taken a look at VPP-556 (https://jira.fd.io/browse/VPP-556)? There might not be enough logs - I collected just a backtrace from gdb - so if we need anything more, please give me a bit of guidance on what could help and how to get it. This is one of the last few issues we're facing with the openstack odl scenario, where we use vpp just for l2, and it's been there for a while.

Thanks,
Juraj
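Once core files are being produced, a full backtrace like the one attached to VPP-556 can be captured non-interactively with gdb. A sketch only; the binary and core paths are illustrative:

```
# Batch-mode gdb: print a full backtrace from a core file and exit.
gdb --batch -ex 'bt full' /usr/bin/vpp /tmp/dumps/core-vpp-1234

# For a live (hung rather than crashed) process, attach by pid instead
# and dump every thread's stack:
gdb --batch -ex 'thread apply all bt' -p "$(pidof vpp)"
```

With a TAG=vpp_debug image the backtrace will carry full symbols and assert messages, which is why the debug build is preferred for this kind of triage.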
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev