On Thu, Mar 21, 2024 at 07:45:21PM +0100, Ilya Maximets via discuss wrote: > On 3/21/24 08:48, Qiang Qiang45 Zhang via discuss wrote: > > Problem: > > > > When the Geneve tunnel is the Bond port, TCP traffic between virtual > > machines > > is not connected, and PING is allowed; When the Geneve tunnel is a single > > physical port, both TCP and PING are normal. > > > > Version: > > > > ovs-vswitchd (Open vSwitch) 3.3.0 > > DPDK 23.11.0 > > > > Main configuration: > > > > ovs-vsctl set op . other_config:userspace-tso-enable=true > > ovs-vsctl add-br int-br -- set bridge int-br datapath_type=netdev > > ovs-vsctl add-port int-br geneve0 -- set interface geneve0 type=geneve -- > > set interface geneve0 options:csum=false -- set interface geneve0 > > options:key=flow -- set interface geneve0 options:remote_ip="192.168.54.183" > > ovs-vsctl add-port int-br vm0 -- set interface vm0 type=dpdkvhostuserclient > > ovs-ofctl show int-br > > ovs-ofctl del-flows int-br > > ovs-ofctl add-flow int-br 'table=0,in_port=2,actions=output:1' > > ovs-ofctl add-flow int-br 'table=0,in_port=1,actions=output:2' > > > > > > > > Topology: > > > > > Key points for investigation: Comparison between ordinary physical and BOND > > tunnel entrances > > > > > The tunnel entrance is a single physical entrance: > > > > The tunnel entrance is a bond: > > > > The key issue is that there is an additional redirect (0x1) for the bond, > > which > > will cause the corresponding flag to be filled with outer Geneve > > information when > > setting mbuf flags. The actual GDB stack for virtual machine debugging is > > as follows > > (the inner layer is SSH TCP traffic, but it has set the csum_udp > > identifier. This > > analysis code reads the outer layer header, which is the UDP header of > > Geneve) > > Thanks for the report and analysis! > > I agree, this is indeed a bug in recirculation + tunnel offload logic. > > To recap: > > 1. TCP packet enters OVS. > 2. TCP packet gets encapsulated into UDP tunnel. > 3. Recirculation happens. > 4. Packet is re-parsed after recirculation with miniflow_extract() > or similar function. > 5. Packet is marked for UDP checksumming because we parse the outermost set > of headers. But since it is tunneled, it means inner UDP checksumming. > And that makes no sense, because the inner packet is TCP.
Hi everyone, we are actually seeing a similar issue, allthough in our case we are crashing ovs with the following assertion when sending pings: util(pmd-c07/id:9)|EMER|lib/dp-packet.h:1425: assertion ip failed in dp_packet_ip_set_header_csum() or a segmentation fault when sending tcp traffic (because packet_tcp_complete_csum has not asserts) > > I'm assuming that this re-parsing is also messing up all the offsets in the > dp-packet. We likely need fix all the checksums before recirculation and > clear all the flags. For TSO however that would mean that we will need to > drop the packet as we don't have GSO yet and we can't actually segment > traffic in the middle of pipeline anyway... So miniflow_extract resets inner_l3_ofs and inner_l4_ofs quite at the start of the function and has no logic to add these values back (at least as far as i saw that). > > Mike, any thoughts on this? > > > A workaround for this particular situation hwoever would be enabling support > for lb-output action on the bond: > > ovs-vsctl set Port bond0 other_config:lb-output-action=true This helps at least for our issue. > > This should avoid the recirculation in this particular scenario. > But we still need to find a solution for the recirculation issue. I currently have no real idea how we could recover inner_l3_ofs for geneve/vxlan packets. Since for UDP traffic we do not know the upper protocol it is basically impossible to find out for sure if a given communication is geneve/vxlan. Therefor recovering inner_l3_ofs for a given packet sounds to me either like guessing or not really possible. An alternative i have though about is not resetting the offsets in miniflow_extract for packets that are recirculating. In this case we should have started with the original packet at some point and only modified it by actions we defined. If all of these actions modify the offsets as appropriate is there any reason at all to recalculate them? The really ugly alternative that is probably also rather guesswork would be: <<< start of miniflow_extract tun_l3_ofs = inner_l3_ofs - l4_ofs; tun_l4_ofs = inner_l4_ofs - inner_l3_ofs; <<< main miniflow_extract code packet->inner_l3_ofs = packet->l4_ofs + tun_l3_ofs; packet->inner_l4_ofs = packet->inner_l3_ofs + tun_l4_ofs; <<< end of miniflow_extract But i do not think that is a good idea. Thanks Felix _______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss