Attached topology

On Mon, 19 May 2025 at 17:19, Q Kay <tqkhang...@gmail.com> wrote:
Dear OVN Team,

I would like to report an issue observed with OVN networking related to asymmetric routing. The problem occurs when using instances to transit traffic between two routed logical switches, and it appears to be caused by OVN connection tracking, which I would like to bypass for stateless forwarding.

Environment Information

- OVN version: 24.03.2 (the same issue is observed on 24.09).
- Port security disabled.

Issue Description

I have two instances, each with a loopback IP configured (5.5.5.5 on Instance A and 6.6.6.6 on Instance B), deployed on different compute nodes (Compute 1 and Compute 2 respectively). The instances are connected to two different networks (10.10.10.0/24 and 10.10.20.0/24).

I have configured static routes on both instances as follows:

- Instance A: route 6.6.6.6/32 via 10.10.10.218
- Instance B: route 5.5.5.5/32 via 10.10.20.41

The topology is in the attached file.

Expected Behavior

I should be able to communicate over ICMP between the two endpoint IPs (5.5.5.5 and 6.6.6.6) along the routing path configured above:

- On Instance A: ping 6.6.6.6 -I 5.5.5.5 (using 5.5.5.5 as the source IP) => should succeed
- On Instance B: ping 5.5.5.5 -I 6.6.6.6 (using 6.6.6.6 as the source IP) => should succeed

Actual Behavior

When pinging between these loopback IPs, traffic only works in one direction:

- On Instance A: ping 6.6.6.6 -I 5.5.5.5 (using 5.5.5.5 as the source IP) => fails
- On Instance B: ping 5.5.5.5 -I 6.6.6.6 (using 6.6.6.6 as the source IP) => succeeds

Despite disabling port security and ensuring the necessary routes are configured, the asymmetric routing scenario still fails in one direction with ICMP, and in both directions with TCP. I have verified that packet handling at the instance level is correct (confirmed with tcpdump on the tap ports). I have also tried moving both instances to a single compute node, but the same issue occurs.
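For reference, the instance-side setup amounts to roughly the following (a sketch using standard iproute2/iputils commands equivalent to the configuration described above; the actual configuration method on the instances may differ):

    # Instance A (on 10.10.10.0/24)
    ip addr add 5.5.5.5/32 dev lo
    ip route add 6.6.6.6/32 via 10.10.10.218
    ping -I 5.5.5.5 6.6.6.6     # fails

    # Instance B (on 10.10.20.0/24)
    ip addr add 6.6.6.6/32 dev lo
    ip route add 5.5.5.5/32 via 10.10.20.41
    ping -I 6.6.6.6 5.5.5.5     # succeeds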
Troubleshooting Steps

1. Reversed the routing direction:

   - On Instance A: route 6.6.6.6/32 via 10.10.10.78
   - On Instance B: route 5.5.5.5/32 via 10.10.20.102

   => Result: ping from A to B succeeds, ping from B to A fails (the opposite of the initial results).

2. Ran ovn-trace:

   ovn-trace --no-leader-only 70974da0-2e9d-469a-9782-455a0380ab95 'inport == "319cd637-10fb-4b45-9708-d02beefd698a" && eth.src==fa:16:3e:ea:67:18 && eth.dst==fa:16:3e:04:28:c7 && ip4.src==6.6.6.6 && ip4.dst==5.5.5.5 && ip.proto==1 && ip.ttl==64'

   Output:

   ingress(dp="A", inport="319cd6")
    0. ls_in_check_port_sec: priority 50
       reg0[15] = check_in_port_sec();
       next;
    2. ls_in_lookup_fdb: inport == "319cd6", priority 100
       reg0[11] = lookup_fdb(inport, eth.src);
       next;
   27. ls_in_l2_lkup: eth.dst == fa:16:3e:04:28:c7, priority 50
       outport = "869b33";
       output;

   egress(dp="A", inport="319cd6", outport="869b33")
    9. ls_out_check_port_sec: priority 0
       reg0[15] = check_out_port_sec();
       next;
   10. ls_out_apply_port_sec: priority 0
       output;
       /* output to "869b33" */

3. Examined the datapath flows (recirculation) to identify where the flow is being dropped.

   For the successful ping flow 5.5.5.5 -> 6.6.6.6:

   On Compute 1 (hosting the source instance):

   recirc_id(0x3d71),in_port(28),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(src=4.0.0.0/252.0.0.0,dst=0.0.0.0/248.0.0.0,proto=1,tos=0/0x3,frag=no), packets:55, bytes:5390, used:0.205s, actions:ct(commit,zone=87,mark=0/0x1,nat(src)),set(tunnel(tun_id=0x6,dst=10.10.10.85,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x50006}),flags(df|csum|key))),9

   recirc_id(0),in_port(28),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(proto=1,frag=no), packets:55, bytes:5390, used:0.205s, actions:ct(zone=87),recirc(0x3d71)

   recirc_id(0),tunnel(tun_id=0x2,src=10.10.10.85,dst=10.10.10.84,geneve({class=0x102,type=0x80,len=4,0xb000a/0x7fffffff}),flags(-df+csum+key)),in_port(9),eth(src=fa:16:3e:ea:67:18,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=0/0xfe), packets:55, bytes:5390, used:0.204s, actions:29

   On Compute 2:

   recirc_id(0),tunnel(tun_id=0x6,src=10.10.10.84,dst=10.10.10.85,geneve({class=0x102,type=0x80,len=4,0x50006/0x7fffffff}),flags(-df+csum+key)),in_port(10),eth(src=fa:16:3e:81:ed:92,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0xf8), packets:193, bytes:18914, used:0.009s, actions:ct(zone=53),recirc(0x1791e)

   recirc_id(0x1791e),tunnel(tun_id=0x6,src=10.10.10.84,dst=10.10.10.85,geneve({}{}),flags(-df+csum+key)),in_port(10),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:81:ed:92,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:193, bytes:18914, used:0.009s, actions:ct(commit,zone=53,mark=0/0x1,nat(src)),23

   recirc_id(0),in_port(21),eth(src=fa:16:3e:ea:67:18,dst=fa:16:3e:04:28:c7),eth_type(0x0800),ipv4(src=6.6.6.6,dst=5.5.5.5,proto=1,tos=0/0x3,frag=no), packets:193, bytes:18914, used:0.008s, actions:set(tunnel(tun_id=0x2,dst=10.10.10.84,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xb000a}),flags(df|csum|key))),10

   For the failed ping flow 6.6.6.6 -> 5.5.5.5:

   On Compute 2 (hosting the source instance):

   recirc_id(0),in_port(21),eth(src=fa:16:3e:ea:67:18,dst=fa:16:3e:04:28:c7),eth_type(0x0800),ipv4(src=6.6.6.6,dst=5.5.5.5,proto=1,tos=0/0x3,frag=no), packets:5, bytes:490, used:0.728s, actions:set(tunnel(tun_id=0x2,dst=10.10.10.84,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xb000a}),flags(df|csum|key))),10

   On Compute 1:

   recirc_id(0),tunnel(tun_id=0x2,src=10.10.10.85,dst=10.10.10.84,geneve({class=0x102,type=0x80,len=4,0xb000a/0x7fffffff}),flags(-df+csum+key)),in_port(9),eth(src=fa:16:3e:ea:67:18,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0xf8), packets:48, bytes:4704, used:0.940s, actions:29

   recirc_id(0),in_port(28),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(proto=1,frag=no), packets:48, bytes:4704, used:0.940s, actions:ct(zone=87),recirc(0x3d77)

   recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no), packets:48, bytes:4704, used:0.940s, actions:drop

Observations

I have noticed that packet handling on the two compute nodes is not consistent.
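For reproducibility, the datapath flow dumps above can be collected on each compute node with the standard OVS tooling, and the conntrack entries in the zones involved can be listed as well (the grep pattern below is only illustrative):

    # On each compute node: dump the installed datapath flows for ICMP traffic
    ovs-appctl dpctl/dump-flows | grep 'proto=1'

    # List conntrack entries in the zone seen in the dumps
    # (zone 87 on Compute 1, zone 53 on Compute 2)
    ovs-appctl dpctl/dump-conntrack zone=87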
My hypothesis is that the handling of the ct_state flags is causing the return traffic to be dropped. This may be because the outgoing and return connections do not share the same logical switch datapath.

The critical evidence is in the failed flow, where we see:

recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no), packets:48, bytes:4704, used:0.940s, actions:drop

The packet is marked invalid (+inv) and subsequently dropped.

Impact

This unexplained packet drop significantly impacts my service when I use instances for transit purposes in an OVN environment. Although I have disabled port security in order to get stateless behavior, the result is not as expected.

Request for Clarification

Based on the situation described above, I have the following questions:

1. Is the packet drop behavior described above consistent with OVN's design?
2. If this is the expected behavior of OVN, please explain why the packets are being dropped.
3. If this is not the expected behavior, could you confirm whether this is a bug that will be fixed in the future?

I can provide additional information as needed. Please let me know if you require any further details.

Thank you very much for your time and support. I greatly appreciate your guidance in better understanding OVN's design and behavior here.

Best regards,
Ice Bear
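P.S. If it helps with the analysis, I can also share the ACL and load-balancer configuration of the two logical switches. As far as I understand, stateful ACLs and load balancers are what make OVN send logical switch traffic through conntrack, so listing them might explain the ct() actions seen in the dumps above. For example (the switch names below are placeholders):

    # Replace <switch-A> / <switch-B> with the actual logical switch names
    ovn-nbctl acl-list <switch-A>
    ovn-nbctl acl-list <switch-B>
    ovn-nbctl ls-lb-list <switch-A>
    ovn-nbctl ls-lb-list <switch-B>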