On Mon, May 5, 2025 at 8:07 AM Han Zhou <hz...@ovn.org> wrote: > > > > On Mon, May 5, 2025 at 2:23 AM Ilya Maximets <i.maxim...@ovn.org> wrote: > > > > On 5/5/25 6:35 AM, Han Zhou wrote: > > > > > > > > > On Fri, May 2, 2025 at 3:10 AM Ilya Maximets <i.maxim...@ovn.org <mailto:i.maxim...@ovn.org>> wrote: > > >> > > >> On 5/1/25 12:02 AM, Han Zhou wrote: > > > > <snip> > > > > >> > > >> Hi, Han. > > >> > > >> That's an interesting issue. The documentation you're looking for is RFC 6040 > > >> that describes how the ECN bits should be passed around during encap/decap process. > > >> Specifically: > > >> https://www.rfc-editor.org/rfc/rfc6040#page-10 > > >> > > >> Tunnel implementations in the linux kernel seem to support the same logic from the > > >> RFC, so it might be possible to avoid exact matches on the ECN bits. However, there > > >> is no real way for ofproto layer to detect if this is implemented in the datapath or > > >> not. Userspace datapath doesn't implement this, Windows datapath doesn't, we use > > >> raw encap for rte_flow, so that will also likely not support this. TC offload I'm > > >> not sure about, I saw some references to ECN in the mlx driver, but I'm not confident > > >> these are relevant. Besides, implementing support for this in userspace datapath may > > >> be problematic from the performance point of view as we'll need to add extra parsing > > >> logic to the datapath and modify headers on per-packet basis. Though it's hard to > > >> tell what the performance impact will be. > > >> > > >> So, if we find a good way of detecting the datapath support, then it might be a > > >> good improvement to avoid flow-based ECN handling when datapath supports it. > > >> But it may not be worth implementing this handling in all the datapaths. > > > > > > Hi Ilya, thanks for the information. For the datapaths that don't have this ECN > > > handling logic, how would the exact match help? As we can see in the example I gave, > > > for the packets that have the CE bit changed to 1, the newly generated megaflow has > > > the same datapath action, which doesn't instruct any ECN processing. I thought the > > > exact match was just useless, only bringing the performance penalty. But now that > > > you mentioned some datapaths didn't implement this handling, does that mean the ECN > > > handling is just broken in those cases? Did I miss anything? > > > > Your inner header is Not-ECT (nw_ecn=0x0) and the outer is ECT(1) > > (nw_ecn=0x1), that corresponds to Not-ECT intersection in the table > > in the RFC after decap. > > > > In my case, both the outer and inner header have the ECT bit set, and then at some point when congestion happens the CE bit is set in the outer header when the packet is received at the receiving side, and OVS set the CE bit to the inner header implicitly by its ECN handling logic. If you look at the megaflow example I gave: > > - Before the congestion has the match for the inner header: tos=0/0, it is wildcarded. > - When congestion happens the match for the inner header became: tos=0x3/0x3 > > So it is not the invalid case you are referring to. >
I see what's happening. In my case, I am testing with the kernel datapath. Since the kernel datapath has the implicit logic that sets the inner header CE bit to 1 (i.e. set tos=0x3/0x3)) when the outer header has CE bit == 1, the packet is firstly modified by the kernel, and then goes to the megaflow cache, and it missed the cache and upcall to slowpath, and so the slowpath sees both the outer header and inner header have the CE bit set already, so it doesn't apply any change to the header, thus the newly generated megaflow doesn't include any datapath actions to change the tos. If I do a ofproto/trace with the outer header being 0x3/0x3 and inner header 0x2/0x3, pretending if the upcall received didn't change the inner header yet, then it ends up with actions: set(ipv4(tos=0x3/0x3)), ... So it behaves exactly as you mentioned, and it should work for the datapaths that don't implement the ECN handling logic. For your point: > > >> So, if we find a good way of detecting the datapath support, then it might be a > > >> good improvement to avoid flow-based ECN handling when datapath supports it. > > >> But it may not be worth implementing this handling in all the datapaths. I agree it is the right way to go. We will see if we can add this support, or we will need your help. Thanks again! Han > Best, > Han > > > """ > > * If the inner ECN field is Not-ECT and the outer ECN field is > > CE, the decapsulator MUST drop the packet. > > > > * If the inner ECN field is Not-ECT and the outer ECN field is > > Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the > > outgoing packet with the ECN field cleared to Not-ECT. > > """ > > > > So, for the inner packet with nw_tos=0x0 there are only two options - > > not to update or drop. That's why you don't see any ECN-related > > datapath action in your case. But if you set nw_tos to 0x3 on the > > outer header, the packet must be dropped. And that's the reason for > > having the field in the match - different actions can be taken > > depending on the actual values. We have a corresponding test in the > > tunnel.at - "tunnel - ECN decapsulation". > > > > However, as you can see in the table in the RFC 6040, this intersection > > is marked with '(!!!)', which means the combination should not be > > possible in a normally functioning network: > > > > """ > > o Certain combinations of inner and outer ECN fields cannot result > > from any transition in any current or previous ECN tunneling > > specification. These currently unused (CU) combinations are > > indicated in Figure 4 by '(!!!)' or '(!)', where '(!!!)' means the > > combination is CU and always potentially dangerous, while '(!)' > > means it is CU and possibly dangerous. In these cases, > > particularly the more dangerous ones, the decapsulator SHOULD log > > the event and MAY also raise an alarm. > > """ > > > > So, the fact that the inner packet is marked as Not-ECN-Capable-Transport > > (Not-Ect, 0x0) and yet something in the network marks the outer header as > > Capable (ECT(1), 0x1) likely means that there is a malfunctioning box > > somewhere between the tunnel endpoints. This should not happen. Routers > > should only set full CE (0x3) indicators and only when the packet is > > marked as ECN-capable (0x10 or 0x01). > > > > Note: ECT(1) doesn't mean there is a congestion, it means that the > > transport is capable of reacting to congestion, and only the sender can > > decide to enable that. > > > > Note2: The "ECT" and "CE" bit names are obsolete names from RFC 2481. > > Starting with RFC 3168, bits are not recognized separately, instead we > > have 4 distinct states: Not-ECT, ECT(0), ECT(1), CE. ECT(1) combination > > had no meaning in obsolete RFCs, because it would mean congestion on a > > non-capable transport. In RFC 3168 onwards it just means a capable > > transport and practically similar to ECT(0). > > But switching from 00 to 01 by some network appliance in the middle > > didn't make sense in any of these RFCs, AFAICT. > > > > Best regards, Ilya Maximets. >
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss