On Fri, May 2, 2025 at 3:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/1/25 12:02 AM, Han Zhou wrote:
> > Hi,
> >
> > We encountered a problem where bursts of packets miss the megaflow
> > cache and go to the slowpath when they have the ECN CE (Congestion
> > Experienced) bit set to 1. The purpose of the ECN CE bit is for a
> > router/switch to signal congestion so that the receiving application
> > can react by notifying the sender to slow down. However, the CE bit
> > change in the header actually results in slowpath handling on the
> > receiving side, which may make the congestion worse for a short time
> > because it slows down the receiving side's handling of the burst of
> > CE-marked packets.
> >
> > The reason the ECN CE bit change leads to a megaflow cache miss is that
> > any traffic with a non-zero TOS (including DSCP and ECN) field in the
> > tunnel header triggers generation of a megaflow with an exact match on
> > the TOS field in the tunnel header, for example:
> >
> > skb_priority(0/0),tunnel(tun_id=0x231,src=10.2.0.152,dst=10.2.64.11,tos=0x8a,...,ipv4(src=10.194.120.8,dst=10.194.168.251,proto=17,tos=0/0,ttl=0/0,frag=no),..., actions:73
> >
> > Here for the tos=0x8a, the last 2 bits are ECN: 10 - means ECN-Capable
> > Transport, ECT(0).
> > So far all packets of the flow hit the megaflow cache. Now when
> > congestion is detected by the uplink router, the router sets the last
> > bit to 1, and the packets no longer match the existing megaflow, so they
> > go through the slowpath, which generates a new megaflow:
> >
> > skb_priority(0/0),tunnel(tun_id=0x231,src=10.2.0.152,dst=10.2.64.11,tos=0x8b,...,ipv4(src=10.194.120.8,dst=10.194.168.251,proto=17,tos=0x3/0x3,ttl=0/0,frag=no),..., actions:73
> >
> > Here for the tos=0x8b, the last 2 bits are ECN: 11 - means Congestion
> > Experienced (CE).
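
For reference, the two TOS values quoted above split into DSCP and ECN as in the small Python sketch below. It is only an illustration: the helper name split_tos is made up here, and the codepoint names are the ones from RFC 3168.

```python
# ECN codepoints (the low two bits of the TOS byte), as named in RFC 3168.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def split_tos(tos):
    """Return (dscp, ecn_name) for an 8-bit TOS / traffic-class value."""
    dscp = tos >> 2   # upper six bits
    ecn = tos & 0x3   # lower two bits
    return dscp, ECN_NAMES[ecn]


for tos in (0x8a, 0x8b):
    dscp, ecn = split_tos(tos)
    print(f"tos={tos:#04x} dscp={dscp} ecn={ecn}")
```

Both values carry the same DSCP (34); only the ECN codepoint differs, which is why a single bit flip is enough to miss an exact-match entry.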
> > In our case the packet rate is very high during the congestion, and
> > before the new megaflow is generated thousands of packets have already
> > gone to the slowpath.
> >
> > The root cause of the issue is that OVS handles the ECN bits implicitly
> > (not configurable/controllable by OpenFlow rules). It seems it
> > unwildcards those bits in the match just to detect the CE bit change so
> > that it can replicate the CE bit to the inner header. From tcpdump on
> > both the tunnel interface and the VM interface, we confirmed that OVS
> > indeed sets the CE bit of the inner IP header according to the CE bit of
> > the tunnel outer IP header. We believe this is desired behavior because
> > otherwise the congestion control signalling won't work: the router only
> > sees and updates the outer header, while the application/protocol layer
> > that handles the congestion control bit is on the overlay logical
> > network, which can only see the inner header.
> >
> > I didn't find any detailed documentation for this behavior, except that
> > in https://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt
> > it is briefly mentioned as a side note in the description of
> > options:tos for tunnel configuration.
> >
> > The code that updates the inner header ECN is:
> > https://github.com/openvswitch/ovs/blob/main/ofproto/tunnel.c#L357
> >
> > We can also see that the code initializes the tunnel TOS field to all
> > 1s (unwildcarded):
> > https://github.com/openvswitch/ovs/blob/main/ofproto/tunnel.c#L378
> >
> > We did see megaflows with the TOS field wildcarded (0/0) for traffic
> > that has TOS value 0, but didn't find the corresponding code that
> > wildcards it.
> >
> > We believe the fastpath (e.g. OVS kernel module) implementation must
> > follow the same implicit logic, because as we can see the action part of
> > the megaflow doesn't have any actions that modify the CE bit.
> >
> > So now the question is how to avoid the exact match on those bits in
> > the megaflow while still satisfying the requirement of ECN handling for
> > tunneled packets. It would be good to see a more detailed explanation of
> > the behavior and its original requirement. While I didn't find any such
> > details other than the above pieces of documentation and code, I guess
> > the requirement is just to replicate the CE bit from the outer header to
> > the inner header, i.e. set the inner CE bit to 1 if the CE bit in the
> > outer header is 1. If that's the case, then we wonder if we could just
> > always wildcard these bits and always do the same implicit handling for
> > tunneled packets, which would solve the megaflow cache miss while still
> > satisfying the ECN handling requirement. Or is there any other reason we
> > do exact match for these bits?
>
> Hi, Han.
>
> That's an interesting issue.  The documentation you're looking for is
> RFC 6040, which describes how the ECN bits should be passed around during
> the encap/decap process.
> Specifically:
>   https://www.rfc-editor.org/rfc/rfc6040#page-10
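
As a rough illustration of the decapsulation rule in RFC 6040 (the default tunnel egress behaviour, as I read it), the Python sketch below combines the arriving inner and outer ECN fields into the outgoing ECN field. The function name decap_ecn and the use of None to mean "drop the packet" are assumptions of this sketch; it is not OVS or kernel code, and it leaves out the logging/alarm recommendations the RFC attaches to some combinations.

```python
# ECN codepoints (two-bit field), per RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def decap_ecn(inner, outer):
    """Return the outgoing ECN codepoint after decapsulation.

    Returns None when the packet should be dropped (outer CE arriving
    on a packet whose inner header is Not-ECT).
    """
    if inner == NOT_ECT:
        # The inner transport is not ECN-capable: never mark it.
        return None if outer == CE else NOT_ECT
    if inner == CE or outer == CE:
        # Congestion seen on either header is propagated.
        return CE
    if inner == ECT1 or outer == ECT1:
        return ECT1
    # inner is ECT(0) and outer is Not-ECT or ECT(0).
    return ECT0


# The case from this thread: inner ECT(0), outer rewritten to CE.
print(decap_ecn(ECT0, CE) == CE)
```

The point relevant here is that the result depends only on the two ECN fields, so a datapath that applies this rule itself does not need a separate flow entry per outer ECN value.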
>
> Tunnel implementations in the Linux kernel seem to support the same logic
> from the RFC, so it might be possible to avoid exact matches on the ECN
> bits.  However, there is no real way for the ofproto layer to detect
> whether this is implemented in the datapath.  The userspace datapath
> doesn't implement this, the Windows datapath doesn't, and we use raw encap
> for rte_flow, so that will also likely not support this.  For TC offload
> I'm not sure; I saw some references to ECN in the mlx driver, but I'm not
> confident these are relevant.  Besides, implementing support for this in
> the userspace datapath may be problematic from the performance point of
> view, as we'll need to add extra parsing logic to the datapath and modify
> headers on a per-packet basis.  Though it's hard to tell what the
> performance impact will be.
>
> So, if we find a good way of detecting datapath support, then it might be
> a good improvement to avoid flow-based ECN handling when the datapath
> supports it.  But it may not be worth implementing this handling in all
> the datapaths.

Hi Ilya, thanks for the information. For the datapaths that don't have this
ECN handling logic, how would the exact match help? As we can see in the
example I gave, for the packets that have the CE bit changed to 1, the
newly generated megaflow has the same datapath action, which doesn't
instruct any ECN processing. I thought the exact match was just useless,
only bringing the performance penalty. But now that you mentioned some
datapaths didn't implement this handling, does that mean the ECN handling
is just broken in those cases? Did I miss anything?
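
To make the cache-miss part of the question concrete, here is a toy Python model of a masked match on the tunnel TOS byte. It is not OVS code: megaflow_hit and the two mask constants are names assumed for illustration only.

```python
EXACT_TOS_MASK = 0xff      # every TOS bit is matched (unwildcarded)
ECN_WILDCARD_MASK = 0xfc   # DSCP bits matched, the two ECN bits wildcarded


def megaflow_hit(pkt_tos, flow_tos, mask):
    """True if the packet's TOS matches the flow's TOS under the mask."""
    return (pkt_tos & mask) == (flow_tos & mask)


installed = 0x8a   # entry created while packets carried ECT(0)
arriving = 0x8b    # same flow after a router marked CE

print(megaflow_hit(arriving, installed, EXACT_TOS_MASK))     # miss
print(megaflow_hit(arriving, installed, ECN_WILDCARD_MASK))  # hit
```

With the exact-match mask the CE-marked packet misses the existing entry and goes to the slowpath; with the ECN bits wildcarded it would hit. That would only be safe where the datapath itself copies CE to the inner header, which is the open question above.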

Thanks,
Han

>
> Best regards, Ilya Maximets.
>
> >
> > Any information or suggestions are highly appreciated.
> >
> > Best regards,
> > Han