Hi,

We encountered a problem where bursts of packets miss the megaflow cache
and go to the slowpath when they have the ECN CE (Congestion Experienced)
bit set to 1. The purpose of the ECN CE bit is to let a router/switch signal
that it is experiencing congestion, in the hope that the receiving
application reacts by notifying the sender to slow down. However, the CE bit
change in the header actually results in slowpath handling on the receiving
side, which may make the congestion worse for a short time because it slows
down the receiving side's handling of the burst of CE-marked packets.

The ECN CE bit change leads to a megaflow cache miss because any traffic
that has a non-zero TOS field (DSCP plus ECN) in the tunnel header triggers
generation of a megaflow with an exact match on the TOS field of the tunnel
header, for example:

skb_priority(0/0),tunnel(tun_id=0x231,src=10.2.0.152,dst=10.2.64.11,tos=0x8a,...,ipv4(src=10.194.120.8,dst=10.194.168.251,proto=17,tos=0/0,ttl=0/0,frag=no),...,
actions:73

Here, for tos=0x8a, the last 2 bits are ECN: 10 means ECN-Capable (ECT).
So far, all packets of the flow hit the megaflow cache. Now, when the uplink
router detects congestion, it sets the last bit to 1. The packets then no
longer match the existing megaflow, so they go through the slowpath, which
generates a new megaflow:

skb_priority(0/0),tunnel(tun_id=0x231,src=10.2.0.152,dst=10.2.64.11,tos=0x8b,...,ipv4(src=10.194.120.8,dst=10.194.168.251,proto=17,tos=0x3/0x3,ttl=0/0,frag=no),...,
actions:73

Here, for tos=0x8b, the last 2 bits are ECN: 11 means Congestion
Experienced (CE).
In our case the packet rate is very high during the congestion, and
thousands of packets have already gone to the slowpath before the new
megaflow is generated.
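
For reference, the two TOS values above can be decoded with a few lines of
Python. This is only an illustration of the DSCP/ECN split defined in
RFC 3168; the helper name split_tos is made up for this mail and is not
anything from OVS.

```python
# ECN codepoints from RFC 3168: the low two bits of the TOS byte.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def split_tos(tos):
    """Split a TOS byte into (dscp, ecn, ecn_name). Hypothetical helper."""
    dscp = tos >> 2   # upper six bits
    ecn = tos & 0x3   # lower two bits
    return dscp, ecn, ECN_NAMES[ecn]


for tos in (0x8a, 0x8b):
    dscp, ecn, name = split_tos(tos)
    print(f"tos={tos:#04x} dscp={dscp} ecn={ecn:02b} ({name})")
```

Both values carry the same DSCP; only the ECN codepoint differs, and that
alone is enough to miss the exact-match megaflow.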

The root cause of the issue is that OVS handles the ECN bits implicitly
(not configurable/controllable by OpenFlow rules). It seems it unwildcards
those bits in the match just to detect the CE bit change so that it can
replicate the CE bit to the inner header. From tcpdump captures on both the
tunnel interface and the VM interface, we confirmed that OVS indeed sets the
CE bit of the inner IP header according to the CE bit of the tunnel's outer
IP header. We believe this is the desired behavior, because otherwise
congestion control signalling won't work: the router only sees and updates
the outer header, while the application/protocol layer that handles the
congestion control bit is on the overlay logical network and can only see
the inner header.
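
To make the behavior we are describing concrete, here is a rough Python
model of the decapsulation step as we understand it. It is a sketch, not
the OVS code. The function name is hypothetical, and the "drop if the inner
packet is Not-ECT" branch is an assumption taken from the RFC 6040
decapsulation rules rather than something we confirmed with tcpdump.

```python
ECN_MASK = 0x3
ECN_NOT_ECT = 0x0
ECN_CE = 0x3


def decap_inner_tos(outer_tos, inner_tos):
    """Return the inner TOS after decapsulation, or None to drop the packet.

    Hypothetical model of outer-to-inner CE propagation.
    """
    if (outer_tos & ECN_MASK) != ECN_CE:
        # No congestion mark on the outer header: inner header is untouched.
        return inner_tos
    if (inner_tos & ECN_MASK) == ECN_NOT_ECT:
        # Assumption (RFC 6040): the inner flow cannot carry a CE mark,
        # so the packet is dropped instead.
        return None
    # Outer header is CE and inner is ECN-capable: set CE on the inner header.
    return inner_tos | ECN_CE
```

With the values from our capture, an outer TOS of 0x8b and an ECN-capable
inner header yields an inner header with both ECN bits set, while an outer
TOS of 0x8a leaves the inner header unchanged.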

I didn't find any detailed documentation for this behavior, except that in
https://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt it
is briefly mentioned as a side note in the description of the options:tos
for tunnel configuration.

The code that updates the inner header ECN is:
https://github.com/openvswitch/ovs/blob/main/ofproto/tunnel.c#L357

We can also see that the code initializes the mask of the tunnel TOS field
to all 1s (i.e. unwildcards it):
https://github.com/openvswitch/ovs/blob/main/ofproto/tunnel.c#L378

We did see megaflows with the TOS field wildcarded (0/0) for traffic that
has a TOS value of 0, but didn't find the corresponding code that wildcards
it.

We believe the fastpath (e.g. OVS kernel module) implementation must follow
the same implicit logic, because, as we can see, the action part of the
megaflow doesn't contain any action that modifies the CE bit.

So now the question is how to avoid the exact match on those bits in the
megaflow while still satisfying the requirement of ECN handling for
tunneled packets. It would be good to see a more detailed explanation of
the behavior and its original requirement. Since I didn't find any such
details other than the above pieces of documentation and code, I guess the
requirement is just to replicate the CE bit from the outer header to the
inner header, i.e. set the inner CE bit to 1 if the CE bit in the outer
header is 1. If that's the case, could we just always wildcard these bits
and always do the same implicit handling for tunneled packets? That would
solve the megaflow cache miss while still satisfying the ECN handling
requirement. Or is there any other reason we do an exact match on these
bits?
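
To illustrate what we mean by wildcarding these bits, here is a toy model of
a masked match on the tunnel TOS byte only. The names and the 0xfc mask are
hypothetical and only show the idea; this is not how OVS builds megaflows
today, and it says nothing about how the datapath would then have to apply
the CE handling per packet.

```python
EXACT_TOS_MASK = 0xff   # what we observe today: every TOS bit is matched
DSCP_ONLY_MASK = 0xfc   # hypothetical: ECN bits (low two bits) wildcarded


def masked_match(pkt_tos, flow_tos, mask):
    """True if the packet's TOS equals the flow's TOS on the masked bits."""
    return (pkt_tos & mask) == (flow_tos & mask)


flow_tos = 0x8a   # megaflow installed by the ECT packets
ce_pkt_tos = 0x8b  # same flow after the router sets CE

print(masked_match(ce_pkt_tos, flow_tos, EXACT_TOS_MASK))
print(masked_match(ce_pkt_tos, flow_tos, DSCP_ONLY_MASK))
```

With the exact mask the CE-marked packet misses the existing megaflow; with
the ECN bits wildcarded it would hit it, provided the CE replication can be
done without a per-value flow.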

Any information or suggestions are highly appreciated.

Best regards,
Han