Hi,

We have encountered a problem where bursts of packets miss the megaflow cache and go to the slowpath when they have the ECN CE (Congestion Experienced) bit set to 1. The purpose of the ECN CE bit is to signal that a router/switch experienced congestion, so that the receiving application can react by notifying the sender to slow down. However, the CE bit change in the header actually results in slowpath handling on the receiving side. This may make the congestion worse for a short time, because it slows down the receiving side's handling of the burst of CE-marked packets.
The ECN CE bit change causes a megaflow cache miss because any traffic with a non-zero TOS field (including DSCP and ECN) in the tunnel header triggers generation of a megaflow with an exact match on the TOS field of the tunnel header, for example:

    skb_priority(0/0),tunnel(tun_id=0x231,src=10.2.0.152,dst=10.2.64.11,tos=0x8a,...,ipv4(src=10.194.120.8,dst=10.194.168.251,proto=17,tos=0/0,ttl=0/0,frag=no),..., actions:73

Here, for tos=0x8a, the last 2 bits (the ECN field) are 10, which means ECN-Capable Transport, ECT(0). So far, all packets of the flow hit the megaflow cache.

When the uplink router detects congestion, it sets the last bit to 1. The packets then no longer match the existing megaflow, go through the slowpath, and generate a new megaflow:

    skb_priority(0/0),tunnel(tun_id=0x231,src=10.2.0.152,dst=10.2.64.11,tos=0x8b,...,ipv4(src=10.194.120.8,dst=10.194.168.251,proto=17,tos=0x3/0x3,ttl=0/0,frag=no),..., actions:73

Here, for tos=0x8b, the ECN bits are 11, which means Congestion Experienced (CE).

In our case the packet rate is very high during the congestion, and thousands of packets have already gone to the slowpath before the new megaflow is generated.

The root cause is that OVS handles the ECN bits implicitly (they are not configurable or controllable by OpenFlow rules). It appears to un-wildcard those bits in the match just to detect the CE bit change, so that it can replicate the CE bit to the inner header. From tcpdump captures on both the tunnel interface and the VM interface, we confirmed that OVS does set the CE bit of the inner IP header according to the CE bit of the outer (tunnel) IP header. We believe this is the desired behavior. Otherwise congestion signalling would not work, because the router only sees and updates the outer header, while the application/protocol layer that handles congestion control lives on the overlay logical network and can only see the inner header.
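As a worked example of the bit arithmetic above, the following Python sketch decodes the two tunnel TOS values and performs a masked compare of the kind a megaflow does on a single field. The 0xfc mask, which wildcards only the two ECN bits, is a hypothetical alternative used for illustration, not something OVS is claimed to generate; `decode_tos` and `megaflow_matches` are illustrative helpers, not OVS functions.

```python
# Hypothetical sketch: decode the tunnel TOS values from the two megaflows
# above and show why an exact-match (mask 0xff) megaflow misses once the
# ECN field changes from ECT(0) to CE, while a mask that ignores the two
# ECN bits (0xfc, an assumption for illustration) would still match.

ECN_MASK = 0x03  # low 2 bits of the TOS byte carry ECN (RFC 3168)

ECN_NAMES = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def decode_tos(tos: int) -> tuple[int, str]:
    """Split a TOS byte into its DSCP value and ECN codepoint name."""
    dscp = tos >> 2
    ecn = ECN_NAMES[tos & ECN_MASK]
    return dscp, ecn


def megaflow_matches(key_tos: int, mask: int, pkt_tos: int) -> bool:
    """Masked compare of one field, as a megaflow lookup does."""
    return (pkt_tos & mask) == (key_tos & mask)


if __name__ == "__main__":
    for tos in (0x8A, 0x8B):
        dscp, ecn = decode_tos(tos)
        print(f"tos={tos:#04x}: DSCP={dscp} ECN={ecn}")

    # Megaflow installed by the first packets: tunnel tos=0x8a, exact match.
    print(megaflow_matches(0x8A, 0xFF, 0x8B))  # CE-marked packet misses -> False
    # Hypothetical megaflow that wildcards only the ECN bits.
    print(megaflow_matches(0x8A, 0xFC, 0x8B))  # CE-marked packet hits -> True
```

Both values decode to DSCP 34; only the ECN codepoint differs, which is why the exact-match key is the only thing separating the two megaflows.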
I didn't find any detailed documentation for this behavior, except a brief side note in the description of options:tos for tunnel configuration in https://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt.

The code that updates the inner header ECN is: https://github.com/openvswitch/ovs/blob/main/ofproto/tunnel.c#L357

We can also see that the code initializes the tunnel TOS field mask to all 1s (un-wildcarded): https://github.com/openvswitch/ovs/blob/main/ofproto/tunnel.c#L378

We did see megaflows with the TOS field wildcarded (0/0) for traffic whose TOS value is 0, but we didn't find the corresponding code that wildcards it. We believe the fastpath implementation (e.g. the OVS kernel module) must follow the same implicit logic, because the action part of the megaflow doesn't contain any action that modifies the CE bit.

So the question is: how can we avoid the exact match on those bits in the megaflow while still satisfying the ECN handling requirement for tunneled packets?

It would be good to see a more detailed explanation of this behavior and its original requirement. Since I didn't find any details beyond the documentation and code above, my guess is that the requirement is simply to replicate the CE bit from the outer header to the inner header, i.e. set the inner CE bit to 1 if the outer CE bit is 1. If that's the case, could we always wildcard these bits and always apply the same implicit handling to tunneled packets? That would avoid the megaflow cache miss while still satisfying the ECN handling requirement. Or is there another reason for the exact match on these bits?

Any information or suggestions are highly appreciated.

Best regards,
Han
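The proposal above can be sketched as follows, under loudly stated assumptions: the megaflow matches the tunnel TOS with a hypothetical 0xfc mask, and the CE copy is applied per packet on decapsulation instead of being encoded in the match key. The helper names are hypothetical, and dropping a CE-marked packet whose inner header is Not-ECT follows RFC 6040; this is an assumption, not a description of the OVS code linked above.

```python
# Hypothetical model of the proposal: the megaflow ignores the outer ECN
# bits (mask 0xfc), and CE propagation is done per packet on decap instead
# of being encoded in the match. Names here are illustrative, not OVS APIs.
from typing import Optional

ECN_MASK = 0x03
ECN_NOT_ECT = 0x00
ECN_CE = 0x03

TUN_TOS_MASK = 0xFC  # assumption: wildcard the two ECN bits, keep DSCP exact


def decap_inner_tos(outer_tos: int, inner_tos: int) -> Optional[int]:
    """Return the inner TOS after decapsulation, or None to drop.

    Only CE is propagated, matching the guessed requirement. A CE-marked
    outer header over a Not-ECT inner packet is dropped, following
    RFC 6040 (assumed here, not taken from the OVS source).
    """
    if (outer_tos & ECN_MASK) != ECN_CE:
        return inner_tos  # no congestion signal to copy
    if (inner_tos & ECN_MASK) == ECN_NOT_ECT:
        return None  # inner transport cannot carry the CE mark
    return inner_tos | ECN_CE


def lookup(megaflow_tun_tos: int, pkt_tun_tos: int) -> bool:
    """Masked tunnel TOS compare for the hypothetical wildcarded megaflow."""
    return (pkt_tun_tos & TUN_TOS_MASK) == (megaflow_tun_tos & TUN_TOS_MASK)


if __name__ == "__main__":
    installed = 0x8A  # megaflow created by the first ECT(0) packets
    for outer in (0x8A, 0x8B):
        hit = lookup(installed, outer)
        inner = decap_inner_tos(outer, 0x02)  # inner header is ECT(0)
        print(f"outer={outer:#04x} hit={hit} inner={inner:#04x}")
```

The point of the sketch is that the CE copy depends only on the packet, not on the megaflow key, so one megaflow could serve both ECT(0) and CE packets. Whether the datapath can perform that copy without a corresponding match or action is exactly the open question above.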
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss