This patchset optimizes the ICMP-reply code path for ICMP packets that get rate limited. A remote party can easily trigger this code path by sending packets to a port number with no listening service.
Generally, the patchset moves the sysctl_icmp_msgs_per_sec rate-limit check earlier in the code path and removes an allocation.

Use-case: The specific case where I experienced this being a bottleneck is sending UDP packets to a port with no listener, which results in the kernel replying with ICMP Destination Unreachable (type:3), Port Unreachable (code:3). Generating these replies is the bottleneck. After Eric and Paolo optimized the UDP socket code, the kernel's PPS processing capability is lower for no-listen ports than for normal UDP sockets. This is bad for capacity planning when restarting a service.

UDP no-listen benchmark 8xCPUs using pktgen_sample04_many_flows.sh:
 Baseline:  6.6 Mpps
 Patch:    14.7 Mpps
Driver mlx5 at 50Gbit/s.

---

Jesper Dangaard Brouer (3):
      Revert "icmp: avoid allocating large struct on stack"
      net: reduce cycles spend on ICMP replies that gets rate limited
      net: for rate-limited ICMP replies save one atomic operation

 net/ipv4/icmp.c |  125 +++++++++++++++++++++++++++++++++----------------------
 net/ipv6/icmp.c |   68 +++++++++++++++++++++---------
 2 files changed, 123 insertions(+), 70 deletions(-)

--