Forcibly disabling RSS with the deferred IPsec input patch seems to have fixed the issue. Given the wide-ranging deleterious effects with RSS on versus a small loss of theoretical maximum IPsec bandwidth with it off, we'll take the bandwidth hit for the moment. ;)
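For anyone following along, this is roughly how we've been watching the overload come and go -- nothing fancy, just the stock tools (exact output varies a bit between 13.x builds):

  netstat -Q                               (per-CPU netisr workstream stats, including queue drops)
  sysctl net.inet.ip.intr_queue_drops      (stops incrementing once the overload clears)
  top -SH                                  (shows the single netisr kernel thread pegged at 100%)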
Are there any significant concerns with running the patch for deferred IPsec
input? From my analysis of the code, I think the absolute worst case might be
a reordered packet or two, but that's always possible with IPsec over UDP
transport anyway, AFAIK.

----- Original Message -----
> From: "Timothy Pearson" <tpear...@raptorengineeringinc.com>
> To: "freebsd-net" <freebsd-net@FreeBSD.org>
> Sent: Saturday, January 18, 2025 7:24:57 PM
> Subject: Re: FreeBSD 13: IPSec netisr overload causes unrelated packet loss

> Quick update -- tried the deferred IPsec input patch [1], no change.
>
> A few tunables I forgot to include as well:
> net.route.netisr_maxqlen: 256
> net.isr.numthreads: 32
> net.isr.maxprot: 16
> net.isr.defaultqlimit: 256
> net.isr.maxqlimit: 10240
> net.isr.bindthreads: 1
> net.isr.maxthreads: 32
> net.isr.dispatch: direct
>
> [1] https://www.mail-archive.com/freebsd-net@freebsd.org/msg64742.html
>
> ----- Original Message -----
>> From: "Timothy Pearson" <tpear...@raptorengineeringinc.com>
>> To: "freebsd-net" <freebsd-net@FreeBSD.org>
>> Sent: Saturday, January 18, 2025 4:16:29 PM
>> Subject: FreeBSD 13: IPSec netisr overload causes unrelated packet loss
>
>> Hi all,
>>
>> I've been pulling my hair out over a rather interesting problem that I've
>> traced into an interaction between IPsec and the rest of the network stack.
>> I'm not sure if this is a bug or if there's a tunable I'm missing
>> somewhere, so here goes...
>>
>> We have a pf-based multi-CPU firewall running FreeBSD 13.x with multiple
>> subnets directly attached, one per NIC, as well as multiple IPsec tunnels
>> to remote sites alongside a UDP multicast proxy system (this becomes
>> important later). For the most part the setup works very well; however, we
>> have discovered through extensive trial and error / debugging that we can
>> induce major packet loss on the firewall host itself simply by flooding the
>> system with small IPsec packets (high PPS, low bandwidth).
>>
>> The aforementioned (custom) multicast UDP proxy is an excellent canary for
>> the problem, as it checks for and reports any dropped packets in the
>> receive data stream. Normally there are no dropped packets, even with
>> saturated links on any of the local interfaces or when *sending* high
>> packet rates over IPsec. As soon as high packet rates are *received* over
>> IPsec, the following happens:
>>
>> 1.) netisr on one core only goes to 100% interrupt load
>> 2.) net.inet.ip.intr_queue_drops starts incrementing rapidly
>> 3.) The multicast receiver, which only receives traffic from one of the
>> *local* interfaces (not any of the IPsec tunnels), begins to see packet
>> loss despite more than adequate buffers in place and no buffer overflows in
>> the UDP stack / application buffering. The packets are simply never
>> received by the kernel UDP stack.
>> 4.) Other applications (e.g. NTP) start to see sporadic packet loss as
>> well, again on local traffic not going over IPsec.
>>
>> As soon as the IPsec receive traffic is lowered enough to get the netisr
>> interrupt load below 100% on the one CPU core, everything recovers and
>> functions normally. Note that this has to be done by lowering the IPsec
>> transmit rate on the remote system; there is no way I have discovered to
>> "protect" the receiver from this kind of overload.
>>
>> While I would expect packet loss in an overloaded IPsec link scenario like
>> this just due to the decryption not keeping up, I would also expect that
>> loss to be confined to the IPsec tunnel. It should not spider out into the
>> rest of the system and start affecting all of the other applications and
>> routing/firewalling on the box -- this is what made it miserable to debug,
>> as the IPsec link was originally only hitting the PPS limits described
>> above sporadically during overnight batch processing. Now that I know
>> what's going on, I can provoke it easily with iperf3 in UDP mode. On the
>> boxes we are using, the limit seems to be around 50 kPPS before we hit 100%
>> netisr CPU load -- this limit is *much* lower with async crypto turned off.
>>
>> Important tunables already set:
>>
>> net.inet.ipsec.async_crypto=1 (turning this off just makes the symptoms
>> appear at lower PPS rates)
>> net.isr.dispatch=direct (deferred or hybrid does nothing to change the
>> symptoms)
>> net.inet.ip.intr_queue_maxlen=4096
>>
>> Thoughts are welcome... if there's any way to stop the "spread" of the
>> loss, I'm all ears. It seems that somehow the IPsec traffic (perhaps by
>> nature of its lengthy decryption process) is able to grab an unfair share
>> of netisr queue 0, and that interferes with the other traffic. If there
>> were a way to move the IPsec decryption to another netisr queue, that might
>> fix the problem, but I don't see any tunables to do so.
>>
>
> Thanks!
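P.S. For anyone who wants to reproduce the overload, a rough sketch of the provocation we use (hostnames are placeholders; numbers are ballpark -- with 64-byte datagrams, 50 Mbit/s of payload works out to roughly 100 kPPS, well past the ~50 kPPS point where the single netisr core saturates here):

  iperf3 -s                                      (on a host behind the local firewall)
  iperf3 -u -c <local-host> -l 64 -b 50M -t 60   (from a host at the remote site, so the flood arrives over the tunnel)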