Hi all,

I've been pulling my hair out over a rather interesting problem that I've 
traced to an interaction between IPsec and the rest of the network stack.  
I'm not sure whether this is a bug or a tunable I'm missing somewhere, so 
here goes...

We have a pf-based multi-CPU firewall running FreeBSD 13.x with multiple 
subnets directly attached, one per NIC, as well as multiple IPsec tunnels to 
remote sites, plus a UDP multicast proxy system (this becomes important 
later).  For the most part the setup works very well; however, we have 
discovered through extensive trial and error / debugging that we can induce 
major packet loss on the firewall host itself simply by flooding the system 
with small IPsec packets (high PPS, low bandwidth).

The aforementioned (custom) multicast UDP proxy is an excellent canary for the 
problem, as it checks for and reports any dropped packets in the receive data 
stream.  Normally, there are no dropped packets even with saturated links on 
any of the local interfaces or when *sending* high packet rates over IPsec.  As 
soon as high packet rates are *received* over IPsec, the following happens:

1.) netisr goes to 100% interrupt load on one core only
2.) net.inet.ip.intr_queue_drops starts incrementing rapidly (see the 
commands below for how we watch this)
3.) The multicast receiver, which only receives traffic from one of the *local* 
interfaces (not any of the IPsec tunnels), begins to see packet loss despite 
more than adequate buffers in place and no overflows in the UDP socket or 
application buffering.  The packets are simply never received by the kernel 
UDP stack.
4.) Other applications (e.g. NTP) start to see sporadic packet loss as well, 
again on local traffic, not over IPsec.
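
For reference, this is roughly how we watch the symptoms develop; nothing 
exotic, and the exact netstat -Q output layout may differ between releases:

  # per-thread CPU usage; the loaded netisr/intr kernel thread shows up here
  top -SH

  # the drop counter from symptom 2
  sysctl net.inet.ip.intr_queue_drops

  # netisr configuration plus per-protocol/per-workstream queue and drop stats
  netstat -Q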

As soon as the IPsec receive traffic is lowered enough to get the netisr 
interrupt load below 100% on the one CPU core, everything recovers and 
functions normally.  Note that this has to be done by lowering the IPsec 
transmit rate on the remote system; I have not found any way to "protect" the 
receiver from this kind of overload.

While I would expect packet loss in an overloaded IPsec link scenario like this 
just due to the decryption not keeping up, I would also expect that loss to be 
confined to the IPsec tunnel.  It should not spider out into the rest of the 
system and start affecting all of the other applications and 
routing/firewalling on the box -- this is what made it miserable to debug, as 
the IPsec link originally hit the PPS limits described above only sporadically 
during overnight batch processing.  Now that I know what's going on, I can 
provoke it easily with iperf3 in UDP mode.  On the boxes we are using, the 
limit seems to be around 50 kPPS before we hit 100% netisr CPU load -- this 
limit is *much* lower with async crypto turned off.
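
For anyone who wants to reproduce this, something along these lines from the 
remote end of one of the tunnels is enough to trigger it here (addresses are 
placeholders; 200-byte datagrams at 80 Mbit/s works out to roughly 50 kPPS):

  # on the firewall, listening on an address inside the tunnel
  iperf3 -s

  # on the remote site, sending small UDP datagrams through the tunnel
  iperf3 -c 10.0.0.1 -u -l 200 -b 80M -t 60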

Important tunables already set:

net.inet.ipsec.async_crypto=1 (turning this off just makes the symptoms appear 
at lower PPS rates)
net.isr.dispatch=direct (switching to deferred or hybrid does nothing to change 
the symptoms)
net.inet.ip.intr_queue_maxlen=4096
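
In sysctl.conf form, for clarity (as far as I can tell all three are runtime 
sysctls on 13.x, so no reboot is needed to experiment with them):

  # /etc/sysctl.conf
  net.inet.ipsec.async_crypto=1
  net.isr.dispatch=direct
  net.inet.ip.intr_queue_maxlen=4096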

Thoughts are welcome... if there's any way to stop the "spread" of the loss I'm 
all ears.  It seems that somehow the IPsec traffic (perhaps by nature of its 
lengthy decryption process) is able to grab an unfair share of netisr queue 0, 
and that interferes with the other traffic.  If there were a way to move the 
IPsec decryption to another netisr queue, that might fix the problem, but I 
don't see any tunables to do so.
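
For completeness, the only other netisr knobs I'm aware of are the loader 
tunables below, and none of them appear to offer the kind of per-protocol 
steering I'm after -- though I may well be missing something (values shown 
are examples, not a recommendation):

  # /boot/loader.conf -- example values only
  net.isr.maxthreads="4"        # number of netisr worker threads
  net.isr.bindthreads="1"       # pin netisr threads to CPU cores
  net.isr.defaultqlimit="2048"  # default netisr queue length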

Thanks!
