Dear Gábor,
I have a hint. Does your iperf test use a single IP address pair? If
so, there is a chance that only two CPU cores (one core per direction)
process all the interrupts.
You can easily check my hypothesis. During the iperf test, you should
execute a top command and check the load of every single CPU. If only
two of them have interrupt load, and they are 100% utilized, then this
is the root cause of the issue. Otherwise my hypothesis is refuted.
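The check can be written down as a small decision rule. Below is a rough sketch in Python. It assumes the per-CPU interrupt percentages are read off the top display by hand, and the 90% and 5% thresholds are my own arbitrary assumptions, not values from any tool:

```python
def hypothesis_confirmed(intr_percent, busy=90.0, idle=5.0):
    """Return True if exactly two CPUs are saturated by interrupts
    and every other CPU has (almost) no interrupt load.

    intr_percent: per-CPU interrupt load in percent, read off top by hand.
    busy, idle:   assumed thresholds, not taken from any tool.
    """
    saturated = [p for p in intr_percent if p >= busy]
    others = [p for p in intr_percent if p < busy]
    return len(saturated) == 2 and all(p <= idle for p in others)


# Made-up readings, for illustration only
two_hot_cores = [99.0, 0.5, 0.0, 98.5] + [0.0] * 12
load_spread_out = [30.0, 25.0, 28.0, 31.0] + [27.0] * 12

print(hypothesis_confirmed(two_hot_cores))
print(hypothesis_confirmed(load_spread_out))
```

The first call represents the case where my hypothesis holds, the second the case where it is refuted.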
If my hypothesis is confirmed, then the underlying issue is that RSS is
implemented in OpenBSD in such a way that the hash function used to
distribute the interrupts among the CPU cores includes only the IP
addresses and not the port numbers.
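To illustrate the effect, here is a toy sketch in Python. It is not the OpenBSD or NIC code: zlib.crc32 only stands in for the real RSS hash, and NUM_QUEUES, the addresses and the port numbers are made-up values. It shows that when the hash covers only the IP addresses, all streams between the same two hosts land on the same queue, however many parallel streams are used:

```python
import socket
import struct
import zlib

NUM_QUEUES = 8  # assumed number of receive queues / CPU cores


def queue_for(src_ip, dst_ip, src_port, dst_port, use_ports):
    """Map a flow to a queue index with a stand-in hash (CRC32, not the real RSS hash)."""
    key = socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
    if use_ports:
        # Field order is arbitrary in this toy hash.
        key += struct.pack("!HH", dst_port, src_port)
    return zlib.crc32(key) % NUM_QUEUES


# 64 parallel streams between the same two hosts, differing only in source port
flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 5201) for i in range(64)]

ip_only = {queue_for(*f, use_ports=False) for f in flows}
with_ports = {queue_for(*f, use_ports=True) for f in flows}

print("queues used, IP addresses only:   ", sorted(ip_only))
print("queues used, IP addresses + ports:", sorted(with_ports))
```

With the IP-only key, every stream maps to one queue per direction, which is the two-busy-cores picture described above.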
Best regards,
Gábor
On 4/16/2024 8:22 PM, Szél Gábor wrote:
Dear @misc!
We have several fairly complex networks where OpenBSD is the router.
Structure of the network:
* OpenBSD redundant routers
  - two OpenBSD machines
  - CARP
  - pfsync
  - LACP trunks for LAN (2x 10Gbit) (1 side switch #1, 2 side
    switch #2 + VPC), using the OpenBSD aggr device
* Cisco Nexus 3K switches
  - VPC (2x40Gbit)
  - redundant LACP links (1 side switch #1, 2 side switch #2 + VPC)
* many VLANs
* PF blocks all traffic by default and passes only allowed traffic
* the servers are usually connected with 2x10Gbit LACP
*Hardware:*
* we updated this system in one place to OpenBSD 7.4
  hardware: Dell PE 640 (2x Xeon Gold 6134 CPU, 64 GB RAM, Intel X710
  network cards)
* we migrated the settings from the previous system (OpenBSD 7.0)
  the previous hardware was different! (2x Xeon E5-2650, 64 GB RAM,
  Intel X520 network cards)
*Problem:*
After the upgrade with the hardware change, we have very poor network
performance!
Example: a simple Veeam backup restore that goes through the OpenBSD
router hangs the network completely (very high latency).
In this case, even the SSH connection to the router lags.
But the OpenBSD router does not show high CPU usage.
If I run a simple iperf speed test from the OpenBSD router to another
server (all devices have 10Gbit LACP links):
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 171 MBytes 1.44 Gbits/sec
[ 5] 1.00-2.00 sec 313 MBytes 2.63 Gbits/sec
[ 5] 2.00-3.00 sec 398 MBytes 3.34 Gbits/sec
[ 5] 3.00-4.00 sec 384 MBytes 3.22 Gbits/sec
[ 5] 4.00-5.00 sec 419 MBytes 3.51 Gbits/sec
[ 5] 5.00-6.00 sec 376 MBytes 3.16 Gbits/sec
[ 5] 6.00-7.00 sec 325 MBytes 2.73 Gbits/sec
[ 5] 7.00-8.00 sec 337 MBytes 2.82 Gbits/sec
[ 5] 8.00-9.00 sec 339 MBytes 2.85 Gbits/sec
[ 5] 9.00-10.00 sec 332 MBytes 2.78 Gbits/sec
[ 5] 10.00-10.19 sec 62.5 MBytes 2.75 Gbits/sec
Between other devices, servers, etc., the speed is perfectly fine
(stable 9-10 Gbits/sec).
Only the routed performance is very slow.
If I run a speed test between the two OpenBSD machines (master router,
backup router), the values are better but not perfect:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 740 MBytes 6.20 Gbits/sec
[ 5] 1.00-2.00 sec 781 MBytes 6.55 Gbits/sec
[ 5] 2.00-3.00 sec 784 MBytes 6.58 Gbits/sec
[ 5] 3.00-4.00 sec 783 MBytes 6.57 Gbits/sec
[ 5] 4.00-5.00 sec 786 MBytes 6.59 Gbits/sec
[ 5] 5.00-6.00 sec 796 MBytes 6.68 Gbits/sec
[ 5] 6.00-7.00 sec 779 MBytes 6.54 Gbits/sec
[ 5] 7.00-8.00 sec 774 MBytes 6.49 Gbits/sec
[ 5] 8.00-9.00 sec 780 MBytes 6.55 Gbits/sec
[ 5] 9.00-10.00 sec 786 MBytes 6.59 Gbits/sec
[ 5] 10.00-10.00 sec 640 KBytes 10.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 7.61 GBytes 6.54 Gbits/sec receiver
PF has ~2000 rules, but if I disable PF on the tested OpenBSD router,
nothing changes.
We've run out of ideas. What would be worth looking at?
--
Regards
Gábor Szél
------------
email:gabor.s...@wantax.eu