Dear Stuart and Hrvoje,
Thank you very much for your answers!
Please see my answers inline.
On 10/16/2022 4:30 AM, Hrvoje Popovski wrote:
A few network drivers have support for multiple queues (if my grepping
is correct: aq igc bnxt ix ixl mcx vmx) - typically you will see the
number of queues reported in the dmesg attach line if supported
Yes, in my case, 16 queues are supported:
ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 16 queues,
address a0:36:9f:c5:e6:58
ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 16 queues,
address a0:36:9f:c5:e6:5a
- but
there's no interface to adjust what's fed into the hash function.
Thank you very much for this information, then I will stop looking for
it! But I am really sad that it is so.
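For context (and please correct me if I am wrong): the hash in question
is typically Microsoft's Toeplitz hash, computed over the source and
destination addresses (plus the TCP/UDP ports, when port hashing is
enabled); OpenBSD's kernel carries a symmetric variant in
sys/net/toeplitz.c. Below is a minimal sketch of the classic algorithm,
only to illustrate what is fed into it; the function name and key
handling are my own for illustration, not taken from the OpenBSD
sources:

#include <stdint.h>
#include <stddef.h>

/*
 * Classic Toeplitz hash as used for RSS: for every set bit of the
 * input, XOR in the 32-bit window of the key that starts at that
 * bit position.  The key must be at least len + 4 bytes long
 * (a real RSS key is usually 40 bytes).
 */
uint32_t
toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
	    (uint32_t)key[2] << 8 | key[3];
	size_t i;
	int b;

	for (i = 0; i < len; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1 << b))
				hash ^= window;
			/* slide the window left by one key bit */
			window = window << 1 | ((key[i + 4] >> b) & 1);
		}
	}
	return hash;
}

The low-order bits of the result then typically index an indirection
table that selects the receive queue, so with a fixed address pair every
packet of a given direction lands on the same queue (and hence the same
CPU).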
I have performed some IPv4 and IPv6 packet forwarding throughput tests
(using bidirectional traffic with 64-byte frames for IPv4 and 84-byte
frames for IPv6, the IPv6 minimum frame size being 20 bytes larger due
to the longer IPv6 header), booting OpenBSD first in SP and then in MP
mode. For IPv4, the MP results are about 28% higher than the SP
results. For IPv6, the increase is only about 20%.
I have examined the output of the top command during the bidirectional
throughput tests. It seems that only two cores (CPU09 and CPU25) were
used to process the interrupts, and a single core (CPU02) was used to
perform the packet forwarding (I used a bold font for the three numbers
below; I hope they will still be visible after the list server
processes my e-mail):
CPU00 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU01 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU02 states: 0.0% user, 0.0% nice, *93.8% sys*, 6.2% spin, 0.0% intr, 0.0% idle
CPU03 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU04 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU05 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU06 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU07 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU08 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU09 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, *25.0% intr*, 75.0% idle
CPU10 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU11 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU12 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU13 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU14 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU15 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU16 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU17 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU18 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU19 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU20 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU21 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU22 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU23 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU24 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU25 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, *26.7% intr*, 73.3% idle
CPU26 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU27 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU28 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU29 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU30 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU31 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
Memory: Real: 32M/1397M act/tot Free: 371G Cache: 712M Swap: 0K/256M
Then it is clear why the performance did not increase significantly in
the MP case with 32 cores.
32 cores is quite a lot for OpenBSD; more than around 8 is likely to
be a waste for current versions in many use cases.
Yes, you are definitely right.
Hi,
does it make sense to mention RSS and other stuff like TSO, MSI-X and
multiple queues in the man pages?
I would say definitely: YES.
Even though the high number of cores and the 16 queues seem to be
useless in my case, this is only because I always used the same IP
address pairs (one pair for each direction). In other use cases,
different source and destination IP addresses may result in a fairly
even distribution of the interrupts among the CPU cores, so the 16
queues can be useful.
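To illustrate the point, here is a small demo reusing the
toeplitz_hash() sketch from earlier in this message (the 12-byte key
and the RFC 2544 benchmarking-range addresses are made up for the
example, and the modulo stands in for the NIC's indirection table):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NQUEUES 16

uint32_t toeplitz_hash(const uint8_t *, const uint8_t *, size_t);

int
main(void)
{
	/* arbitrary example key; needs >= input length + 4 bytes */
	const uint8_t key[12] = {
		0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
		0x41, 0x67, 0x25, 0x3d
	};
	/* a single fixed address pair: 198.18.0.2 -> 198.19.0.2 */
	const uint8_t fixed[8] = { 198, 18, 0, 2, 198, 19, 0, 2 };
	uint8_t tuple[8];
	int i;

	/* every packet of this flow maps to one and the same queue */
	printf("fixed pair -> queue %u\n",
	    toeplitz_hash(key, fixed, sizeof(fixed)) % NQUEUES);

	/* varying the source address spreads flows over the queues */
	for (i = 0; i < 4; i++) {
		memcpy(tuple, fixed, sizeof(tuple));
		tuple[3] = 2 + i;
		printf("src 198.18.0.%d -> queue %u\n", 2 + i,
		    toeplitz_hash(key, tuple, sizeof(tuple)) % NQUEUES);
	}
	return 0;
}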
Best regards,
Gábor
Something like
https://leaf.dragonflybsd.org/cgi/web-man?command=ix&section=ANY