Dear Stuart and Hrvoje,

Thank you very much for your answers!

Please see my answers inline.

On 10/16/2022 4:30 AM, Hrvoje Popovski wrote:
> A few network drivers have support for multiple queues (if my grepping
> is correct: aq igc bnxt ix ixl mcx vmx) - typically you will see the
> number of queues reported in the dmesg attach line if supported

Yes, in my case, 16 queues are supported:

ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 16 queues, address a0:36:9f:c5:e6:58
ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 16 queues, address a0:36:9f:c5:e6:5a

> - but there's no interface to adjust what's fed into the hash function.

Thank you very much for this information; then I will stop searching for it!

But I am really sad that it is so.

I have performed some IPv4 and IPv6 packet forwarding throughput tests (using bidirectional traffic with 64-byte frames for IPv4 and 84-byte frames for IPv6) booting OpenBSD first in SP and then in MP mode.

For IPv4, the MP results are about 28% higher than the SP results. For IPv6, the increase is only about 20%.

I examined the output of the top command during the bidirectional throughput tests. It seems that only two cores (CPU09 and CPU25) were used to process the interrupts, and a single core (CPU02) was used to perform the packet forwarding (I marked the three relevant numbers in bold below; I hope they will still be visible after the list server processes my e-mail):

CPU00 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU01 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU02 states:  0.0% user,  0.0% nice, *93.8% sys*,  6.2% spin,  0.0% intr,  0.0% idle
CPU03 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU04 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU05 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU06 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU07 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU08 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU09 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, *25.0% intr*, 75.0% idle
CPU10 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU11 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU12 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU13 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU14 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU15 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU16 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU17 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU18 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU19 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU20 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU21 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU22 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU23 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU24 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU25 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, *26.7% intr*, 73.3% idle
CPU26 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU27 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU28 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU29 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU30 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU31 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
Memory: Real: 32M/1397M act/tot Free: 371G Cache: 712M Swap: 0K/256M

Then it is clear why the performance did not increase significantly in the MP case with 32 cores.


> 32 cores is quite a lot for OpenBSD, more than around 8 is likely to
> be a waste for current versions in many use cases.

Yes, you are definitely right.



> Hi,
>
> does it make sense to mention RSS and other stuff like TSO, MSI-X,
> Multiple queues in man ?

I would say definitely: YES.

Even though the high number of cores and the 16 queues seem to be useless in my case, that is only because I always used the same IP address pair (one pair for each direction). In other use cases, varying source and destination IP addresses may distribute the interrupts fairly evenly among the CPU cores, so the 16 queues can be useful.
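To illustrate why a single fixed IP address pair pins all traffic to one queue, here is a minimal sketch (in Python, with an arbitrary made-up hash key, not the key the ix(4) hardware actually uses) of a Microsoft-style Toeplitz RSS hash over the IPv4 source/destination 2-tuple. The helper names and the key are illustrative assumptions, not anything from the driver:

```python
import ipaddress

# Arbitrary 40-byte key for illustration only; real NICs use a
# driver-programmed key.
RSS_KEY = bytes(range(1, 41))

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: for every set bit of the input, XOR in the
    32-bit window of the key starting at that bit position."""
    keyval = int.from_bytes(key, "big")
    keybits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                bitpos = i * 8 + b
                result ^= (keyval >> (keybits - 32 - bitpos)) & 0xFFFFFFFF
    return result

def queue_for(src: str, dst: str, nqueues: int = 16) -> int:
    """Map an IPv4 (src, dst) pair to one of nqueues RX queues."""
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    return toeplitz_hash(RSS_KEY, data) % nqueues

# One fixed pair per direction -> at most two queues ever see traffic:
print(queue_for("10.0.0.1", "10.0.1.1"))  # always the same queue
print(queue_for("10.0.1.1", "10.0.0.1"))  # the reverse direction's queue

# Varying the source address spreads the flows across many queues:
queues = {queue_for(f"10.0.0.{i}", "10.0.1.1") for i in range(1, 255)}
print(len(queues))
```

With a single pair per direction the hash is constant, so only the two corresponding queues (and their interrupt CPUs) do any work, which matches the top output above; with many distinct pairs the flows land on many queues.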

Best regards,

Gábor

> Something like
> https://leaf.dragonflybsd.org/cgi/web-man?command=ix&section=ANY
