Dear Stuart and Hrvoje,
Thank you very much for your answers!
Please see my answers inline.
On 10/16/2022 4:30 AM, Hrvoje Popovski wrote:
A few network drivers have support for multiple queues (if my grepping
is correct: aq igc bnxt ix ixl mcx vmx) - typically you will see the
number of queues reported in the dmesg attach line if supported
Yes, in my case, 16 queues are supported:
ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 16 queues,
address a0:36:9f:c5:e6:58
ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 16 queues,
address a0:36:9f:c5:e6:5a
- but
there's no interface to adjust what's fed into the hash function.
Thank you very much for this information, then I will stop looking for
it! But I am really sad that it is so.
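For context (and please correct me if I am wrong): the hash in question
is typically Microsoft's Toeplitz hash, computed over the source and
destination addresses (plus the TCP/UDP ports, when port hashing is
enabled); OpenBSD's kernel carries a symmetric variant in
sys/net/toeplitz.c. Below is a minimal sketch of the classic algorithm,
only to illustrate what is fed into it; the function name and key
handling are my own for illustration, not taken from the OpenBSD
sources:

#include <stdint.h>
#include <stddef.h>

/*
 * Classic Toeplitz hash as used for RSS: for every set bit of the
 * input, XOR in the 32-bit window of the key that starts at that
 * bit position.  The key must be at least len + 4 bytes long
 * (a real RSS key is usually 40 bytes).
 */
uint32_t
toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
	    (uint32_t)key[2] << 8 | key[3];
	size_t i;
	int b;

	for (i = 0; i < len; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1 << b))
				hash ^= window;
			/* slide the window left by one key bit */
			window = window << 1 | ((key[i + 4] >> b) & 1);
		}
	}
	return hash;
}

The low-order bits of the result then typically index an indirection
table that selects the receive queue, so with a fixed address pair every
packet of a given direction lands on the same queue (and hence the same
CPU).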
I have performed some IPv4 and IPv6 packet forwarding throughput tests
(using bidirectional traffic with 64-byte frames for IPv4 and 84-byte
frames for IPv6, the IPv6 minimum frame size being 20 bytes larger due
to the longer IPv6 header), booting OpenBSD first in SP and then in MP
mode. For IPv4, the MP results are about 28% higher than the SP
results. For IPv6, the increase is only about 20%.
I have examined the output of the top command during the bidirectional
throughput tests. It seems that only two cores (CPU09 and CPU25) were
used to process the interrupts, and a single core (CPU02) was used to
perform the packet forwarding (I used a bold font for the three numbers
below; I hope they will still be visible after the list server
processes my e-mail):
CPU00 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU01 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU02 states: 0.0% user, 0.0% nice, *93.8% sys*, 6.2% spin, 0.0% intr, 0.0% idle
CPU03 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU04 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU05 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU06 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU07 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU08 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU09 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, *25.0% intr*, 75.0% idle
CPU10 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU11 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU12 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU13 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU14 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU15 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU16 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU17 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU18 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU19 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU20 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU21 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU22 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU23 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU24 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU25 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, *26.7% intr*, 73.3% idle
CPU26 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU27 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU28 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU29 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU30 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU31 states: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
Memory: Real: 32M/1397M act/tot Free: 371G Cache: 712M Swap: 0K/256M
Then it is clear why the performance did not increase significantly in
the MP case with 32 cores.
32 cores is quite a lot for OpenBSD; more than around 8 is likely to
be a waste for current versions in many use cases.
Yes, you are definitely right.
Hi,
does it make sense to mention RSS and other stuff like TSO, MSI-X and
multiple queues in the man pages?
I would say definitely: YES.
Even though the high number of cores and the 16 queues seem to be
useless in my case, this is only because I always used the same IP
address pairs (one pair for each direction). In other use cases,
different source and destination IP addresses may result in a fairly
even distribution of the interrupts among the CPU cores, so the 16
queues can be useful.
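To illustrate the point, here is a small demo reusing the
toeplitz_hash() sketch from earlier in this message (the 12-byte key
and the RFC 2544 benchmarking-range addresses are made up for the
example, and the modulo stands in for the NIC's indirection table):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NQUEUES 16

uint32_t toeplitz_hash(const uint8_t *, const uint8_t *, size_t);

int
main(void)
{
	/* arbitrary example key; needs >= input length + 4 bytes */
	const uint8_t key[12] = {
		0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
		0x41, 0x67, 0x25, 0x3d
	};
	/* a single fixed address pair: 198.18.0.2 -> 198.19.0.2 */
	const uint8_t fixed[8] = { 198, 18, 0, 2, 198, 19, 0, 2 };
	uint8_t tuple[8];
	int i;

	/* every packet of this flow maps to one and the same queue */
	printf("fixed pair -> queue %u\n",
	    toeplitz_hash(key, fixed, sizeof(fixed)) % NQUEUES);

	/* varying the source address spreads flows over the queues */
	for (i = 0; i < 4; i++) {
		memcpy(tuple, fixed, sizeof(tuple));
		tuple[3] = 2 + i;
		printf("src 198.18.0.%d -> queue %u\n", 2 + i,
		    toeplitz_hash(key, tuple, sizeof(tuple)) % NQUEUES);
	}
	return 0;
}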
Best regards,
Gábor
Something like
https://leaf.dragonflybsd.org/cgi/web-man?command=ix&section=ANY