> On 04 Aug 2016, at 11:40, Ben RUBSON <ben.rub...@gmail.com> wrote:
>
>> On 02 Aug 2016, at 22:11, Ben RUBSON <ben.rub...@gmail.com> wrote:
>>
>>> On 02 Aug 2016, at 21:35, Hans Petter Selasky <h...@selasky.org> wrote:
>>>
>>> The CX-3 driver doesn't bind the worker threads to specific CPU cores by
>>> default, so if your CPU has more than one so-called numa, you'll end up
>>> that the bottle-neck is the high-speed link between the CPU cores and not
>>> the card. A quick and dirty workaround is to "cpuset" iperf and the
>>> interrupt and taskqueue threads to specific CPU cores.
>>
>> My CPUs : 2x E5-2620v3 with DDR4@1866.
>
> OK, so I cpuset all Mellanox interrupts to one NUMA, as well as the iPerf
> processes, and I'm able to reach max bandwidth.
> Choosing the wrong NUMA (or both, or one for interrupts, the other one for
> iPerf, etc...) totally kills throughput.
>
> However, full-duplex throughput is still limited, I can't manage to reach
> 2x40Gb/s, throttle is at about 45Gb/s.
> I tried many different cpuset layouts, but I never went above 45Gb/s.
> (Linux allowed me to reach 2x40Gb/s so hardware is not a bottleneck)
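
For anyone wanting to reproduce the pinning described in the quoted text, here is a rough sketch of how the cpuset commands could be generated. It is not the exact procedure used in the tests above. It assumes the Mellanox interrupts show up in "vmstat -i" under a name containing "mlx4" and that CPUs 0-5 belong to the first NUMA domain; both are guesses to check on the actual machine. The sample interrupt table is made up for illustration. The script only prints commands, it does not run them, and it leaves the taskqueue threads out.

```python
import re


def cpuset_commands(vmstat_output, name_pattern, cpu_list):
    """Build one `cpuset -l <cpus> -x <irq>` command per matching interrupt."""
    commands = []
    for line in vmstat_output.splitlines():
        # Lines of interest look like "irq264: mlx4_core0   1200   3".
        match = re.match(r"irq(\d+):\s+(\S+)", line)
        if match and name_pattern in match.group(2):
            commands.append(f"cpuset -l {cpu_list} -x {match.group(1)}")
    return commands


# Made-up sample in the style of `vmstat -i`; real names and numbers will differ.
SAMPLE = """\
interrupt                          total       rate
irq264: mlx4_core0                  1200          3
irq265: mlx4_core0                  1180          3
irq270: igb0:que 0                   900          2
"""

# Assumption: CPUs 0-5 sit on the NUMA domain the card is attached to.
CPUS = "0-5"

for cmd in cpuset_commands(SAMPLE, "mlx4", CPUS):
    print(cmd)

# Start iperf restricted to the same CPUs as the interrupts.
print(f"cpuset -l {CPUS} iperf -s")
```

On a real system the SAMPLE text would be replaced by the output of "vmstat -i", and the printed commands reviewed before being run as root.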
OK, I then found a workaround.

In the motherboard's BIOS, I disabled the following option:
Advanced / ACPI Settings / NUMA

I'm now able to go up to 2x40Gb/s!
I can even achieve this throughput without any cpuset!

Strange that Linux was able to cope with NUMA enabled, but I'm pretty sure
production performance will be easier to maintain with only one NUMA domain.

Feel free to ask me if you want further testing with two NUMA domains.

Ben
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"