Re: Unstable local network throughput

2016-08-17 Thread Adrian Chadd
On 17 August 2016 at 08:43, Ben RUBSON wrote: > >> On 17 Aug 2016, at 17:38, Adrian Chadd wrote: >> >> [snip] >> >> ok, so this is what I was seeing when I was working on this stuff last. >> >> The big abusers are: >> >> * so_snd lock, for TX'ing producer/consumer socket data >> * tcp stack pcb l

Re: Unstable local network throughput

2016-08-17 Thread Ben RUBSON
> On 17 Aug 2016, at 17:38, Adrian Chadd wrote: > > [snip] > > ok, so this is what I was seeing when I was working on this stuff last. > > The big abusers are: > > * so_snd lock, for TX'ing producer/consumer socket data > * tcp stack pcb locking (which rss tries to work around, but it again >

Re: Unstable local network throughput

2016-08-17 Thread Adrian Chadd
[snip] ok, so this is what I was seeing when I was working on this stuff last. The big abusers are: * so_snd lock, for TX'ing producer/consumer socket data * tcp stack pcb locking (which rss tries to work around, but it again doesn't help producer/consumer locking, only multiple sockets) * for s

Re: Unstable local network throughput

2016-08-17 Thread Ben RUBSON
> On 15 Aug 2016, at 16:49, Ben RUBSON wrote: > >> On 12 Aug 2016, at 00:52, Adrian Chadd wrote: >> >> Which ones of these hit the line rate comfortably? > > So Adrian, I ran tests again using FreeBSD 11-RC1. > I put iperf throughput in result files (so that we can classify them), as > well

Re: Unstable local network throughput

2016-08-16 Thread Ben RUBSON
> On 16 Aug 2016, at 21:36, Adrian Chadd wrote: > > On 16 August 2016 at 02:58, Ben RUBSON wrote: >> >>> On 16 Aug 2016, at 03:45, Adrian Chadd wrote: >>> >>> Hi, >>> >>> ok, can you try 5) but also running with the interrupt threads pinned to >>> CPU 1? >> >> What do you mean by interrup

Re: Unstable local network throughput

2016-08-16 Thread Adrian Chadd
On 16 August 2016 at 02:58, Ben RUBSON wrote: > >> On 16 Aug 2016, at 03:45, Adrian Chadd wrote: >> >> Hi, >> >> ok, can you try 5) but also running with the interrupt threads pinned to CPU >> 1? > > What do you mean by interrupt threads ? > > Perhaps you mean the NIC interrupts ? > In this case

Re: Unstable local network throughput

2016-08-16 Thread Ben RUBSON
> On 16 Aug 2016, at 03:45, Adrian Chadd wrote: > > Hi, > > ok, can you try 5) but also running with the interrupt threads pinned to CPU > 1? What do you mean by interrupt threads ? Perhaps you mean the NIC interrupts ? In this case see 6) and 7) where NIC IRQs are pinned to CPUs 0-11 (6) an

Re: Unstable local network throughput

2016-08-15 Thread Adrian Chadd
Hi, ok, can you try 5) but also running with the interrupt threads pinned to CPU 1? It looks like the interrupt threads are running on CPU 0, and my /guess/ (looking at the CPU usage distributions) is that sometimes the userland bits run on the same CPU or numa domain as the interrupt bits, and it l
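
[Editor's sketch, not part of the thread.] If "interrupt threads" is read as the NIC's IRQs, the pinning requested above could be scripted roughly as follows. The sketch only builds the command lines and does not run them. It assumes FreeBSD's cpuset(1) with its -l (CPU list) and -x (IRQ) options; the IRQ numbers are hypothetical placeholders, and the real ones would have to be read from "vmstat -i" on the test host.

```python
import shlex

def pin_irq_commands(irqs, cpu):
    """Build one 'cpuset -l <cpu> -x <irq>' argv per IRQ (nothing is executed)."""
    return [["cpuset", "-l", str(cpu), "-x", str(irq)] for irq in irqs]

# Hypothetical IRQ numbers, for illustration only; take the real ones
# from 'vmstat -i' on the machine under test.
EXAMPLE_IRQS = [264, 265, 266]

for argv in pin_irq_commands(EXAMPLE_IRQS, cpu=1):
    print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on the test host they would be run as root.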

Re: Unstable local network throughput

2016-08-15 Thread Ben RUBSON
> On 12 Aug 2016, at 00:52, Adrian Chadd wrote: > > Which ones of these hit the line rate comfortably? So Adrian, I ran tests again using FreeBSD 11-RC1. I put iperf throughput in result files (so that we can classify them), as well as top -P ALL and pcm-memory.x. iperf results : columns 3&4 a

Re: Unstable local network throughput

2016-08-11 Thread Adrian Chadd
Which ones of these hit the line rate comfortably? -a On 11 August 2016 at 15:35, Ben RUBSON wrote: > >> On 11 Aug 2016, at 18:36, Adrian Chadd wrote: >> >> Hi! >> >> mlx4_core0: mem >> 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0 >> numa-domain 1 on pci16 >> mlx4_core:

Re: Unstable local network throughput

2016-08-11 Thread Ben RUBSON
> On 11 Aug 2016, at 18:36, Adrian Chadd wrote: > > Hi! > > mlx4_core0: mem > 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0 > numa-domain 1 on pci16 > mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6 > (Aug 11 2016) > > so the NIC is in numa-domain 1.

Re: Unstable local network throughput

2016-08-11 Thread Adrian Chadd
adrian did mean fixed-domain-rr. :-P sorry! (Sorry, needed to update my NUMA boxes, things "changed" since I wrote this.) -a ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail t

Re: Unstable local network throughput

2016-08-11 Thread Eric van Gyzen
On 08/11/16 12:54 PM, Ben RUBSON wrote: > >> On 11 Aug 2016, at 19:51, Ben RUBSON wrote: >> >> >>> On 11 Aug 2016, at 18:36, Adrian Chadd wrote: >>> >>> Hi! >> >> Hi Adrian, >> >>> mlx4_core0: mem >>> 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0 >>> numa-domain 1 on pci16 >>

Re: Unstable local network throughput

2016-08-11 Thread Ben RUBSON
> On 11 Aug 2016, at 18:36, Adrian Chadd wrote: > > Hi! > > mlx4_core0: mem > 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0 > numa-domain 1 on pci16 > mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6 > (Aug 11 2016) > > so the NIC is in numa-domain 1.

Re: Unstable local network throughput

2016-08-11 Thread Ben RUBSON
> On 11 Aug 2016, at 19:51, Ben RUBSON wrote: > > >> On 11 Aug 2016, at 18:36, Adrian Chadd wrote: >> >> Hi! > > Hi Adrian, > >> mlx4_core0: mem >> 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0 >> numa-domain 1 on pci16 >> mlx4_core: Initializing mlx4_core: Mellanox Conn

Re: Unstable local network throughput

2016-08-11 Thread Ben RUBSON
> On 11 Aug 2016, at 18:36, Adrian Chadd wrote: > > Hi! Hi Adrian, > mlx4_core0: mem > 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0 > numa-domain 1 on pci16 > mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6 > (Aug 11 2016) > > so the NIC is in numa-

Re: Unstable local network throughput

2016-08-11 Thread Adrian Chadd
Hi! mlx4_core0: mem 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0 numa-domain 1 on pci16 mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6 (Aug 11 2016) so the NIC is in numa-domain 1. Try pinning the worker threads to numa-domain 1 when you run the test:
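
[Editor's sketch, not part of the thread.] The NUMA domain quoted above comes straight from the device attach line in dmesg. A minimal way to pull it out programmatically might look like this; the sample line is copied from the message as archived (its address ranges are truncated there), and the function name is made up for illustration.

```python
import re

# Attach line as it appears in the archived message (hex ranges are truncated there).
DMESG_LINE = ("mlx4_core0: mem 0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 "
              "at device 0.0 numa-domain 1 on pci16")

# Device name before the first colon, then the number after "numa-domain".
NUMA_RE = re.compile(r"^(?P<dev>\w+): .*\bnuma-domain (?P<domain>\d+)\b")

def numa_domain_of(line):
    """Return (device, domain) if the line reports a NUMA domain, else None."""
    m = NUMA_RE.match(line)
    if m is None:
        return None
    return m.group("dev"), int(m.group("domain"))

print(numa_domain_of(DMESG_LINE))
```

Lines without a "numa-domain" field, such as the driver banner that follows the attach line, simply yield None.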

Re: Unstable local network throughput

2016-08-10 Thread Ben RUBSON
> On 11 Aug 2016, at 00:11, Adrian Chadd wrote: > > hi, > > ok, let's start by getting the NUMA bits into the kernel so you can > mess with things. > > add this to the kernel > > options MAXMEMDOM=8 > (which hopefully is enough) > options VM_NUMA_ALLOC > options DEVICE_NUMA > > Then reboot an

Re: Unstable local network throughput

2016-08-10 Thread Adrian Chadd
On 10 August 2016 at 12:50, Ben RUBSON wrote: > >> On 10 Aug 2016, at 21:47, Adrian Chadd wrote: >> >> hi, >> >> yeah, I'd like you to do some further testing with NUMA. Are you able >> to run freebsd-11 or -HEAD on these boxes? > > Hi Adrian, > > Yes I currently have 11 BETA3 running on them. >

Re: Unstable local network throughput

2016-08-10 Thread Ben RUBSON
> On 10 Aug 2016, at 21:47, Adrian Chadd wrote: > > hi, > > yeah, I'd like you to do some further testing with NUMA. Are you able > to run freebsd-11 or -HEAD on these boxes? Hi Adrian, Yes I currently have 11 BETA3 running on them. I could also run BETA4. Ben

Re: Unstable local network throughput

2016-08-10 Thread Adrian Chadd
hi, yeah, I'd like you to do some further testing with NUMA. Are you able to run freebsd-11 or -HEAD on these boxes? -adrian On 8 August 2016 at 07:01, Ben RUBSON wrote: > >> On 04 Aug 2016, at 11:40, Ben RUBSON wrote: >> >> >>> On 02 Aug 2016, at 22:11, Ben RUBSON wrote: >>> On 02 Aug

Re: Unstable local network throughput

2016-08-08 Thread Ben RUBSON
> On 04 Aug 2016, at 11:40, Ben RUBSON wrote: > > >> On 02 Aug 2016, at 22:11, Ben RUBSON wrote: >> >>> On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote: >>> >>> The CX-3 driver doesn't bind the worker threads to specific CPU cores by >>> default, so if your CPU has more than one so-cal

Re: Unstable local network throughput

2016-08-08 Thread Ben RUBSON
> On 05 Aug 2016, at 10:30, Hans Petter Selasky wrote: > > On 08/04/16 23:49, Ben RUBSON wrote: >>> >>> On 04 Aug 2016, at 20:15, Ryan Stone wrote: >>> >>> On Thu, Aug 4, 2016 at 11:33 AM, Ben RUBSON wrote: >>> But even without RSS, I should be able to go up to 2x40Gbps, don't you >>> think

Re: Unstable local network throughput

2016-08-05 Thread Hans Petter Selasky
On 08/04/16 23:49, Ben RUBSON wrote: On 04 Aug 2016, at 20:15, Ryan Stone wrote: On Thu, Aug 4, 2016 at 11:33 AM, Ben RUBSON wrote: But even without RSS, I should be able to go up to 2x40Gbps, don't you think so ? Nobody already did this ? Try this patch (...) I also just tested the NODEB

Re: Unstable local network throughput

2016-08-04 Thread Ben RUBSON
> > On 04 Aug 2016, at 20:15, Ryan Stone wrote: > > On Thu, Aug 4, 2016 at 11:33 AM, Ben RUBSON wrote: > But even without RSS, I should be able to go up to 2x40Gbps, don't you think > so ? > Nobody already did this ? > > Try this patch > (...) I also just tested the NODEBUG kernel but it did

Re: Unstable local network throughput

2016-08-04 Thread Ben RUBSON
> On 04 Aug 2016, at 20:15, Ryan Stone wrote: > > On Thu, Aug 4, 2016 at 11:33 AM, Ben RUBSON wrote: > But even without RSS, I should be able to go up to 2x40Gbps, don't you think > so ? > Nobody already did this ? > > Try this patch > (...) I also just tested the NODEBUG kernel but I did no

Re: Unstable local network throughput

2016-08-04 Thread Ben RUBSON
> On 04 Aug 2016, at 20:15, Ryan Stone wrote: > > On Thu, Aug 4, 2016 at 11:33 AM, Ben RUBSON wrote: > But even without RSS, I should be able to go up to 2x40Gbps, don't you think > so ? > Nobody already did this ? > > Try this patch, which should improve performance when multiple TCP streams

Re: Unstable local network throughput

2016-08-04 Thread Ryan Stone
On Thu, Aug 4, 2016 at 11:33 AM, Ben RUBSON wrote: > But even without RSS, I should be able to go up to 2x40Gbps, don't you > think so ? > Nobody already did this ? > Try this patch, which should improve performance when multiple TCP streams are running in parallel over an mlx4_en port: https:/

Re: Unstable local network throughput

2016-08-04 Thread Ben RUBSON
> On 04 Aug 2016, at 17:33, Hans Petter Selasky wrote: > > On 08/04/16 17:24, Ben RUBSON wrote: >> >>> On 04 Aug 2016, at 11:40, Ben RUBSON wrote: >>> On 02 Aug 2016, at 22:11, Ben RUBSON wrote: > On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote: > > The CX-3 driv

Re: Unstable local network throughput

2016-08-04 Thread Hans Petter Selasky
On 08/04/16 17:24, Ben RUBSON wrote: On 04 Aug 2016, at 11:40, Ben RUBSON wrote: On 02 Aug 2016, at 22:11, Ben RUBSON wrote: On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote: The CX-3 driver doesn't bind the worker threads to specific CPU cores by default, so if your CPU has more th

Re: Unstable local network throughput

2016-08-04 Thread Ben RUBSON
> On 04 Aug 2016, at 11:40, Ben RUBSON wrote: > >> On 02 Aug 2016, at 22:11, Ben RUBSON wrote: >> >>> On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote: >>> >>> The CX-3 driver doesn't bind the worker threads to specific CPU cores by >>> default, so if your CPU has more than one so-called

Re: Unstable local network throughput

2016-08-04 Thread Ben RUBSON
> On 02 Aug 2016, at 22:11, Ben RUBSON wrote: > >> On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote: >> >> The CX-3 driver doesn't bind the worker threads to specific CPU cores by >> default, so if your CPU has more than one so-called numa, you'll end up that >> the bottle-neck is the hig

Re: Unstable local network throughput

2016-08-03 Thread Ben RUBSON
> On 03 Aug 2016, at 20:02, Hans Petter Selasky wrote: > > The mlx4 send and receive queues each have their own set of taskqueues. Look in > the output from "ps auxww". I can't find them. I even unloaded and reloaded the driver in order to catch the differences, but I did not find any relevant process.

Re: Unstable local network throughput

2016-08-03 Thread Hans Petter Selasky
On 08/03/16 18:57, Ben RUBSON wrote: taskqueue threads ? The mlx4 send and receive queues each have their own set of taskqueues. Look in the output from "ps auxww". --HPS

Re: Unstable local network throughput

2016-08-03 Thread Ben RUBSON
> On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote: > > The CX-3 driver doesn't bind the worker threads to specific CPU cores by > default, so if your CPU has more than one so-called numa, you'll end up that > the bottle-neck is the high-speed link between the CPU cores and not the > card.

Re: Unstable local network throughput

2016-08-02 Thread Ben RUBSON
> On 03 Aug 2016, at 04:32, Eugene Grosbein wrote: > > If you have gateway_enable="YES" (sysctl net.inet.ip.forwarding=1) > then try to disable this forwarding setting and rerun your tests to compare > results. Thank you Eugene for this, but net.inet.ip.forwarding is disabled by default and I

Re: Unstable local network throughput

2016-08-02 Thread Eugene Grosbein
03.08.2016 1:43, Ben RUBSON wrote: Hello, I'm trying to reach the 40Gb/s max throughput between 2 hosts running a ConnectX-3 Mellanox network adapter. If you have gateway_enable="YES" (sysctl net.inet.ip.forwarding=1) then try to disable this forwarding setting and rerun your tests to compar

Re: Unstable local network throughput

2016-08-02 Thread Ben RUBSON
> On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote: > > Hi, Thank you for your answer Hans Petter ! > The CX-3 driver doesn't bind the worker threads to specific CPU cores by > default, so if your CPU has more than one so-called numa, you'll end up that > the bottle-neck is the high-speed

Re: Unstable local network throughput

2016-08-02 Thread Hans Petter Selasky
On 08/02/16 20:43, Ben RUBSON wrote: Hello, I'm trying to reach the 40Gb/s max throughput between 2 hosts running a ConnectX-3 Mellanox network adapter. FreeBSD 10.3 just installed, latest updates performed. Network adapters running latest firmware / latest drivers. No workload at all, just iPerf

Unstable local network throughput

2016-08-02 Thread Ben RUBSON
Hello, I'm trying to reach the 40Gb/s max throughput between 2 hosts running a ConnectX-3 Mellanox network adapter. FreeBSD 10.3 just installed, latest updates performed. Network adapters running latest firmware / latest drivers. No workload at all, just iPerf as the benchmark tool. ### Step 1 : I