[dpdk-dev] Performance issues with Mellanox Connectx-3 EN

2015-08-12 Thread Xiaozhou Li
Hi folks,

I am seeing performance scalability issues with DPDK on a Mellanox Connectx-3 EN.

Each of our machines has 16 cores and a single-port 40G Mellanox Connectx-3
EN. We find that the server throughput *does not scale* with the number of
cores. With a single thread on one core, we get about 2 Mpps with a simple
echo server implementation, but the throughput does not increase as we use
more cores. Our implementation is based on the l2fwd example.
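For context, this is roughly the queue/RSS setup I would expect an
l2fwd-derived app to need so that traffic actually spreads across cores. It
is only a simplified sketch against the DPDK 2.0 rte_ethdev API, not our
exact code; configure_port, the queue counts, and the descriptor sizes are
placeholders, and I have not verified how much of the rss_conf the mlx4 PMD
honors in this version:

    #include <rte_ethdev.h>

    #define NB_RX_QUEUES 4
    #define NB_TX_QUEUES 4
    #define RX_DESC      128
    #define TX_DESC      512

    /* Without mq_mode = ETH_MQ_RX_RSS all incoming traffic lands in RX queue 0,
     * so only the lcore polling that queue does any work and throughput does
     * not scale with extra cores. */
    static const struct rte_eth_conf port_conf = {
        .rxmode = { .mq_mode = ETH_MQ_RX_RSS },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,  /* let the PMD pick its default hash key */
                .rss_hf  = ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP,
            },
        },
    };

    static int
    configure_port(uint8_t port_id, struct rte_mempool *mbuf_pool)
    {
        int ret = rte_eth_dev_configure(port_id, NB_RX_QUEUES, NB_TX_QUEUES,
                                        &port_conf);
        if (ret < 0)
            return ret;

        /* One RX and one TX queue per forwarding lcore, allocated on the
         * NIC's NUMA socket. */
        for (uint16_t q = 0; q < NB_RX_QUEUES; q++) {
            ret = rte_eth_rx_queue_setup(port_id, q, RX_DESC,
                                         rte_eth_dev_socket_id(port_id),
                                         NULL, mbuf_pool);
            if (ret < 0)
                return ret;
        }
        for (uint16_t q = 0; q < NB_TX_QUEUES; q++) {
            ret = rte_eth_tx_queue_setup(port_id, q, TX_DESC,
                                         rte_eth_dev_socket_id(port_id), NULL);
            if (ret < 0)
                return ret;
        }
        return rte_eth_dev_start(port_id);
    }

Each forwarding lcore then polls its own RX queue with rte_eth_rx_burst()
and sends replies on its own TX queue with rte_eth_tx_burst(), as in l2fwd.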

I'd greatly appreciate any insight into what might be the problem and how we
can improve performance with the Mellanox Connectx-3 EN. Thanks!

Best,
Xiaozhou


[dpdk-dev] Performance issues with Mellanox Connectx-3 EN

2015-08-13 Thread Xiaozhou Li
Hi Qian and Gilad,

Thanks for your reply. We are using dpdk-2.0.0 and mlnx-en-2.4-1.0.0.1 on a
Mellanox Connectx-3 EN with a single 40G port.

I ran testpmd on the server with the following command and then enabled
macswap forwarding:

  sudo ./testpmd -c 0xff -n 4 -- -i --portmask=0x1 --port-topology=chained \
      --rxq=4 --txq=4 --nb-cores=4
  testpmd> set fwd macswap

I have multiple clients sending packets and receiving replies. The server
throughput is still only about 2 Mpps. testpmd shows no RX-dropped packets,
but "ifconfig" on the port shows many dropped packets.

Please let me know if I am doing anything wrong and what else I should
check. I am also copying the output from starting testpmd at the end of this
email, in case there is any useful information in it.

Thanks!
Xiaozhou


EAL: Detected lcore 0 as core 0 on socket 0
 ... (omit) ...
EAL: Detected 32 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
 ... (omit) ...
EAL: Ask a virtual area of 0xa0 bytes
EAL: Virtual area found at 0x7f2d2fe0 (size = 0xa0)
EAL: Requesting 8192 pages of size 2MB from socket 0
EAL: Requesting 8192 pages of size 2MB from socket 1
EAL: TSC frequency is ~214 KHz
EAL: Master lcore 0 is ready (tid=39add900;cpuset=[0])
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: lcore 4 is ready (tid=3676b700;cpuset=[4])
EAL: lcore 6 is ready (tid=35769700;cpuset=[6])
EAL: lcore 5 is ready (tid=35f6a700;cpuset=[5])
EAL: lcore 2 is ready (tid=3776d700;cpuset=[2])
EAL: lcore 1 is ready (tid=37f6e700;cpuset=[1])
EAL: lcore 3 is ready (tid=36f6c700;cpuset=[3])
EAL: lcore 7 is ready (tid=34f68700;cpuset=[7])
EAL: PCI device :04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device :04:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device :06:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1003 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF:
false)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is f4:52:14:5a:8f:70
EAL: PCI device :81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device :81:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
Interactive-mode selected
Configuring Port 0 (socket 0)
PMD: librte_pmd_mlx4: 0x884360: TX queues number update: 0 -> 4
PMD: librte_pmd_mlx4: 0x884360: RX queues number update: 0 -> 4
Port 0: F4:52:14:5A:8F:70
Checking link statuses...
Port 0 Link Up - speed 4 Mbps - full-duplex
Done

testpmd> show config rxtx
  macswap packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=4 - nb forwarding ports=1
  RX queues=4 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX queues=4 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0
testpmd> show config fwd
macswap packet forwarding - ports=1 - cores=4 - streams=4 - NUMA support
disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 2 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
Logical Core 4 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00



On Thu, Aug 13, 2015 at 6:13 AM, Gilad Berman  wrote:

> Xiaozhou,
> Following Qian's answer - 2 Mpps is VERY (VERY) low and far below what we
> see even with a single core.
> Which version of DPDK and of the PMD are you using? Are you using the MLNX
> optimized libs for the PMD? Can you provide more details on the exact setup?
> Can you run a simple test with testpmd and see whether you get the same
> results?
>
> Just to be clear - it does not matter which version you are using; 2 Mpps
> is very far from what you should be getting :)
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xu, Qian Q
> Sent: Thursday, August 13, 2015 6:25 AM
> To: Xiaozhou Li ; dev at dpdk.org
> Subject: Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
>
> Xiaozhou
> So it seems the performance bottleneck is not the cores. Have you checked
> the Mellanox NIC's configuration? How many queues per port are you using?
> Could you try the l3fwd example with Mellanox to check whether the
> performance is good enough? I'm not familiar with Mel