Chris, 

The numbers you are getting are correct. :)

Practically speaking, most motherboards pin out between 4 and 5 x8 slots to
every CPU socket. At PCI-E Gen 2 speeds (5 GT/s), each slot is capable of
carrying 20 Gb/s of traffic (limited to ~16 Gb/s of 64B packets). I would
have expected the 64-byte traffic capacity to be a bit higher than 80 Gb/s,
but either way the numbers you are achieving are well within the capability
of the system if you are careful about pinning cores to ports, which you
seem to be doing. QPI is not a limiter either for the amount of traffic you
are currently generating.
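
Just to make the arithmetic explicit, here is a rough sketch in C using the
assumed per-slot figures above (nothing measured, just the budget math):

    /* Back-of-the-envelope PCIe budget using the assumed figures above:
     * ~16 Gb/s of 64B traffic per Gen2 x8 slot, 4-5 slots per socket. */
    #include <stdio.h>

    int main(void)
    {
        double slot_64b_gbps = 16.0;        /* assumed per-slot 64B capacity */
        int slots_low = 4, slots_high = 5;  /* x8 slots wired per socket */

        printf("per-socket 64B budget: %.0f - %.0f Gb/s\n",
               slots_low * slot_64b_gbps, slots_high * slot_64b_gbps);
        printf("two sockets:           %.0f - %.0f Gb/s\n",
               2 * slots_low * slot_64b_gbps, 2 * slots_high * slot_64b_gbps);
        return 0;
    }

That puts the 80 Gb/s of 64-byte traffic you measured comfortably inside the
PCIe budget of a two-socket system.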

Regards,
-Venky

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Chris Pappas
Sent: Sunday, September 22, 2013 7:32 AM
To: dev at dpdk.org
Subject: [dpdk-dev] Question regarding throughput number with DPDK l2fwd with 
Wind River System's pktgen

Hi,

We have a question about the performance numbers we are getting measured 
through the pktgen application provided by Wind River Systems. The current 
setup is the following:

We have two machines, each equipped with 6 dual-port 10 GbE NICs (for a
total of 12 ports). Machine 0 runs the DPDK L2FWD code, and Machine 1 runs
Wind River System's pktgen. L2FWD is modified to forward incoming packets
to a statically assigned output port.
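
Roughly, the modification looks like this (a minimal sketch against the
stock l2fwd example's l2fwd_dst_ports[] table; the exact port pairing shown
here is illustrative, not our actual mapping):

    /* Hypothetical static RX-port -> TX-port mapping, replacing the
     * default adjacent-port pairing that l2fwd sets up in main(). */
    static const unsigned static_dst_port[RTE_MAX_ETHPORTS] = {
        [0] = 1, [1] = 0,
        [2] = 3, [3] = 2,
        [4] = 5, [5] = 4,
        /* ... remaining ports paired the same way ... */
    };

    /* In main(), instead of the default pairing loop
     * (portid and nb_ports are l2fwd's own variables): */
    for (portid = 0; portid < nb_ports; portid++)
        l2fwd_dst_ports[portid] = static_dst_port[portid];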

Our machines have two Intel Xeon E5-2600 CPUs connected via QPI, and two
riser slots each holding three 10 Gbps NICs. Two NICs in riser slot 1
(NIC0 and NIC1) are connected to CPU1 via PCIe Gen3, while the remaining
NIC2 is connected to CPU2, also via PCIe Gen3. In riser slot 2, all NICs
(NICs 3, 4, and 5) are connected to CPU2 via PCIe Gen3. We were careful to
assign each NIC port to cores on the CPU socket it is physically connected
to, to achieve maximum performance.
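
A small sanity check for this kind of pinning could look like the following
sketch (using the DPDK lcore/ethdev APIs; the warning text is illustrative,
not code from our setup):

    /* Warn if an lcore is polling a port on the other socket, which
     * would push every packet across QPI. */
    #include <stdio.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>

    static void
    check_port_affinity(unsigned lcore_id, unsigned portid)
    {
        int port_socket  = rte_eth_dev_socket_id(portid);
        int lcore_socket = (int)rte_lcore_to_socket_id(lcore_id);

        if (port_socket >= 0 && port_socket != lcore_socket)
            printf("WARNING: lcore %u (socket %d) polls port %u (socket %d)\n",
                   lcore_id, lcore_socket, portid, port_socket);
    }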


With this setup, we are getting 120 Gbps throughput measured by pktgen with
a packet size of 1500 bytes. For 64-byte packets, we are getting around
80 Gbps. Do these performance numbers make sense? We have been reading
related papers in this domain, and our numbers seem unusually high. We did
a theoretical calculation and found that it should be possible: the traffic
neither hits the PCIe bandwidth limits of our machine nor exceeds the QPI
bandwidth when packets are forwarded across NUMA nodes. Can you share your
thoughts / experience with this?
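
For reference, a rough sketch of that kind of line-rate check (it assumes
pktgen reports frame bits, excluding preamble and inter-frame gap; at
1500 bytes, 120 Gbps is essentially all twelve 10 GbE links at line rate):

    /* Compare the reported 80 Gbps of 64-byte packets against the
     * aggregate 64B line rate of 12 x 10 GbE. */
    #include <stdio.h>

    int main(void)
    {
        double line_mpps = 12 * 10e9 / ((64 + 20) * 8) / 1e6; /* +20B preamble/IFG */
        double rep_mpps  = 80e9 / (64 * 8) / 1e6;             /* reported 80 Gbps */

        printf("64B: line rate %.1f Mpps, reported %.1f Mpps (%.0f%%)\n",
               line_mpps, rep_mpps, 100.0 * rep_mpps / line_mpps);
        return 0;
    }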

Thank you,

Chris
