Hi,

We have a question about the performance numbers we are measuring with the pktgen application provided by Wind River Systems. Our current setup is as follows:
We have two machines, each equipped with six dual-port 10 GbE NICs (12 ports in total). Machine 0 runs the DPDK L2FWD code, and Machine 1 runs Wind River's pktgen. L2FWD is modified to forward incoming packets to a statically assigned output port.

Each machine has two Intel Xeon E5-2600 CPUs connected via QPI, and two riser slots each holding three 10 Gbps NICs. In riser slot 1, two NICs (NIC0 and NIC1) are connected to CPU1 via PCIe Gen3, while the remaining NIC2 is connected to CPU2, also via PCIe Gen3. In riser slot 2, all three NICs (NIC3, NIC4, and NIC5) are connected to CPU2 via PCIe Gen3. We were careful to assign each NIC port to cores on the CPU socket it is physically attached to, to maximize performance.

With this setup, pktgen measures 120 Gbps of throughput at a packet size of 1500 bytes. For 64-byte packets, we get around 80 Gbps. Do these performance numbers make sense? We have been reading related papers in this domain, and our numbers seem unusually high by comparison. Our own theoretical calculation suggests these rates should be achievable: they do not saturate the PCIe bandwidth of our machines, nor do they exceed the QPI bandwidth when packets are forwarded across NUMA nodes. Can you share your thoughts / experience with this?

Thank you,
Chris
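P.S. For reference, here is a sketch of the back-of-the-envelope line-rate calculation we did. It assumes the standard 20 bytes of per-frame Ethernet wire overhead (7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap) and that pktgen counts only frame bytes; both are assumptions on our part.

```python
# Back-of-the-envelope ceiling for aggregate throughput on 12 x 10GbE,
# assuming 20 B of Ethernet wire overhead per frame (preamble + SFD + IFG).

LINK_BPS = 10e9       # line rate of one 10GbE port, bits/s
PORTS = 12            # total ports across the six dual-port NICs
WIRE_OVERHEAD = 20    # bytes added to every frame on the wire

def max_frame_throughput_gbps(frame_size):
    """Aggregate throughput in Gbps, counting only frame bytes."""
    frames_per_sec_per_port = LINK_BPS / ((frame_size + WIRE_OVERHEAD) * 8)
    return frames_per_sec_per_port * frame_size * 8 * PORTS / 1e9

for size in (64, 1500):
    # yields roughly 91.4 Gbps for 64 B and 118.4 Gbps for 1500 B frames
    print(f"{size:5d} B: {max_frame_throughput_gbps(size):6.1f} Gbps")
```

By this math, 80 Gbps at 64 B would be about 87% of the ~91.4 Gbps ceiling, and 120 Gbps at 1500 B would be at (or slightly above) the ~118.4 Gbps frame-byte line rate; the latter might simply mean pktgen's reported rate includes some of the wire overhead, but we are not sure.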