Hi, we are benchmarking DPDK l2fwd performance using DPDK Pktgen (both up to date). We have connected two server machines back-to-back; each machine is a dual-socket server with 6 dual-port 10G NICs (12 ports in total, 120 Gbps aggregate). Four of the NICs (8 ports) are attached to socket 0 and the other two (4 ports) to socket 1. With 1500-byte packets we saturate line rate; with 64-byte packets, however, we do not.
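For reference, we determined the NIC-to-socket mapping from sysfs roughly as follows (the PCI addresses below are just placeholders; we take the real ones from dpdk-devbind.py --status):

  # NUMA node (socket) each NIC port is attached to; -1 means unknown
  for dev in 0000:04:00.0 0000:04:00.1; do
      echo "$dev -> NUMA node $(cat /sys/bus/pci/devices/$dev/numa_node)"
  done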
Running l2fwd with

  ./l2fwd -c 0xff0f -n 4 -- -p 0xfff

Pktgen reports the following per-port performance (Rx/Tx, ports P0-P11):

  P0  7386/9808   P1  7386/9807   P2   7413/9837   P3   7413/9827
  P4  7397/9816   P5  7397/9822   P6   7400/9823   P7   7400/9823
  P8  7394/9820   P9  7394/9807   P10  7372/9768   P11  7372/9788

l2fwd reports 0 dropped packets in total. Another observation is that Pktgen does not saturate the line rate exactly with 64-byte packets, whereas with 1500-byte packets we observe exactly 10 Gbps Tx.

* How the coremask (-c) works is quite clear to us: in our case the 4 least-significant bits select cores on socket 0, the next 4 bits cores on socket 1, then socket 0 and socket 1 again (see the P.S. for how we checked the core/socket layout). The port mask (-p), however, only defines which ports are enabled. We would like to know how to ensure that the cores assigned to the NICs are on the same socket as the corresponding NICs, or whether this is done automatically. The command we use to run l2fwd is the one above: ./l2fwd -c 0xff0f -n 4 -- -p 0xfff

* The next observation is that if we run l2fwd again with a coremask that enables all of our cores,

  ./l2fwd -c 0xffff -n 4 -- -p 0xfff

performance drops significantly:

  P0  7380/9807   P1  7380/9806   P2   7422/9850   P3   7423/9789
  P4  2467/9585   P5  2467/9624   P6   1399/9809   P7   1399/9806
  P8  7391/9816   P9  7392/9802   P10  7370/9789   P11  7370/9789

Ports P4-P7 now show very low throughput, and they correspond to the cores we additionally enabled in the coremask. This result seems odd and makes the core-to-NIC assignment look like a plausible explanation. Moreover, l2fwd reports many dropped packets, but only for these four ports. We would like to know whether there is an obvious mistake in our configuration, or whether there are steps we can take to debug this. 6WIND reports a platform limit of 160 Mpps, but we are below that with a similar platform. Is PCIe the bottleneck?

Thank you in advance for your time.

Best regards,
Chris Pappas
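P.S. For completeness, the core-to-socket layout behind the coremasks above comes from lscpu/sysfs; the exact core IDs are of course specific to our machines:

  lscpu | grep "NUMA node"        # CPU IDs belonging to each socket/NUMA node
  cat /sys/devices/system/cpu/cpu0/topology/physical_package_id   # socket of a single core (cpu0 here)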