On Sat, Jan 16, 2021 at 10:39 AM Igor Russkikh <irussk...@marvell.com> wrote:
>
> When testing high performance numbers, it is often that CPU performance
> limits the max values device can reach (both in pps and in gbps)
>
> Here instead of recreating each packet separately, we use clones counter
> to resend the same mbuf to the line multiple times.
>
> PMDs handle that transparently due to reference counting inside of mbuf.
>
> Reaching max PPS on small packet sizes helps here:
> Some data from our 2 port x 50G device. Using 2*6 tx queues, 64b packets,
> PowerEdge R7525, AMD EPYC 7452:
>
> ./build/app/dpdk-testpmd -l 32-63 -- --forward-mode=flowgen \
>     --rxq=6 --txq=6 --disable-crc-strip --burst=512 \
>     --flowgen-clones=0 --txd=4096 --stats-period=1 --txpkts=64
>
> Gives ~46MPPS TX output:
>
>   Tx-pps:     22926849    Tx-bps:   11738590176
>   Tx-pps:     23642629    Tx-bps:   12105024112
>
> Setting flowgen-clones to 512 pushes TX almost to our device
> physical limit (68MPPS) using same 2*6 queues(cores):
>
>   Tx-pps:     34357556    Tx-bps:   17591073696
>   Tx-pps:     34353211    Tx-bps:   17588802640
>
> Doing similar measurements per core, I see one core can do
> 6.9MPPS (without clones) vs 11MPPS (with clones)
>
> Verified on Marvell qede and atlantic PMDs.
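For readers following the thread, here is a minimal sketch of the
clone-and-refcount idea described in the cover letter above. This is not
the patch code itself: tx_with_clones() and CLONE_BURST_MAX are
illustrative names, and the sketch assumes a single-segment mbuf; only
rte_pktmbuf_refcnt_update(), rte_eth_tx_burst() and rte_pktmbuf_free()
are standard DPDK calls.

#include <stdint.h>

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical cap, mirroring --flowgen-clones=512 from the numbers above. */
#define CLONE_BURST_MAX 512

/*
 * Transmit the same mbuf (nb_clones + 1) times on one TX queue.
 * The extra references keep the mbuf alive until the PMD has freed
 * every transmitted copy; copies the PMD did not accept are released
 * here, so the packet's memory is reclaimed only once all references
 * are gone.
 */
static void
tx_with_clones(uint16_t port_id, uint16_t queue_id,
               struct rte_mbuf *pkt, uint16_t nb_clones)
{
        struct rte_mbuf *burst[CLONE_BURST_MAX + 1];
        uint16_t total, sent, i;

        if (nb_clones > CLONE_BURST_MAX)
                nb_clones = CLONE_BURST_MAX;
        total = nb_clones + 1;

        /* One extra reference per additional copy handed to the PMD. */
        rte_pktmbuf_refcnt_update(pkt, nb_clones);

        for (i = 0; i < total; i++)
                burst[i] = pkt;

        sent = rte_eth_tx_burst(port_id, queue_id, burst, total);

        /* Refcount-aware free of the copies the PMD did not accept. */
        for (i = sent; i < total; i++)
                rte_pktmbuf_free(burst[i]);
}

The point made in the cover letter is that the per-packet cost becomes a
refcount increment and a pointer copy instead of a full header rebuild,
which is why the per-core rate roughly goes from 6.9MPPS to 11MPPS.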
Ubuntu 18.04 gcc complains:
https://github.com/ovsrobot/dpdk/runs/1713302522?check_suite_focus=true#step:14:3097

Can you have a look?


--
David Marchand