On Tue, Jun 25, 2024 at 05:29:35PM +0200, Maxime Coquelin wrote:
> Hi Mattias,
>
> On 6/20/24 19:57, Mattias Rönnblom wrote:
> > This patch set make DPDK library, driver, and application code use the
> > compiler/libc memcpy() by default when functions in <rte_memcpy.h> are
> > invoked.
> >
> > The various custom DPDK rte_memcpy() implementations may be retained
> > by means of a build-time option.
> >
> > This patch set only make a difference on x86, PPC and ARM. Loongarch
> > and RISCV already used compiler/libc memcpy().
>
> It indeed makes a difference on x86!
>
> Just tested latest main with and without your series on
> Intel(R) Xeon(R) Gold 6438N.
>
> The test is a simple IO loop between a Vhost PMD and a Virtio-user PMD:
> # dpdk-testpmd -l 4-6 --file-prefix=virtio1 --no-pci --vdev
> 'net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-net,server=1,mrg_rxbuf=1,in_order=1'
> --single-file-segments -- -i
> testpmd> start
>
> # dpdk-testpmd -l 8-10 --file-prefix=vhost1 --no-pci --vdev
> 'net_vhost0,iface=vhost-net,client=1' --single-file-segments -- -i
> testpmd> start tx_first 32
>
> Latest main: 14.5Mpps
> Latest main + this series: 10Mpps
>

I ran the above benchmark on my Raptor Lake desktop (locked to 3.2
GHz). GCC 12.3.0.

Core  use_cc_memcpy  Mpps
E     false           9.5
E     true            9.7
P     false          16.4
P     true           13.5

On the P-cores, there's a significant performance regression, although
not as bad as the one you see on your Sapphire Rapids Xeon. On the
E-cores, there's actually a slight performance gain.

The virtio PMD does not directly invoke rte_memcpy() or anything else
from <rte_memcpy.h>, but rather uses memcpy(), so I'm not sure I
understand what's going on here. Does the virtio driver delegate some
performance-critical task to some module that in turn uses
rte_memcpy()?

> So for me, it should be disabled by default.
>
> Regards,
> Maxime
>
> > This patch set includes a number of fixes in drivers and libraries
> > which errornously relied on <rte_memcpy.h> including header files
> > (i.e., <rte_vect.h>) required by its implementation.
> >
> > Mattias Rönnblom (13):
> >   net/i40e: add missing vector API header include
> >   net/iavf: add missing vector API header include
> >   net/ice: add missing vector API header include
> >   net/ixgbe: add missing vector API header include
> >   net/ngbe: add missing vector API header include
> >   net/txgbe: add missing vector API header include
> >   net/virtio: add missing vector API header include
> >   net/fm10k: add missing vector API header include
> >   event/dlb2: include headers for vector and memory copy APIs
> >   net/octeon_ep: add missing vector API header include
> >   distributor: add missing vector API header include
> >   fib: add missing vector API header include
> >   eal: provide option to use compiler memcpy instead of RTE
> >
> >  config/meson.build                          |  1 +
> >  doc/guides/rel_notes/release_24_07.rst      | 21 +++++++
> >  drivers/event/dlb2/dlb2.c                   |  2 +
> >  drivers/net/fm10k/fm10k_rxtx_vec.c          |  3 +-
> >  drivers/net/i40e/i40e_rxtx_vec_sse.c        |  3 +-
> >  drivers/net/iavf/iavf_rxtx_vec_sse.c        |  3 +-
> >  drivers/net/ice/ice_rxtx_vec_sse.c          |  2 +-
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c      |  3 +-
> >  drivers/net/ngbe/ngbe_rxtx_vec_sse.c        |  3 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c       |  2 +
> >  drivers/net/txgbe/txgbe_rxtx_vec_sse.c      |  3 +-
> >  drivers/net/virtio/virtio_rxtx_simple_sse.c |  3 +-
> >  lib/distributor/rte_distributor.c           |  1 +
> >  lib/eal/arm/include/rte_memcpy.h            | 10 ++++
> >  lib/eal/include/generic/rte_memcpy.h        | 61 ++++++++++++++++++---
> >  lib/eal/loongarch/include/rte_memcpy.h      | 53 ++----------------
> >  lib/eal/ppc/include/rte_memcpy.h            | 10 ++++
> >  lib/eal/riscv/include/rte_memcpy.h          | 53 ++----------------
> >  lib/eal/x86/include/meson.build             |  1 +
> >  lib/eal/x86/include/rte_memcpy.h            | 11 +++-
> >  lib/fib/trie.c                              |  1 +
> >  meson_options.txt                           |  2 +
> >  22 files changed, 131 insertions(+), 121 deletions(-)
> >
>