On 2024-03-02 12:14, Mattias Rönnblom wrote:
On 2024-03-01 18:14, Stephen Hemminger wrote:
The DPDK has a lot of "cargo cult" usage of rte_memcpy.
This patch set replaces cases where rte_memcpy is used with a fixed
size constant size.

Typical example is:
    rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
which can be replaced with:
    memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);

This has two benefits. Gcc (and clang) are smart enough that for
all small fixed size values, they just generate the necessary instructions
to do it inline. It also means that fortify, Coverity, and ASAN
analyzers can check these memcpy's.


Instead of smearing out the knowledge of when to use rte_memcpy(), and when to use memcpy() across the code base, wouldn't it be better to *always* call rte_memcpy() in the fast path, and leave the policy decision to the rte_memcpy() implementation?

In rte_memcpy(), add:
if (__builtin_constant_p(n) && n < RTE_LIBC_MEMCPY_SIZE_THRESHOLD)
     memcpy(/../);
..or something to that effect.

Could you have a #ifdef for dumb static analysis tools? To make it look like you are always using memcpy()?

So faster, better, safer.


What is "faster" based on?


I ran some DSW benchmarks, and if you add

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index 72a92290e0..64cd82d78d 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy(void *dst, const void *src, size_t n)
 {
+       if (__builtin_constant_p(n) && n <= 32) {
+               memcpy(dst, src, n);
+               return dst;
+       }
+
        if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
                return rte_memcpy_aligned(dst, src, n);
        else

...the overhead increases from roughly 48 core clock cycles/event to 59 cc/event. The same for "n < 128". (I'm not sure what counts as "small" here.)

So something rte_memcpy() does for small and constant memory copies does make things go *significantly* faster, at least in certain cases.

(Linux, GCC 11.4, Intel Gracemont.)

My experience with replacing rte_memcpy() with memcpy() (or vice versa) is mixed.

I've also tried just dropping the DPDK-custom memcpy() implementation altogether, and that caused a performance drop (in a particular app, on a particular compiler and CPU).

The first patch is a simple coccinelle script to do the replacement
and the rest are the results broken out by module.

The coccinelle script can be used again to make sure more bad
usage doesn't creep in with new drivers.

v2 - fix CI failure on some OS by adding string.h
      remove rte_memcpy.h if no longer used

Stephen Hemminger (71):
   cocci/rte_memcpy: add script to eliminate fixed size rte_memcpy
   eal: replace use of fixed size rte_memcpy
   ethdev: replace use of fixed size rte_memcpy
   eventdev: replace use of fixed size rte_memcpy
   cryptodev: replace use of fixed size rte_memcpy
   ip_frag: replace use of fixed size rte_memcpy
   net: replace use of fixed size rte_memcpy
   lpm: replace use of fixed size rte_memcpy
   node: replace use of fixed size rte_memcpy
   pdcp: replace use of fixed size rte_memcpy
   pipeline: replace use of fixed size rte_memcpy
   rib: replace use of fixed size rte_memcpy
   security: replace use of fixed size rte_memcpy
   net/mlx5: replace use of fixed size rte_memcpy
   net/nfp: replace use of fixed size rte_memcpy
   net/ngbe: replace use of fixed size rte_memcpy
   net/null: replace use of fixed size rte_memcpy
   net/pcap: replace use of fixed size rte_memcpy
   net/sfc: replace use of fixed size rte_memcpy
   net/tap: replace use of fixed size rte_memcpy
   net/txgbe: replace use of fixed size rte_memcpy
   raw/ifpga: replace use of fixed size rte_memcpy
   raw/skeleton: replace use of fixed size rte_memcpy
   net/hns3: replace use of fixed size rte_memcpy
   net/i40e: replace use of fixed size rte_memcpy
   net/iavf: replace use of fixed size rte_memcpy
   net/ice: replace use of fixed size rte_memcpy
   net/idpf: replace use of fixed size rte_memcpy
   net/ipn3ke: replace use of fixed size rte_memcpy
   net/ixgbe: replace use of fixed size rte_memcpy
   net/memif: replace use of fixed size rte_memcpy
   net/qede: replace use of fixed size rte_memcpy
   baseband/acc: replace use of fixed size rte_memcpy
   baseband/la12xx: replace use of fixed size rte_memcpy
   common/idpf: replace use of fixed size rte_memcpy
   common/qat: replace use of fixed size rte_memcpy
   compress/qat: replace use of fixed size rte_memcpy
   crypto/ccp: replace use of fixed size rte_memcpy
   crypto/cnxk: replace use of fixed size rte_memcpy
   crypto/dpaa_sec: replace use of fixed size rte_memcpy
   crypto/ipsec_mb: replace use of fixed size rte_memcpy
   crypto/qat: replace use of fixed size rte_memcpy
   crypto/scheduler: replace use of fixed size rte_memcpy
   event/cnxk: replace use of fixed size rte_memcpy
   event/dlb2: replace use of fixed size rte_memcpy
   event/dpaa2: replace use of fixed size rte_memcpy
   event/octeontx: replace use of fixed size rte_memcpy
   mempool/dpaa: replace use of fixed size rte_memcpy
   mempool/dpaa2: replace use of fixed size rte_memcpy
   ml/cnxk: replace use of fixed size rte_memcpy
   net/af_xdp: replace use of fixed size rte_memcpy
   net/avp: replace use of fixed size rte_memcpy
   net/axgbe: replace use of fixed size rte_memcpy
   net/bnx2x: replace use of fixed size rte_memcpy
   net/bnxt: replace use of fixed size rte_memcpy
   net/bonding: replace use of fixed size rte_memcpy
   net/cnxk: replace use of fixed size rte_memcpy
   net/cpfl: replace use of fixed size rte_memcpy
   net/cxgbe: replace use of fixed size rte_memcpy
   net/dpaa2: replace use of fixed size rte_memcpy
   net/e1000: replace use of fixed size rte_memcpy
   net/enic: replace use of fixed size rte_memcpy
   net/failsafe: replace use of fixed size rte_memcpy
   net/gve/base: replace use of fixed size rte_memcpy
   net/hinic: replace use of fixed size rte_memcpy
   net/mvpp2: replace use of fixed size rte_memcpy
   app/test-pmd: replace use of fixed size rte_memcpy
   app/graph: replace use of fixed size rte_memcpy
   app/test-eventdev: replace use of fixed size rte_memcpy
   app/test: replace use of fixed size rte_memcpy
   examples: replace use of fixed size rte_memcpy

  app/graph/neigh.c                             |   8 +-
  app/test-eventdev/test_pipeline_common.c      |  19 ++-
  app/test-pmd/cmdline.c                        |  48 ++++----
  app/test-pmd/cmdline_flow.c                   |  24 ++--
  app/test-pmd/config.c                         |   8 +-
  app/test-pmd/csumonly.c                       |   1 -
  app/test-pmd/flowgen.c                        |   1 -
  app/test-pmd/iofwd.c                          |   1 -
  app/test-pmd/macfwd.c                         |   1 -
  app/test-pmd/macswap.c                        |   1 -
  app/test-pmd/noisy_vnf.c                      |   1 -
  app/test-pmd/rxonly.c                         |   1 -
  app/test-pmd/testpmd.c                        |   1 -
  app/test/commands.c                           |   1 -
  app/test/packet_burst_generator.c             |   4 +-
  app/test/test_crc.c                           |   5 +-
  app/test/test_cryptodev.c                     |  18 ++-
  app/test/test_cryptodev_asym.c                |   1 -
  app/test/test_cryptodev_security_pdcp.c       |   1 -
  app/test/test_efd.c                           |   1 -
  app/test/test_efd_perf.c                      |   1 -
  app/test/test_event_crypto_adapter.c          |  12 +-
  app/test/test_event_dma_adapter.c             |   4 +-
  app/test/test_eventdev.c                      |   1 -
  app/test/test_ipsec.c                         |   6 +-
  app/test/test_link_bonding_mode4.c            |   8 +-
  app/test/test_mbuf.c                          |   1 -
  app/test/test_member.c                        |   1 -
  app/test/test_member_perf.c                   |   1 -
  app/test/test_rawdev.c                        |   1 -
  app/test/test_security_inline_proto.c         |  36 +++---
  app/test/test_service_cores.c                 |   1 -
  app/test/virtual_pmd.c                        |   3 +-
  devtools/cocci/rte_memcpy.cocci               |  11 ++
  drivers/baseband/acc/rte_acc100_pmd.c         |  17 ++-
  drivers/baseband/acc/rte_vrb_pmd.c            |  21 ++--
  drivers/baseband/la12xx/bbdev_la12xx.c        |   4 +-
  drivers/common/idpf/idpf_common_device.c      |   4 +-
  drivers/common/idpf/idpf_common_virtchnl.c    |   8 +-
  drivers/common/qat/qat_qp.c                   |  10 +-
  drivers/compress/qat/qat_comp.c               |   8 +-
  drivers/crypto/ccp/ccp_crypto.c               |  14 +--
  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
  drivers/crypto/cnxk/cnxk_se.h                 |   2 +-
  drivers/crypto/dpaa_sec/dpaa_sec.c            |   2 +-
  drivers/crypto/ipsec_mb/pmd_snow3g.c          |   4 +-
  drivers/crypto/qat/qat_sym_session.c          |  52 ++++-----
  .../scheduler/rte_cryptodev_scheduler.c       |   6 +-
  drivers/crypto/scheduler/scheduler_failover.c |  12 +-
  drivers/event/cnxk/cnxk_tim_evdev.c           |   4 +-
  drivers/event/dlb2/dlb2.c                     |   6 +-
  drivers/event/dpaa2/dpaa2_eventdev.c          |   6 +-
  drivers/event/octeontx/timvf_evdev.c          |   4 +-
  drivers/mempool/dpaa/dpaa_mempool.c           |   4 +-
  drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |   4 +-
  drivers/ml/cnxk/cn10k_ml_model.c              |   8 +-
  drivers/ml/cnxk/cn10k_ml_ops.c                |  11 +-
  drivers/ml/cnxk/cnxk_ml_ops.c                 |   2 +-
  drivers/ml/cnxk/mvtvm_ml_model.c              |   8 +-
  drivers/ml/cnxk/mvtvm_ml_ops.c                |   8 +-
  drivers/net/af_xdp/rte_eth_af_xdp.c           |   2 +-
  drivers/net/avp/avp_ethdev.c                  |   4 +-
  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
  drivers/net/bnx2x/bnx2x.c                     |  32 +++--
  drivers/net/bnx2x/bnx2x_stats.c               |  10 +-
  drivers/net/bnx2x/bnx2x_vfpf.c                |  19 +--
  drivers/net/bnxt/bnxt_flow.c                  |  34 +++---
  drivers/net/bonding/rte_eth_bond_8023ad.c     |   4 +-
  drivers/net/bonding/rte_eth_bond_flow.c       |   2 +-
  drivers/net/cnxk/cnxk_ethdev_ops.c            |   2 +-
  drivers/net/cnxk/cnxk_tm.c                    |   5 +-
  drivers/net/cpfl/cpfl_ethdev.c                |   3 +-
  drivers/net/cpfl/cpfl_vchnl.c                 |   4 +-
  drivers/net/cxgbe/clip_tbl.c                  |   2 +-
  drivers/net/cxgbe/cxgbe_filter.c              |   8 +-
  drivers/net/cxgbe/l2t.c                       |   4 +-
  drivers/net/cxgbe/smt.c                       |  20 ++--
  drivers/net/dpaa2/base/dpaa2_hw_dpni.c        |   1 -
  drivers/net/dpaa2/dpaa2_ethdev.c              |   1 -
  drivers/net/dpaa2/dpaa2_recycle.c             |   1 -
  drivers/net/dpaa2/dpaa2_rxtx.c                |   1 -
  drivers/net/dpaa2/dpaa2_sparser.c             |   1 -
  drivers/net/dpaa2/dpaa2_tm.c                  |   2 +-
  drivers/net/e1000/em_rxtx.c                   |   1 -
  drivers/net/e1000/igb_flow.c                  |  22 ++--
  drivers/net/e1000/igb_pf.c                    |   7 +-
  drivers/net/e1000/igb_rxtx.c                  |   1 -
  drivers/net/enic/enic_main.c                  |   8 +-
  drivers/net/failsafe/failsafe_ops.c           |   6 +-
  drivers/net/gve/base/gve_adminq.c             |   2 +-
  drivers/net/hinic/hinic_pmd_flow.c            |  40 +++----
  drivers/net/hns3/hns3_fdir.c                  |   2 +-
  drivers/net/hns3/hns3_flow.c                  |   4 +-
  drivers/net/i40e/i40e_ethdev.c                | 109 ++++++++----------
  drivers/net/i40e/i40e_fdir.c                  |  28 +++--
  drivers/net/i40e/i40e_flow.c                  |  56 +++++----
  drivers/net/i40e/i40e_pf.c                    |   3 +-
  drivers/net/i40e/i40e_tm.c                    |  11 +-
  drivers/net/i40e/rte_pmd_i40e.c               |  34 +++---
  drivers/net/iavf/iavf_fdir.c                  |  93 +++++++--------
  drivers/net/iavf/iavf_fsub.c                  |  50 ++++----
  drivers/net/iavf/iavf_generic_flow.c          |   2 +-
  drivers/net/iavf/iavf_tm.c                    |  11 +-
  drivers/net/iavf/iavf_vchnl.c                 |   9 +-
  drivers/net/ice/ice_dcf.c                     |   5 +-
  drivers/net/ice/ice_dcf_parent.c              |   2 +-
  drivers/net/ice/ice_dcf_sched.c               |  11 +-
  drivers/net/ice/ice_diagnose.c                |   4 +-
  drivers/net/ice/ice_ethdev.c                  |  14 +--
  drivers/net/ice/ice_fdir_filter.c             |  37 +++---
  drivers/net/ice/ice_generic_flow.c            |   2 +-
  drivers/net/ice/ice_hash.c                    |   2 +-
  drivers/net/ice/ice_tm.c                      |  11 +-
  drivers/net/idpf/idpf_ethdev.c                |   7 +-
  drivers/net/idpf/idpf_rxtx.c                  |  10 +-
  drivers/net/ipn3ke/ipn3ke_flow.c              |  32 +++--
  drivers/net/ipn3ke/ipn3ke_representor.c       |  16 +--
  drivers/net/ipn3ke/ipn3ke_tm.c                |   6 +-
  drivers/net/ixgbe/ixgbe_ethdev.c              |   9 +-
  drivers/net/ixgbe/ixgbe_fdir.c                |   7 +-
  drivers/net/ixgbe/ixgbe_flow.c                |  65 +++++------
  drivers/net/ixgbe/ixgbe_ipsec.c               |   8 +-
  drivers/net/ixgbe/ixgbe_pf.c                  |   5 +-
  drivers/net/ixgbe/ixgbe_tm.c                  |  11 +-
  drivers/net/ixgbe/rte_pmd_ixgbe.c             |   4 +-
  drivers/net/memif/memif_socket.c              |   4 +-
  drivers/net/mlx5/mlx5_devx.c                  |   4 +-
  drivers/net/mlx5/mlx5_flow.c                  |  38 +++---
  drivers/net/mlx5/mlx5_flow_aso.c              |   6 +-
  drivers/net/mlx5/mlx5_flow_hw.c               |  16 +--
  drivers/net/mlx5/mlx5_rx.c                    |   6 +-
  drivers/net/mlx5/mlx5_rxtx_vec.c              |   8 +-
  drivers/net/mvpp2/mrvl_tm.c                   |   2 +-
  drivers/net/nfp/flower/nfp_conntrack.c        |   2 +-
  drivers/net/nfp/flower/nfp_flower_flow.c      |  16 +--
  .../net/nfp/flower/nfp_flower_representor.c   |   2 +-
  drivers/net/nfp/nfp_mtr.c                     |  10 +-
  drivers/net/ngbe/ngbe_pf.c                    |   4 +-
  drivers/net/null/rte_eth_null.c               |   6 +-
  drivers/net/pcap/pcap_ethdev.c                |   2 +-
  drivers/net/pcap/pcap_osdep_freebsd.c         |   2 +-
  drivers/net/pcap/pcap_osdep_linux.c           |   2 +-
  drivers/net/qede/qede_main.c                  |   2 +-
  drivers/net/sfc/sfc.c                         |   2 +-
  drivers/net/sfc/sfc_ef10_tx.c                 |   2 +-
  drivers/net/sfc/sfc_ethdev.c                  |  11 +-
  drivers/net/sfc/sfc_flow.c                    |  20 ++--
  drivers/net/sfc/sfc_flow_rss.c                |   2 +-
  drivers/net/sfc/sfc_mae.c                     |   2 +-
  drivers/net/sfc/sfc_rx.c                      |   2 +-
  drivers/net/sfc/sfc_tso.c                     |   2 +-
  drivers/net/sfc/sfc_tso.h                     |   9 +-
  drivers/net/tap/rte_eth_tap.c                 |  14 +--
  drivers/net/txgbe/txgbe_ethdev.c              |   9 +-
  drivers/net/txgbe/txgbe_fdir.c                |   6 +-
  drivers/net/txgbe/txgbe_flow.c                |  65 +++++------
  drivers/net/txgbe/txgbe_ipsec.c               |   8 +-
  drivers/net/txgbe/txgbe_pf.c                  |   4 +-
  drivers/net/txgbe/txgbe_tm.c                  |  11 +-
  drivers/raw/ifpga/afu_pmd_he_hssi.c           |   3 +-
  drivers/raw/ifpga/afu_pmd_he_lpbk.c           |   3 +-
  drivers/raw/ifpga/afu_pmd_he_mem.c            |   3 +-
  drivers/raw/ifpga/afu_pmd_n3000.c             |   8 +-
  drivers/raw/ifpga/ifpga_rawdev.c              |  11 +-
  drivers/raw/skeleton/skeleton_rawdev.c        |   7 +-
  examples/bbdev_app/main.c                     |   2 +-
  examples/l2fwd-cat/cat.c                      |   4 +-
  examples/ptpclient/ptpclient.c                |  11 +-
  examples/vhost/main.c                         |   5 +-
  examples/vmdq/main.c                          |   6 +-
  examples/vmdq_dcb/main.c                      |  15 +--
  lib/cryptodev/rte_cryptodev.c                 |   2 +-
  lib/eal/common/eal_common_options.c           |   7 +-
  lib/ethdev/rte_ethdev.c                       |   3 +-
  lib/ethdev/rte_flow.c                         |   5 +-
  lib/eventdev/rte_event_crypto_adapter.c       |   2 +-
  lib/eventdev/rte_event_dma_adapter.c          |   4 +-
  lib/eventdev/rte_event_timer_adapter.c        |   2 +-
  lib/fib/trie.c                                |   2 +-
  lib/ip_frag/rte_ipv6_fragmentation.c          |   4 +-
  lib/ip_frag/rte_ipv6_reassembly.c             |   6 +-
  lib/lpm/rte_lpm6.c                            |   3 +-
  lib/net/rte_ether.c                           |   2 +-
  lib/node/ip6_lookup.c                         |   8 +-
  lib/pdcp/pdcp_process.c                       |  36 +++---
  lib/pipeline/rte_table_action.c               |   8 +-
  lib/rib/rte_rib6.h                            |   5 +-
  lib/security/rte_security.c                   |   4 +-
  188 files changed, 881 insertions(+), 998 deletions(-)
  create mode 100644 devtools/cocci/rte_memcpy.cocci

Reply via email to