Re: [DPDK/other Bug 1562] dumpcap captures all available network interfaces when specifying any PCI network interface
Oh, thank you. Can you point me to those patches? We require this fix on top of 22.11.1, so I would like to get those patches backported on top of 22.11.1. On Wed, Feb 12, 2025 at 8:32 PM Stephen Hemminger < step...@networkplumber.org> wrote: > On Wed, 12 Feb 2025 13:29:57 +0530 > Navin Srinivas wrote: > > > Hi, > > > > Is this backported to 22.11.1? > > > > Thanks, > > Navin Srinivas > > No, it required a couple of other patches related to management of network > interface list. >
Re: [PATCH v4 00/11] remove component-specific logic for AVX builds
Hello Bruce, On Wed, Mar 19, 2025 at 7:09 PM Bruce Richardson wrote: > > On Wed, Mar 19, 2025 at 05:29:30PM +, Bruce Richardson wrote: > > A number of libs and drivers had special optimized AVX2 and AVX512 code > > paths for performance reasons, and these tended to have copy-pasted > > logic to build those files. Centralise that logic in the main > > drivers/ and lib/ meson.build files to avoid duplication. > > > > v4: rebase on latest main branch > > minor fixes following feedback > > limit use of -march=skylake-avx512 to when we don't already have a > > -march flag supporting AVX512. > > v3: add patch for event/dlb2 AVX512 handling. > > add common code for libraries as well as drivers. > > v2: add patch 4 to remove use of unnecessary CC_AVX2_SUPPORT flag > > > A related follow-up to this patchset. Checking with "godbolt.org", it > appears that both clang 3.6[1] and gcc 5[2] (the minimum called out compiler > versions in our docs[1]) support the set of AVX-512 compiler flags we use. > Therefore, it seems we can simplify our code further by removing the > "cc_has_avx512" variable. What about https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028 ? You'll need to send a new revision for this series in any case, since patch 9 broke the crc stuff in the net library. https://inbox.dpdk.org/dev/CAJFAV8w9wYPN+30Hv=batMvP=0m4momkzgmndfixxbd-9u8...@mail.gmail.com/ -- David Marchand
RE: [PATCH] mempool: micro optimizations
PING for review. @Bruce, you seemed to acknowledge this, but never sent a formal Ack. Med venlig hilsen / Kind regards, -Morten Brørup > -Original Message- > From: Morten Brørup [mailto:m...@smartsharesystems.com] > Sent: Wednesday, 26 February 2025 16.59 > To: Andrew Rybchenko; dev@dpdk.org > Cc: Morten Brørup > Subject: [PATCH] mempool: micro optimizations > > The comparisons lcore_id < RTE_MAX_LCORE and lcore_id != LCORE_ID_ANY > are > equivalent, but the latter compiles to fewer bytes of code space. > Similarly for lcore_id >= RTE_MAX_LCORE and lcore_id == LCORE_ID_ANY. > > The rte_mempool_get_ops() function is also used in the fast path, so > RTE_VERIFY() was replaced by RTE_ASSERT(). > > Compilers implicitly consider comparisons of variable == 0 likely, so > unlikely() was added to the check for no mempool cache (mp->cache_size > == > 0) in the rte_mempool_default_cache() function. > > The rte_mempool_do_generic_put() function for adding objects to a > mempool > was refactored as follows: > - The comparison for the request itself being too big, which is > considered > unlikely, was moved down and out of the code path where the cache has > sufficient room for the added objects, which is considered the most > likely code path. > - Added __rte_assume() about the cache length, size and threshold, for > compiler optimization when "n" is compile time constant. > - Added __rte_assume() about "ret" being zero, so other functions using > the value returned by this function can be potentially optimized by > the > compiler; especially when it merges multiple sequential code paths of > inlined code depending on the return value being either zero or > negative. > - The refactored source code (with comments) made the separate comment > describing the cache flush/add algorithm superfluous, so it was > removed. > > A few more likely()/unlikely() were added. > > A few comments were improved for readability. > > Some assertions, RTE_ASSERT(), were added. 
Most importantly to assert > that > the return values of the mempool drivers' enqueue and dequeue > operations > are API compliant, i.e. 0 (for success) or negative (for failure), and > never positive. > > Signed-off-by: Morten Brørup > --- > lib/mempool/rte_mempool.h | 67 ++- > 1 file changed, 38 insertions(+), 29 deletions(-) > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h > index c495cc012f..aedc100964 100644 > --- a/lib/mempool/rte_mempool.h > +++ b/lib/mempool/rte_mempool.h > @@ -334,7 +334,7 @@ struct __rte_cache_aligned rte_mempool { > #ifdef RTE_LIBRTE_MEMPOOL_STATS > #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do { > \ > unsigned int __lcore_id = rte_lcore_id(); > \ > - if (likely(__lcore_id < RTE_MAX_LCORE)) > \ > + if (likely(__lcore_id != LCORE_ID_ANY)) > \ > (mp)->stats[__lcore_id].name += (n); > \ > else > \ > rte_atomic_fetch_add_explicit(&((mp)- > >stats[RTE_MAX_LCORE].name), \ > @@ -751,7 +751,7 @@ extern struct rte_mempool_ops_table > rte_mempool_ops_table; > static inline struct rte_mempool_ops * > rte_mempool_get_ops(int ops_index) > { > - RTE_VERIFY((ops_index >= 0) && (ops_index < > RTE_MEMPOOL_MAX_OPS_IDX)); > + RTE_ASSERT((ops_index >= 0) && (ops_index < > RTE_MEMPOOL_MAX_OPS_IDX)); > > return &rte_mempool_ops_table.ops[ops_index]; > } > @@ -791,7 +791,8 @@ rte_mempool_ops_dequeue_bulk(struct rte_mempool > *mp, > rte_mempool_trace_ops_dequeue_bulk(mp, obj_table, n); > ops = rte_mempool_get_ops(mp->ops_index); > ret = ops->dequeue(mp, obj_table, n); > - if (ret == 0) { > + RTE_ASSERT(ret <= 0); > + if (likely(ret == 0)) { > RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_bulk, 1); > RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_objs, n); > } > @@ -816,11 +817,14 @@ rte_mempool_ops_dequeue_contig_blocks(struct > rte_mempool *mp, > void **first_obj_table, unsigned int n) > { > struct rte_mempool_ops *ops; > + int ret; > > ops = rte_mempool_get_ops(mp->ops_index); > RTE_ASSERT(ops->dequeue_contig_blocks != NULL); > 
rte_mempool_trace_ops_dequeue_contig_blocks(mp, first_obj_table, > n); > - return ops->dequeue_contig_blocks(mp, first_obj_table, n); > + ret = ops->dequeue_contig_blocks(mp, first_obj_table, n); > + RTE_ASSERT(ret <= 0); > + return ret; > } > > /** > @@ -848,6 +852,7 @@ rte_mempool_ops_enqueue_bulk(struct rte_mempool > *mp, void * const *obj_table, > rte_mempool_trace_ops_enqueue_bulk(mp, obj_table, n); > ops = rte_mempool_get_ops(mp->ops_index); > ret = ops->enqueue(mp, obj_table, n); > + RTE_ASSERT(ret <= 0); > #ifdef RTE_LIBRTE_MEMPOOL_DEBUG > if (unlikely(ret < 0)) > RTE_MEMPOOL_LOG(CRIT, "cannot enqueue %u objects to mempool > %s", > @@ -1333,10 +1338,10 @@ rte_mem
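The claim in the commit message that the two lcore_id comparisons are equivalent can be checked with a small sketch. It assumes that an lcore id is always either a valid index below the maximum or the "any" sentinel. SKETCH_MAX_LCORE and SKETCH_LCORE_ID_ANY are hypothetical stand-ins for RTE_MAX_LCORE and LCORE_ID_ANY, and the helper names are illustrative only, not part of the mempool API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTE_MAX_LCORE and LCORE_ID_ANY. */
#define SKETCH_MAX_LCORE 128u
#define SKETCH_LCORE_ID_ANY UINT32_MAX

/* Range check, in the style of the code before the patch. */
static inline bool sketch_check_range(uint32_t lcore_id)
{
	return lcore_id < SKETCH_MAX_LCORE;
}

/* Sentinel check, in the style of the code after the patch. */
static inline bool sketch_check_sentinel(uint32_t lcore_id)
{
	return lcore_id != SKETCH_LCORE_ID_ANY;
}

/*
 * Returns true if both checks give the same answer for every value the
 * lcore id may hold under the stated assumption: any valid index, or
 * the sentinel.
 */
static bool sketch_checks_agree(void)
{
	for (uint32_t id = 0; id < SKETCH_MAX_LCORE; id++)
		if (sketch_check_range(id) != sketch_check_sentinel(id))
			return false;
	return sketch_check_range(SKETCH_LCORE_ID_ANY) ==
	       sketch_check_sentinel(SKETCH_LCORE_ID_ANY);
}
```

The two checks disagree for values such as 200, which are neither a valid index nor the sentinel, so the replacement is only safe while that invariant on the lcore id holds.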
[PATCH v5 11/11] member: use common AVX512 build support
Use the support for building AVX512 code present in lib/meson.build rather than reimplementing it in the library meson.build file. Signed-off-by: Bruce Richardson --- lib/member/meson.build | 46 +++--- 1 file changed, 7 insertions(+), 39 deletions(-) diff --git a/lib/member/meson.build b/lib/member/meson.build index 4341b424df..07f9afaed9 100644 --- a/lib/member/meson.build +++ b/lib/member/meson.build @@ -20,44 +20,12 @@ sources = files( deps += ['hash', 'ring'] -# compile AVX512 version if: -if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok -# compile AVX512 version if either: -# a. we have AVX512 supported in minimum instruction set -#baseline -# b. it's not minimum instruction set, but supported by -#compiler -# -# in former case, just add avx512 C file to files list -# in latter case, compile c file to static lib, using correct -# compiler flags, and then have the .o file from static lib -# linked into main lib. - -member_avx512_args = cc_avx512_flags -if not is_ms_compiler -member_avx512_args += '-mavx512ifma' -endif - -# check if all required flags already enabled -sketch_avx512_flags = ['__AVX512F__', '__AVX512DQ__', '__AVX512IFMA__'] - -sketch_avx512_on = true -foreach f:sketch_avx512_flags -if cc.get_define(f, args: machine_args) == '' -sketch_avx512_on = false -endif -endforeach - -if sketch_avx512_on == true -cflags += ['-DCC_AVX512_SUPPORT'] -sources += files('rte_member_sketch_avx512.c') -elif cc.has_multi_arguments(member_avx512_args) -sketch_avx512_tmp = static_library('sketch_avx512_tmp', -'rte_member_sketch_avx512.c', -include_directories: includes, -dependencies: [static_rte_eal, static_rte_hash], -c_args: cflags + member_avx512_args) -objs += sketch_avx512_tmp.extract_objects('rte_member_sketch_avx512.c') -cflags += ['-DCC_AVX512_SUPPORT'] +# compile AVX512 version if we have avx512 on MSVC or the 'ifma' flag on GCC/Clang +if dpdk_conf.has('RTE_ARCH_X86_64') +if is_ms_compiler +sources_avx512 += files('rte_member_sketch_avx512.c') +elif 
cc.has_argument('-mavx512ifma') +sources_avx512 += files('rte_member_sketch_avx512.c') +cflags_avx512 += '-mavx512ifma' endif endif -- 2.45.2
Re: [PATCH] drivers/net/mlx5: fix mlx5 send packet failed
On Tue, 25 Mar 2025 18:39:00 +0800 Wenbo Liu wrote: > Test Environment: ARM architecture, OpenEuler operating system > CPU: HUAWEI Kunpeng 920 5220, BIOS Vendor ID: HiSilicon > Network Card: Mellanox Technologies MT27800 Family [ConnectX-5] > DPDK program sending self-encapsulated packets with MAC, IP, and UDP headers > continuously prints the following errors and ceases packet transmission > > mlx5_common: Failed to modify SQ using DevX > mlx5_net: Cannot change the Tx SQ state to RESET Remote I/O error > > Signed-off-by: Wenbo Liu The patch has compile failures in CI, and checkpatch reports a coding-style indentation issue.
[PATCH v5 00/11] remove component-specific logic for AVX builds
A number of libs and drivers had special optimized AVX2 and AVX512 code paths for performance reasons, and these tended to have copy-pasted logic to build those files. Centralise that logic in the main drivers/ and lib/ meson.build files to avoid duplication. v5: fix RTE_ARCH_X86 macro, which broke crc library v4: rebase on latest main branch minor fixes following feedback limit use of -march=skylake-avx512 to when we don't already have a -march flag supporting AVX512. v3: add patch for event/dlb2 AVX512 handling. add common code for libraries as well as drivers. v2: add patch 4 to remove use of unnecessary CC_AVX2_SUPPORT flag Bruce Richardson (11): drivers: add generalized AVX build handling net/intel: use common AVX build code drivers/net: build use common AVX handling drivers/net: remove AVX2 build-time define event/dlb2: build using common AVX handling lib: add generalized AVX build handling acl: use common AVX build handling fib: use common AVX build handling net: simplify build-time logic for x86 net: use common AVX512 build code member: use common AVX512 build support drivers/event/dlb2/dlb2_sse.c | 4 ++ drivers/event/dlb2/meson.build| 16 +--- drivers/meson.build | 30 ++ drivers/net/bnxt/bnxt_ethdev.c| 2 - drivers/net/bnxt/meson.build | 10 + drivers/net/enic/meson.build | 10 + drivers/net/intel/i40e/meson.build| 26 +--- drivers/net/intel/iavf/meson.build| 25 +--- drivers/net/intel/ice/meson.build | 25 +--- drivers/net/intel/idpf/meson.build| 25 +--- drivers/net/nfp/meson.build | 10 + drivers/net/octeon_ep/meson.build | 13 +- drivers/net/octeon_ep/otx_ep_ethdev.c | 4 -- drivers/net/virtio/meson.build| 9 + lib/acl/meson.build | 54 ++--- lib/fib/dir24_8.c | 6 +-- lib/fib/meson.build | 18 + lib/fib/trie.c| 6 +-- lib/member/meson.build| 46 - lib/meson.build | 34 +++- lib/net/meson.build | 58 +++ lib/net/rte_net_crc.c | 16 22 files changed, 114 insertions(+), 333 deletions(-) -- 2.45.2
[PATCH v5 01/11] drivers: add generalized AVX build handling
Add support to the top-level driver build file for AVX2 and AVX512 specific sources. This should simplify driver builds by avoiding the need to constantly reimplement the same build logic. Signed-off-by: Bruce Richardson --- drivers/meson.build | 30 ++ 1 file changed, 30 insertions(+) diff --git a/drivers/meson.build b/drivers/meson.build index 05391a575d..c15319dc24 100644 --- a/drivers/meson.build +++ b/drivers/meson.build @@ -126,6 +126,8 @@ foreach subpath:subdirs name = drv annotate_locks = true sources = [] +sources_avx2 = [] +sources_avx512 = [] headers = [] driver_sdk_headers = [] # public headers included by drivers objs = [] @@ -235,6 +237,34 @@ foreach subpath:subdirs dpdk_includes += include_directories(drv_path) endif +# handle avx2 and avx512 source files +if arch_subdir == 'x86' +if sources_avx2.length() > 0 +avx2_lib = static_library(lib_name + '_avx2_lib', +sources_avx2, +dependencies: static_deps, +include_directories: includes, +c_args: [cflags, cc_avx2_flags]) +objs += avx2_lib.extract_objects(sources_avx2) +endif +if sources_avx512.length() > 0 and cc_has_avx512 +cflags += '-DCC_AVX512_SUPPORT' +avx512_args = [cflags, cc_avx512_flags] +if not target_has_avx512 and cc.has_argument('-march=skylake-avx512') +avx512_args += '-march=skylake-avx512' +if cc.has_argument('-Wno-overriding-option') +avx512_args += '-Wno-overriding-option' +endif +endif +avx512_lib = static_library(lib_name + '_avx512_lib', +sources_avx512, +dependencies: static_deps, +include_directories: includes, +c_args: avx512_args) +objs += avx512_lib.extract_objects(sources_avx512) +endif +endif + # generate pmdinfo sources by building a temporary # lib and then running pmdinfogen on the contents of # that lib. The final lib reuses the object files and -- 2.45.2
[PATCH v5 02/11] net/intel: use common AVX build code
Remove driver-specific build instructions for the AVX2 and AVX-512 code, and rely instead on the generic driver build file. Signed-off-by: Bruce Richardson --- drivers/net/intel/i40e/meson.build | 26 ++ drivers/net/intel/iavf/meson.build | 25 ++--- drivers/net/intel/ice/meson.build | 25 ++--- drivers/net/intel/idpf/meson.build | 25 ++--- 4 files changed, 8 insertions(+), 93 deletions(-) diff --git a/drivers/net/intel/i40e/meson.build b/drivers/net/intel/i40e/meson.build index 15993393fb..dae61222cf 100644 --- a/drivers/net/intel/i40e/meson.build +++ b/drivers/net/intel/i40e/meson.build @@ -40,31 +40,9 @@ includes += include_directories('base') if arch_subdir == 'x86' sources += files('i40e_rxtx_vec_sse.c') +sources_avx2 += files('i40e_rxtx_vec_avx2.c') +sources_avx512 += files('i40e_rxtx_vec_avx512.c') -i40e_avx2_lib = static_library('i40e_avx2_lib', -'i40e_rxtx_vec_avx2.c', -dependencies: [static_rte_ethdev, static_rte_kvargs, static_rte_hash], -include_directories: includes, -c_args: [cflags, cc_avx2_flags]) -objs += i40e_avx2_lib.extract_objects('i40e_rxtx_vec_avx2.c') - -if cc_has_avx512 -cflags += ['-DCC_AVX512_SUPPORT'] -avx512_args = cflags + cc_avx512_flags -if cc.has_argument('-march=skylake-avx512') -avx512_args += '-march=skylake-avx512' -if cc.has_argument('-Wno-overriding-option') -avx512_args += '-Wno-overriding-option' -endif -endif -i40e_avx512_lib = static_library('i40e_avx512_lib', -'i40e_rxtx_vec_avx512.c', -dependencies: [static_rte_ethdev, -static_rte_kvargs, static_rte_hash], -include_directories: includes, -c_args: avx512_args) -objs += i40e_avx512_lib.extract_objects('i40e_rxtx_vec_avx512.c') -endif elif arch_subdir == 'ppc' sources += files('i40e_rxtx_vec_altivec.c') elif arch_subdir == 'arm' diff --git a/drivers/net/intel/iavf/meson.build b/drivers/net/intel/iavf/meson.build index 833a63e6c8..1ca500c43c 100644 --- a/drivers/net/intel/iavf/meson.build +++ b/drivers/net/intel/iavf/meson.build @@ -28,30 +28,9 @@ includes += 
include_directories('base') if arch_subdir == 'x86' sources += files('iavf_rxtx_vec_sse.c') +sources_avx2 += files('iavf_rxtx_vec_avx2.c') +sources_avx512 += files('iavf_rxtx_vec_avx512.c') -iavf_avx2_lib = static_library('iavf_avx2_lib', -'iavf_rxtx_vec_avx2.c', -dependencies: [static_rte_ethdev], -include_directories: includes, -c_args: [cflags, cc_avx2_flags]) -objs += iavf_avx2_lib.extract_objects('iavf_rxtx_vec_avx2.c') - -if cc_has_avx512 -cflags += ['-DCC_AVX512_SUPPORT'] -avx512_args = cflags + cc_avx512_flags -if cc.has_argument('-march=skylake-avx512') -avx512_args += '-march=skylake-avx512' -if cc.has_argument('-Wno-overriding-option') -avx512_args += '-Wno-overriding-option' -endif -endif -iavf_avx512_lib = static_library('iavf_avx512_lib', -'iavf_rxtx_vec_avx512.c', -dependencies: [static_rte_ethdev], -include_directories: includes, -c_args: avx512_args) -objs += iavf_avx512_lib.extract_objects('iavf_rxtx_vec_avx512.c') -endif elif arch_subdir == 'arm' sources += files('iavf_rxtx_vec_neon.c') endif diff --git a/drivers/net/intel/ice/meson.build b/drivers/net/intel/ice/meson.build index 4d8f71cd4a..fa6c505450 100644 --- a/drivers/net/intel/ice/meson.build +++ b/drivers/net/intel/ice/meson.build @@ -34,30 +34,9 @@ endif if arch_subdir == 'x86' sources += files('ice_rxtx_vec_sse.c') +sources_avx2 += files('ice_rxtx_vec_avx2.c') +sources_avx512 += files('ice_rxtx_vec_avx512.c') -ice_avx2_lib = static_library('ice_avx2_lib', -'ice_rxtx_vec_avx2.c', -dependencies: [static_rte_ethdev, static_rte_hash], -include_directories: includes, -c_args: [cflags, cc_avx2_flags]) -objs += ice_avx2_lib.extract_objects('ice_rxtx_vec_avx2.c') - -if cc_has_avx512 -cflags += ['-DCC_AVX512_SUPPORT'] -avx512_args = cflags + cc_avx512_flags -if cc.has_argument('-march=skylake-avx512') -avx512_args += '-march=skylake-avx512' -if cc.has_argument('-Wno-overriding-option') -avx512_args += '-Wno-overriding-option' -endif -endif -ice_avx512_lib = static_library('ice_avx512_lib', 
-'ice_rxtx_vec_avx512.c', -dependencies: [static_rte_ethdev, static_rte_hash], -include_directories: includes, -c_args: avx512_args) -objs += ice_avx512_lib.
[PATCH v5 03/11] drivers/net: build use common AVX handling
Remove from remaining net drivers the special-case code to handle AVX2 or AVX512 specific files. These can be built instead using drivers/meson.build. Signed-off-by: Bruce Richardson --- drivers/net/bnxt/meson.build | 10 +- drivers/net/enic/meson.build | 10 +- drivers/net/nfp/meson.build | 10 +- drivers/net/octeon_ep/meson.build | 14 ++ drivers/net/virtio/meson.build| 9 + 5 files changed, 6 insertions(+), 47 deletions(-) diff --git a/drivers/net/bnxt/meson.build b/drivers/net/bnxt/meson.build index fd82d0c409..dcca7df916 100644 --- a/drivers/net/bnxt/meson.build +++ b/drivers/net/bnxt/meson.build @@ -58,15 +58,7 @@ subdir('hcapi/cfa_v3') if arch_subdir == 'x86' sources += files('bnxt_rxtx_vec_sse.c') -# build AVX2 code with instruction set explicitly enabled for runtime selection -bnxt_avx2_lib = static_library('bnxt_avx2_lib', -'bnxt_rxtx_vec_avx2.c', -dependencies: [static_rte_ethdev, -static_rte_bus_pci, -static_rte_kvargs, static_rte_hash], -include_directories: includes, -c_args: [cflags, cc_avx2_flags]) - objs += bnxt_avx2_lib.extract_objects('bnxt_rxtx_vec_avx2.c') +sources_avx2 = files('bnxt_rxtx_vec_avx2.c') elif arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64') sources += files('bnxt_rxtx_vec_neon.c') endif diff --git a/drivers/net/enic/meson.build b/drivers/net/enic/meson.build index cfe5ec170a..2b3052fae8 100644 --- a/drivers/net/enic/meson.build +++ b/drivers/net/enic/meson.build @@ -29,17 +29,9 @@ sources = files( deps += ['hash'] includes += include_directories('base') -# Build the avx2 handler for 64-bit X86 targets, even though 'machine' -# may not. This is to support users who build for the min supported machine -# and need to run the binary on newer CPUs too. 
if dpdk_conf.has('RTE_ARCH_X86_64') cflags += '-DENIC_RXTX_VEC' -enic_avx2_lib = static_library('enic_avx2_lib', -'enic_rxtx_vec_avx2.c', -dependencies: [static_rte_ethdev, static_rte_bus_pci], -include_directories: includes, -c_args: [cflags, cc_avx2_flags]) -objs += enic_avx2_lib.extract_objects('enic_rxtx_vec_avx2.c') +sources_avx2 = files('enic_rxtx_vec_avx2.c') endif annotate_locks = false diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build index 0a12b7dce7..a98b584042 100644 --- a/drivers/net/nfp/meson.build +++ b/drivers/net/nfp/meson.build @@ -52,19 +52,11 @@ cflags += no_wvla_cflag if arch_subdir == 'x86' includes += include_directories('../../common/nfp') -avx2_sources = files( +sources_avx2 = files( 'nfdk/nfp_nfdk_vec_avx2_dp.c', 'nfp_rxtx_vec_avx2.c', ) -nfp_avx2_lib = static_library('nfp_avx2_lib', -avx2_sources, -dependencies: [static_rte_ethdev, static_rte_bus_pci], -include_directories: includes, -c_args: [cflags, cc_avx2_flags] -) - -objs += nfp_avx2_lib.extract_all_objects(recursive: true) else sources += files( 'nfp_rxtx_vec_stub.c', diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build index 1b34db3edc..9bf4627894 100644 --- a/drivers/net/octeon_ep/meson.build +++ b/drivers/net/octeon_ep/meson.build @@ -15,18 +15,8 @@ sources = files( if arch_subdir == 'x86' sources += files('cnxk_ep_rx_sse.c') -if cc.get_define('__AVX2__', args: machine_args) != '' -cflags += ['-DCC_AVX2_SUPPORT'] -sources += files('cnxk_ep_rx_avx.c') -elif cc.has_multi_arguments(cc_avx2_flags) -cflags += ['-DCC_AVX2_SUPPORT'] -otx_ep_avx2_lib = static_library('otx_ep_avx2_lib', -'cnxk_ep_rx_avx.c', -dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci], -include_directories: includes, -c_args: [cflags, cc_avx2_flags]) -objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c') -endif +cflags += ['-DCC_AVX2_SUPPORT'] +sources_avx2 = files('cnxk_ep_rx_avx.c') endif if arch_subdir == 'arm' diff --git 
a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build index c1c4a85bea..01bfb3c47d 100644 --- a/drivers/net/virtio/meson.build +++ b/drivers/net/virtio/meson.build @@ -27,15 +27,8 @@ cflags += no_wvla_cflag if arch_subdir == 'x86' if cc_has_avx512 -cflags += ['-DCC_AVX512_SUPPORT'] cflags += ['-DVIRTIO_RXTX_PACKED_VEC'] -virtio_avx512_lib = static_library('virtio_avx512_lib', -'virtio_rxtx_packed.c', -dependencies: [static_rte_ethdev, -static_rte_kvargs, static_rte_bus_pci], -include_directories: includes, -c_args: cflags + cc_avx512_flags) -o
[PATCH v5 07/11] acl: use common AVX build handling
Remove custom logic for building AVX2 and AVX-512 files. Signed-off-by: Bruce Richardson --- lib/acl/meson.build | 54 - 1 file changed, 4 insertions(+), 50 deletions(-) diff --git a/lib/acl/meson.build b/lib/acl/meson.build index a80c172812..87e9f25f8e 100644 --- a/lib/acl/meson.build +++ b/lib/acl/meson.build @@ -15,57 +15,11 @@ headers = files('rte_acl.h', 'rte_acl_osdep.h') if dpdk_conf.has('RTE_ARCH_X86') sources += files('acl_run_sse.c') - -avx2_tmplib = static_library('avx2_tmp', -'acl_run_avx2.c', -dependencies: static_rte_eal, -c_args: [cflags, cc_avx2_flags]) -objs += avx2_tmplib.extract_objects('acl_run_avx2.c') - -# compile AVX512 version if: -# we are building 64-bit binary AND binutils can generate proper code - -if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok - -# compile AVX512 version if either: -# a. we have AVX512 supported in minimum instruction set -#baseline -# b. it's not minimum instruction set, but supported by -#compiler -# -# in former case, just add avx512 C file to files list -# in latter case, compile c file to static lib, using correct -# compiler flags, and then have the .o file from static lib -# linked into main lib. - -# check if all required flags already enabled (variant a). 
-acl_avx512_flags = ['__AVX512F__', '__AVX512VL__', -'__AVX512CD__', '__AVX512BW__'] - -acl_avx512_on = true -foreach f:acl_avx512_flags - -if cc.get_define(f, args: machine_args) == '' -acl_avx512_on = false -endif -endforeach - -if acl_avx512_on == true - -sources += files('acl_run_avx512.c') -cflags += '-DCC_AVX512_SUPPORT' - -elif cc_has_avx512 -avx512_tmplib = static_library('avx512_tmp', -'acl_run_avx512.c', -dependencies: static_rte_eal, -c_args: cflags + cc_avx512_flags) -objs += avx512_tmplib.extract_objects( -'acl_run_avx512.c') -cflags += '-DCC_AVX512_SUPPORT' -endif +sources_avx2 += files('acl_run_avx2.c') +# AVX512 is only supported on 64-bit builds +if dpdk_conf.has('RTE_ARCH_X86_64') +sources_avx512 += files('acl_run_avx512.c') endif - elif dpdk_conf.has('RTE_ARCH_ARM') cflags += '-flax-vector-conversions' sources += files('acl_run_neon.c') -- 2.45.2
[PATCH v5 06/11] lib: add generalized AVX build handling
Add support to the top-level lib build file for AVX2 and AVX512 specific sources. This should simplify library builds by avoiding the need to constantly reimplement the same build logic. Signed-off-by: Bruce Richardson --- lib/meson.build | 34 +- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/lib/meson.build b/lib/meson.build index ce92cb5537..e2605e7d68 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -122,6 +122,9 @@ foreach l:libraries use_function_versioning = false annotate_locks = true sources = [] +sources_avx2 = [] +sources_avx512 = [] +cflags_avx512 = [] # extra cflags for the avx512 code, e.g. extra avx512 feature flags headers = [] indirect_headers = [] # public headers not directly included by apps driver_sdk_headers = [] # public headers included by drivers @@ -242,7 +245,36 @@ foreach l:libraries cflags += '-Wthread-safety' endif -# first build static lib +# handle avx2 and avx512 source files +if arch_subdir == 'x86' +if sources_avx2.length() > 0 +avx2_lib = static_library(libname + '_avx2_lib', +sources_avx2, +dependencies: static_deps, +include_directories: includes, +c_args: [cflags, cc_avx2_flags]) +objs += avx2_lib.extract_objects(sources_avx2) +endif +if sources_avx512.length() > 0 and cc_has_avx512 +cflags += '-DCC_AVX512_SUPPORT' +avx512_args = [cflags, cflags_avx512, cc_avx512_flags] +if not target_has_avx512 and cc.has_argument('-march=skylake-avx512') +avx512_args += '-march=skylake-avx512' +if cc.has_argument('-Wno-overriding-option') +avx512_args += '-Wno-overriding-option' +endif +endif +avx512_lib = static_library(libname + '_avx512_lib', +sources_avx512, +dependencies: static_deps, +include_directories: includes, +c_args: avx512_args) +objs += avx512_lib.extract_objects(sources_avx512) +endif +endif + + +# build static lib static_lib = static_library(libname, sources, objects: objs, -- 2.45.2
[PATCH v5 04/11] drivers/net: remove AVX2 build-time define
Since all supported compilers can generate AVX2 code, we will always enable the build of the AVX2 files on x86. This means that CC_AVX2_SUPPORT is always true on x86, so it can be removed and a regular "#ifdef RTE_ARCH_X86" used in its place. Signed-off-by: Bruce Richardson Acked-by: Ajit Khaparde --- drivers/net/bnxt/bnxt_ethdev.c| 2 -- drivers/net/octeon_ep/meson.build | 1 - drivers/net/octeon_ep/otx_ep_ethdev.c | 4 3 files changed, 7 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index a0e3cd8bbe..2f37f5aa10 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3258,8 +3258,6 @@ static const struct { #if defined(RTE_ARCH_X86) {bnxt_crx_pkts_vec, "Vector SSE"}, {bnxt_recv_pkts_vec,"Vector SSE"}, -#endif -#if defined(RTE_ARCH_X86) && defined(CC_AVX2_SUPPORT) {bnxt_crx_pkts_vec_avx2,"Vector AVX2"}, {bnxt_recv_pkts_vec_avx2, "Vector AVX2"}, #endif diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build index 9bf4627894..a4a7663d1d 100644 --- a/drivers/net/octeon_ep/meson.build +++ b/drivers/net/octeon_ep/meson.build @@ -15,7 +15,6 @@ sources = files( if arch_subdir == 'x86' sources += files('cnxk_ep_rx_sse.c') -cflags += ['-DCC_AVX2_SUPPORT'] sources_avx2 = files('cnxk_ep_rx_avx.c') endif diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c index 8b14734b0c..10f2f8a2e0 100644 --- a/drivers/net/octeon_ep/otx_ep_ethdev.c +++ b/drivers/net/octeon_ep/otx_ep_ethdev.c @@ -91,11 +91,9 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts; #ifdef RTE_ARCH_X86 eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_sse; -#ifdef CC_AVX2_SUPPORT if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_avx; -#endif #elif defined(RTE_ARCH_ARM64) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_neon; #endif @@ -105,11 +103,9 @@ 
otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts; #ifdef RTE_ARCH_X86 eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_sse; -#ifdef CC_AVX2_SUPPORT if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_avx; -#endif #elif defined(RTE_ARCH_ARM64) eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_neon; #endif -- 2.45.2
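With the CC_AVX2_SUPPORT guard gone, the choice of the AVX2 receive path in the octeon_ep hunks above rests only on the runtime checks. The sketch below models that selection order under stated assumptions: struct sketch_cpu and its fields are hypothetical stand-ins for the RTE_ARCH_X86 compile-time check, rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) and rte_vect_get_max_simd_bitwidth(), and the enum replaces the real burst function pointers. It is an illustration of the pattern, not the driver code.

```c
#include <stdbool.h>

/* Hypothetical labels standing in for the driver's rx burst functions. */
enum sketch_rx_path { SKETCH_RX_SCALAR, SKETCH_RX_SSE, SKETCH_RX_AVX2 };

/* Hypothetical description of the platform; in the driver these are
 * a compile-time macro and two runtime queries. */
struct sketch_cpu {
	bool is_x86;                    /* models #ifdef RTE_ARCH_X86 */
	bool has_avx2;                  /* models the AVX2 CPU flag query */
	unsigned int max_simd_bitwidth; /* models the max SIMD bitwidth query */
};

/* Pick the widest path the platform allows: scalar by default, SSE on
 * x86, and AVX2 only when the CPU flag is set and 256-bit SIMD is
 * permitted. */
static enum sketch_rx_path sketch_select_rx(const struct sketch_cpu *cpu)
{
	enum sketch_rx_path path = SKETCH_RX_SCALAR;

	if (cpu->is_x86) {
		path = SKETCH_RX_SSE;
		if (cpu->max_simd_bitwidth >= 256 && cpu->has_avx2)
			path = SKETCH_RX_AVX2;
	}
	return path;
}
```

Because the AVX2 object is now always built on x86, a CPU without AVX2 or a lowered SIMD bitwidth limit falls back to the SSE path at runtime rather than at build time.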
[PATCH v5 05/11] event/dlb2: build using common AVX handling
Remove special-case handling for AVX512, and rely on mechanisms in the drivers meson.build file. Signed-off-by: Bruce Richardson --- drivers/event/dlb2/dlb2_sse.c | 4 drivers/event/dlb2/meson.build | 16 ++-- 2 files changed, 6 insertions(+), 14 deletions(-) diff --git a/drivers/event/dlb2/dlb2_sse.c b/drivers/event/dlb2/dlb2_sse.c index f2e1f9fb7e..06474d61dd 100644 --- a/drivers/event/dlb2/dlb2_sse.c +++ b/drivers/event/dlb2/dlb2_sse.c @@ -5,6 +5,8 @@ #include #include +#ifndef CC_AVX512_SUPPORT + #include "dlb2_priv.h" #include "dlb2_iface.h" #include "dlb2_inline_fns.h" @@ -226,3 +228,5 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port, break; } } + +#endif /* no CC_AVX512_SUPPORT */ diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build index c024edb311..13d0fa544e 100644 --- a/drivers/event/dlb2/meson.build +++ b/drivers/event/dlb2/meson.build @@ -20,22 +20,10 @@ sources = files( 'pf/base/dlb2_resource.c', 'rte_pmd_dlb2.c', 'dlb2_selftest.c', +'dlb2_sse.c', ) -if target_has_avx512 -cflags += '-DCC_AVX512_SUPPORT' -sources += files('dlb2_avx512.c') - -elif cc_has_avx512 -cflags += '-DCC_AVX512_SUPPORT' -avx512_tmplib = static_library('avx512_tmp', - 'dlb2_avx512.c', - dependencies: [static_rte_eal, static_rte_eventdev], - c_args: cflags + cc_avx512_flags) -objs += avx512_tmplib.extract_objects('dlb2_avx512.c') -else -sources += files('dlb2_sse.c') -endif +sources_avx512 += files('dlb2_avx512.c') headers = files('rte_pmd_dlb2.h') -- 2.45.2
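The dlb2 change works because dlb2_sse.c is now always listed in sources but compiles to nothing when CC_AVX512_SUPPORT is defined, leaving the AVX512 file as the only provider of the shared symbol. A minimal single-file sketch of that pattern follows. The function name sketch_build_hcws_variant is hypothetical, and in the patch the two definitions live in separate files (dlb2_sse.c and dlb2_avx512.c) rather than in one #ifdef/#else.

```c
/*
 * One symbol, two mutually exclusive definitions selected at build
 * time. CC_AVX512_SUPPORT is the define that the common build code
 * adds to cflags when the AVX512 sources are compiled.
 */
#ifdef CC_AVX512_SUPPORT
/* Stands in for the implementation in the AVX512 source file. */
static const char *sketch_build_hcws_variant(void)
{
	return "avx512";
}
#else
/* Stands in for the SSE implementation, compiled only as a fallback. */
static const char *sketch_build_hcws_variant(void)
{
	return "sse";
}
#endif
```

Built without -DCC_AVX512_SUPPORT, the function reports the SSE variant; adding the define on the compiler command line switches it to the AVX512 one, with no change to the list of source files.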
[PATCH v5 08/11] fib: use common AVX build handling
Remove custom logic for building AVX2 and AVX-512 files. Within the C code this requires some renaming of build macros to use the standard defines. Signed-off-by: Bruce Richardson --- lib/fib/dir24_8.c | 6 +++--- lib/fib/meson.build | 18 +- lib/fib/trie.c | 6 +++--- 3 files changed, 7 insertions(+), 23 deletions(-) diff --git a/lib/fib/dir24_8.c b/lib/fib/dir24_8.c index c48d962d41..2ba7e93511 100644 --- a/lib/fib/dir24_8.c +++ b/lib/fib/dir24_8.c @@ -16,11 +16,11 @@ #include "dir24_8.h" #include "fib_log.h" -#ifdef CC_DIR24_8_AVX512_SUPPORT +#ifdef CC_AVX512_SUPPORT #include "dir24_8_avx512.h" -#endif /* CC_DIR24_8_AVX512_SUPPORT */ +#endif /* CC_AVX512_SUPPORT */ #define DIR24_8_NAMESIZE 64 @@ -63,7 +63,7 @@ get_scalar_fn_inlined(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr) static inline rte_fib_lookup_fn_t get_vector_fn(enum rte_fib_dir24_8_nh_sz nh_sz, bool be_addr) { -#ifdef CC_DIR24_8_AVX512_SUPPORT +#ifdef CC_AVX512_SUPPORT if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) <= 0 || rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) <= 0 || rte_vect_get_max_simd_bitwidth() < RTE_VECT_SIMD_512) diff --git a/lib/fib/meson.build b/lib/fib/meson.build index 0c19cc8201..55d9ddcee9 100644 --- a/lib/fib/meson.build +++ b/lib/fib/meson.build @@ -15,21 +15,5 @@ deps += ['rcu'] deps += ['net'] if dpdk_conf.has('RTE_ARCH_X86_64') -if target_has_avx512 -cflags += ['-DCC_DIR24_8_AVX512_SUPPORT', '-DCC_TRIE_AVX512_SUPPORT'] -sources += files('dir24_8_avx512.c', 'trie_avx512.c') - -elif cc_has_avx512 -cflags += ['-DCC_DIR24_8_AVX512_SUPPORT', '-DCC_TRIE_AVX512_SUPPORT'] -dir24_8_avx512_tmp = static_library('dir24_8_avx512_tmp', -'dir24_8_avx512.c', -dependencies: [static_rte_eal, static_rte_rcu], -c_args: cflags + cc_avx512_flags) -objs += dir24_8_avx512_tmp.extract_objects('dir24_8_avx512.c') -trie_avx512_tmp = static_library('trie_avx512_tmp', -'trie_avx512.c', -dependencies: [static_rte_eal, static_rte_rcu, static_rte_net], -c_args: cflags + cc_avx512_flags) -objs += 
trie_avx512_tmp.extract_objects('trie_avx512.c') -endif +sources_avx512 = files('dir24_8_avx512.c', 'trie_avx512.c') endif diff --git a/lib/fib/trie.c b/lib/fib/trie.c index 4893f6c636..6c20057ac5 100644 --- a/lib/fib/trie.c +++ b/lib/fib/trie.c @@ -14,11 +14,11 @@ #include #include "trie.h" -#ifdef CC_TRIE_AVX512_SUPPORT +#ifdef CC_AVX512_SUPPORT #include "trie_avx512.h" -#endif /* CC_TRIE_AVX512_SUPPORT */ +#endif /* CC_AVX512_SUPPORT */ #define TRIE_NAMESIZE 64 @@ -45,7 +45,7 @@ get_scalar_fn(enum rte_fib_trie_nh_sz nh_sz) static inline rte_fib6_lookup_fn_t get_vector_fn(enum rte_fib_trie_nh_sz nh_sz) { -#ifdef CC_TRIE_AVX512_SUPPORT +#ifdef CC_AVX512_SUPPORT if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) <= 0 || rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) <= 0 || rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) <= 0 || -- 2.45.2
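For readers following the macro rename, here is a minimal sketch of the pattern both fib files share: a compile-time guard (CC_AVX512_SUPPORT, as in the patch) around the vector path, plus a runtime check, with a scalar fallback. All function names below and the stubbed cpu_supports_avx512() probe are hypothetical and only stand in for the real lookup routines and the rte_cpu_get_flag_enabled()/rte_vect_get_max_simd_bitwidth() checks; this is not the fib code itself.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*lookup_fn_t)(unsigned int key);

/* Scalar fallback, always built. */
static int
lookup_scalar(unsigned int key)
{
	return (int)(key & 0xff);
}

#ifdef CC_AVX512_SUPPORT
/* Stand-in for a routine that would live in a separately built AVX-512 file. */
static int
lookup_avx512(unsigned int key)
{
	return (int)(key & 0xff);
}
#endif

/*
 * Hypothetical runtime probe (assumption for this sketch). A DPDK component
 * would query the CPU flags and the maximum SIMD bitwidth here instead.
 */
static bool
cpu_supports_avx512(void)
{
	return false;
}

/* Returns the vector routine, or NULL when it is not built or not usable. */
static lookup_fn_t
get_vector_fn(void)
{
	if (!cpu_supports_avx512())
		return NULL;
#ifdef CC_AVX512_SUPPORT
	return lookup_avx512;
#else
	return NULL;
#endif
}

/* Prefer the vector routine and fall back to the scalar one. */
static lookup_fn_t
select_lookup_fn(void)
{
	lookup_fn_t fn = get_vector_fn();

	return fn != NULL ? fn : lookup_scalar;
}
```

With a single shared define, the C side no longer needs to know which component-specific flag the build system chose; it only asks whether any AVX-512 objects were built at all.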
[PATCH v5 10/11] net: use common AVX512 build code
Use the common support for AVX512 code present in lib/meson.build, rather than hard-coding it. The only complication is an extra check for the "-mvpclmulqdq" command-line flag before adding the AVX512 sources. Signed-off-by: Bruce Richardson --- lib/net/meson.build | 12 lib/net/rte_net_crc.c | 8 2 files changed, 8 insertions(+), 12 deletions(-) diff --git a/lib/net/meson.build b/lib/net/meson.build index cd49b4d758..7a6c419f40 100644 --- a/lib/net/meson.build +++ b/lib/net/meson.build @@ -44,14 +44,10 @@ use_function_versioning = true if dpdk_conf.has('RTE_ARCH_X86_64') sources += files('net_crc_sse.c') cflags += ['-mpclmul', '-maes'] -if cc.has_argument('-mvpclmulqdq') and cc_has_avx512 -cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] -net_crc_avx512_lib = static_library( -'net_crc_avx512_lib', -'net_crc_avx512.c', -dependencies: static_rte_eal, -c_args: [cflags, cc_avx512_flags, '-mvpclmulqdq']) -objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c') +# only build AVX-512 support if we also have PCLMULQDQ support +if cc.has_argument('-mvpclmulqdq') +sources_avx512 += files('net_crc_avx512.c') +cflags_avx512 += ['-mvpclmulqdq'] endif elif (dpdk_conf.has('RTE_ARCH_ARM64') and diff --git a/lib/net/rte_net_crc.c b/lib/net/rte_net_crc.c index 8d11515712..b5a5dabd4f 100644 --- a/lib/net/rte_net_crc.c +++ b/lib/net/rte_net_crc.c @@ -60,7 +60,7 @@ static const rte_net_crc_handler handlers_scalar[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, }; -#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +#ifdef CC_AVX512_SUPPORT static const rte_net_crc_handler handlers_avx512[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler, @@ -185,7 +185,7 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len) static const rte_net_crc_handler * avx512_vpclmulqdq_get_handlers(void) { -#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +#ifdef CC_AVX512_SUPPORT if 
(AVX512_VPCLMULQDQ_CPU_SUPPORTED && max_simd_bitwidth >= RTE_VECT_SIMD_512) return handlers_avx512; @@ -197,7 +197,7 @@ avx512_vpclmulqdq_get_handlers(void) static void avx512_vpclmulqdq_init(void) { -#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +#ifdef CC_AVX512_SUPPORT if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) rte_net_crc_avx512_init(); #endif @@ -305,7 +305,7 @@ handlers_init(enum rte_net_crc_alg alg) switch (alg) { case RTE_NET_CRC_AVX512: -#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +#ifdef CC_AVX512_SUPPORT if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) { handlers_dpdk26[alg].f[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler; -- 2.45.2
[PATCH v5 09/11] net: simplify build-time logic for x86
All DPDK-supported versions of clang and gcc have the "-mpclmul" and "-maes" flags, so we never need to check for those. This allows the SSE code path to be unconditionally built on x86. For the AVX512 code path, simplify it by only checking for the build-time support, and always doing a separate build with AVX512 support when that compiler support is present. Signed-off-by: Bruce Richardson --- lib/net/meson.build | 52 +-- lib/net/rte_net_crc.c | 8 +++ 2 files changed, 9 insertions(+), 51 deletions(-) diff --git a/lib/net/meson.build b/lib/net/meson.build index c9b34afc98..cd49b4d758 100644 --- a/lib/net/meson.build +++ b/lib/net/meson.build @@ -42,57 +42,15 @@ deps += ['mbuf'] use_function_versioning = true if dpdk_conf.has('RTE_ARCH_X86_64') -net_crc_sse42_cpu_support = (cc.get_define('__PCLMUL__', args: machine_args) != '') -net_crc_avx512_cpu_support = ( -target_has_avx512 and -cc.get_define('__VPCLMULQDQ__', args: machine_args) != '' -) - -net_crc_sse42_cc_support = (cc.has_argument('-mpclmul') and cc.has_argument('-maes')) -net_crc_avx512_cc_support = (cc.has_argument('-mvpclmulqdq') and cc_has_avx512) - -build_static_net_crc_sse42_lib = 0 -build_static_net_crc_avx512_lib = 0 - -if net_crc_sse42_cpu_support == true -sources += files('net_crc_sse.c') -cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] -if net_crc_avx512_cpu_support == true -sources += files('net_crc_avx512.c') -cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] -elif net_crc_avx512_cc_support == true -build_static_net_crc_avx512_lib = 1 -net_crc_avx512_lib_cflags = cc_avx512_flags + ['-mvpclmulqdq'] -cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] -endif -elif net_crc_sse42_cc_support == true -build_static_net_crc_sse42_lib = 1 -net_crc_sse42_lib_cflags = ['-mpclmul', '-maes'] -cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] -if net_crc_avx512_cc_support == true -build_static_net_crc_avx512_lib = 1 -net_crc_avx512_lib_cflags = cc_avx512_flags + ['-mvpclmulqdq', '-mpclmul'] -cflags += 
['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] -endif -endif - -if build_static_net_crc_sse42_lib == 1 -net_crc_sse42_lib = static_library( -'net_crc_sse42_lib', -'net_crc_sse.c', -dependencies: static_rte_eal, -c_args: [cflags, -net_crc_sse42_lib_cflags]) -objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c') -endif - -if build_static_net_crc_avx512_lib == 1 +sources += files('net_crc_sse.c') +cflags += ['-mpclmul', '-maes'] +if cc.has_argument('-mvpclmulqdq') and cc_has_avx512 +cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] net_crc_avx512_lib = static_library( 'net_crc_avx512_lib', 'net_crc_avx512.c', dependencies: static_rte_eal, -c_args: [cflags, -net_crc_avx512_lib_cflags]) +c_args: [cflags, cc_avx512_flags, '-mvpclmulqdq']) objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c') endif diff --git a/lib/net/rte_net_crc.c b/lib/net/rte_net_crc.c index 2fb3eec231..8d11515712 100644 --- a/lib/net/rte_net_crc.c +++ b/lib/net/rte_net_crc.c @@ -66,7 +66,7 @@ static const rte_net_crc_handler handlers_avx512[] = { [RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler, }; #endif -#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT +#ifdef RTE_ARCH_X86_64 static const rte_net_crc_handler handlers_sse42[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler, @@ -211,7 +211,7 @@ avx512_vpclmulqdq_init(void) static const rte_net_crc_handler * sse42_pclmulqdq_get_handlers(void) { -#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT +#ifdef RTE_ARCH_X86_64 if (SSE42_PCLMULQDQ_CPU_SUPPORTED && max_simd_bitwidth >= RTE_VECT_SIMD_128) return handlers_sse42; @@ -223,7 +223,7 @@ sse42_pclmulqdq_get_handlers(void) static void sse42_pclmulqdq_init(void) { -#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT +#ifdef RTE_ARCH_X86_64 if (SSE42_PCLMULQDQ_CPU_SUPPORTED) rte_net_crc_sse42_init(); #endif @@ -316,7 +316,7 @@ handlers_init(enum rte_net_crc_alg alg) #endif /* fall-through */ case RTE_NET_CRC_SSE42: -#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT 
+#ifdef RTE_ARCH_X86_64 if (SSE42_PCLMULQDQ_CPU_SUPPORTED) { handlers_dpdk26[alg].f[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler; -- 2.45.2
Re: [PATCH] doc: add tested platforms with NVIDIA NICs
20/03/2025 09:44, Raslan Darawsheh: > Add tested platforms with NVIDIA NICs to the 25.03 release notes. > > Signed-off-by: Raslan Darawsheh Applied, thanks.
RE: [PATCH] mempool perf test: test random bulk sizes
Second PING for review. Med venlig hilsen / Kind regards, -Morten Brørup > From: Morten Brørup [mailto:m...@smartsharesystems.com] > Sent: Thursday, 13 March 2025 09.23 > > PING for review. > > This could still make it into 25.03-rc3 (deadline: 14 March 2025). > > Med venlig hilsen / Kind regards, > -Morten Brørup > > > > From: Morten Brørup [mailto:m...@smartsharesystems.com] > > Sent: Friday, 28 February 2025 17.49 > > > > Bulk requests to get or put objects in a mempool often vary in size. > > A series of tests with pseudo random request sizes, to mitigate the > > benefits of the CPU's dynamic branch predictor, was added. > > > > Also, various other minor changes: > > - Improved the output formatting for readability. > > - Added test for the "default" mempool with cache. > > - Skip the tests for the "default" mempool, if it happens to use the > > same > > driver (i.e. operations) as already tested. > > - Replaced bare use of "unsigned" with "unsigned int", > > to make checkpatches happy. > > > > Signed-off-by: Morten Brørup > > --- > > app/test/test_mempool_perf.c | 219 +++-- > -- > > 1 file changed, 172 insertions(+), 47 deletions(-) > > > > diff --git a/app/test/test_mempool_perf.c > > b/app/test/test_mempool_perf.c > > index 4dd74ef75a..5e29797f02 100644 > > --- a/app/test/test_mempool_perf.c > > +++ b/app/test/test_mempool_perf.c > > @@ -33,6 +33,13 @@ > > * Mempool performance > > * === > > * > > + *Each core get *n_keep* objects per bulk of a pseudorandom > number > > + *between 1 and *n_max_bulk*. > > + *Objects are put back in the pool per bulk of a similar > > pseudorandom number. > > + *Note: The very low entropy of the randomization algorithm is > > harmless, because > > + * the sole purpose of randomization is to prevent the > CPU's > > dynamic branch > > + * predictor from enhancing the test results. > > + * > > *Each core get *n_keep* objects per bulk of *n_get_bulk*. Then, > > *objects are put back in the pool per bulk of *n_put_bulk*. 
> > * > > @@ -52,7 +59,12 @@ > > * - Two cores with user-owned cache > > * - Max. cores with user-owned cache > > * > > - *- Bulk size (*n_get_bulk*, *n_put_bulk*) > > + *- Pseudorandom max bulk size (*n_max_bulk*) > > + * > > + * - Max bulk from CACHE_LINE_BURST to 256, and > > RTE_MEMPOOL_CACHE_MAX_SIZE, > > + *where CACHE_LINE_BURST is the number of pointers fitting > > into one CPU cache line. > > + * > > + *- Fixed bulk size (*n_get_bulk*, *n_put_bulk*) > > * > > * - Bulk get from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE > > * - Bulk put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE > > @@ -89,16 +101,19 @@ > > } while (0) > > > > static int use_external_cache; > > -static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE; > > +static unsigned int external_cache_size = > RTE_MEMPOOL_CACHE_MAX_SIZE; > > > > static RTE_ATOMIC(uint32_t) synchro; > > > > +/* max random number of objects in one bulk operation (get and put) > */ > > +static unsigned int n_max_bulk; > > + > > /* number of objects in one bulk operation (get or put) */ > > -static unsigned n_get_bulk; > > -static unsigned n_put_bulk; > > +static unsigned int n_get_bulk; > > +static unsigned int n_put_bulk; > > > > /* number of objects retrieved from mempool before putting them back > > */ > > -static unsigned n_keep; > > +static unsigned int n_keep; > > > > /* true if we want to test with constant n_get_bulk and n_put_bulk > */ > > static int use_constant_values; > > @@ -118,7 +133,7 @@ static struct mempool_test_stats > > stats[RTE_MAX_LCORE]; > > */ > > static void > > my_obj_init(struct rte_mempool *mp, __rte_unused void *arg, > > - void *obj, unsigned i) > > + void *obj, unsigned int i) > > { > > uint32_t *objnum = obj; > > memset(obj, 0, mp->elt_size); > > @@ -159,11 +174,55 @@ test_loop(struct rte_mempool *mp, struct > > rte_mempool_cache *cache, > > return 0; > > } > > > > +static __rte_always_inline int > > +test_loop_random(struct rte_mempool *mp, struct rte_mempool_cache > > *cache, > 
> + unsigned int x_keep, unsigned int x_max_bulk) > > +{ > > + alignas(RTE_CACHE_LINE_SIZE) void *obj_table[MAX_KEEP]; > > + unsigned int idx; > > + unsigned int i; > > + unsigned int r = 0; > > + unsigned int x_bulk; > > + int ret; > > + > > + for (i = 0; likely(i < (N / x_keep)); i++) { > > + /* get x_keep objects by bulk of random [1 .. x_max_bulk] > > */ > > + for (idx = 0; idx < x_keep; idx += x_bulk, r++) { > > + /* Generate a pseudorandom number [1 .. x_max_bulk]. > > */ > > + x_bulk = ((r ^ (r >> 2) ^ (r << 3)) & (x_max_bulk - > > 1)) + 1; > > + if (unlikely(idx + x_bulk > x_keep)) > > + x_bulk = x_keep - idx; > > + ret = rte_mempool_generic_get(mp, > > +
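As a standalone illustration of the randomization in the quoted test_loop_random(), the sketch below isolates the bulk-size expression and the clamping of the last bulk. The expression is copied from the patch; the helper names pseudo_bulk() and split_into_bulks() are hypothetical and exist only for this example. Because of the mask, the result always lies within [1 .. x_max_bulk], but when x_max_bulk is not a power of two some sizes in that range can never be produced.

```c
#include <stddef.h>

/*
 * Low-entropy pseudorandom bulk size, using the same expression as the
 * quoted patch. The result is always within [1 .. x_max_bulk].
 */
static unsigned int
pseudo_bulk(unsigned int r, unsigned int x_max_bulk)
{
	return ((r ^ (r >> 2) ^ (r << 3)) & (x_max_bulk - 1)) + 1;
}

/*
 * Hypothetical helper for illustration: splits x_keep objects into bulks,
 * clamping the last bulk so the total never exceeds x_keep. Stores up to
 * max_out bulk sizes in out[] and returns the number of bulks generated.
 */
static size_t
split_into_bulks(unsigned int x_keep, unsigned int x_max_bulk,
		unsigned int *out, size_t max_out)
{
	unsigned int idx, x_bulk, r = 0;
	size_t n = 0;

	for (idx = 0; idx < x_keep; idx += x_bulk, r++) {
		x_bulk = pseudo_bulk(r, x_max_bulk);
		if (idx + x_bulk > x_keep)
			x_bulk = x_keep - idx;
		if (n < max_out)
			out[n] = x_bulk;
		n++;
	}
	return n;
}
```

The sequence is fully deterministic, which keeps runs comparable; it only needs to be irregular enough that the branch predictor cannot learn a fixed bulk size.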
Re: release candidate 25.03-rc3
Hi,

> -----Original Message-----
> From: Thomas Monjalon
> Date: Wednesday, 19 March 2025 at 6:35
> To: annou...@dpdk.org
> Subject: release candidate 25.03-rc3
>
> A new DPDK release candidate is ready for testing:
> https://git.dpdk.org/dpdk/tag/?id=v25.03-rc3
>
> There are 71 new patches in this snapshot.
>
> Release notes:
> https://doc.dpdk.org/guides/rel_notes/release_25_03.html
>
> Please test and report new issues on https://bugs.dpdk.org
>
> If no major blocker is discovered,
> this release cycle may be closed at the end of the week.
>
> Thank you everyone

The following is a list of tests that we ran on NVIDIA hardware this release:

Note: all tests passed, and no critical issues were found.

- Basic functionality: Send and receive multiple types of traffic.
- testpmd xstats counter test.
- testpmd timestamp test.
- Changing/checking link status through testpmd.
- RTE flow tests: https://doc.dpdk.org/guides/nics/mlx5.html#supported-hardware-offloads
- RSS testing.
- VLAN filtering, stripping and insertion tests.
- Checksum and TSO tests.
- ptype reporting.
- Link status interrupt tests using the link_status_interrupt example application.
- Interrupt mode tests using the l3fwd-power example application.
- Multi-process testing using the multi-process example applications.
- Hardware LRO tests.
- Buffer Split.
- Tx scheduling.

We don't see new issues caused by changes in this release.

Kindest Regards,
Wael Abualrub
[RFC PATCH] build: reduce use of AVX compiler flags
When doing a build for a target that already has the instruction sets for AVX2/AVX512 enabled, skip emitting the AVX compiler flags, or the skylake-avx512 '-march' flags, as they are unnecessary. Instead, when the default flags produce the desired output, just use them unmodified. Depends-on: series-34915 ("remove component-specific logic for AVX builds") Signed-off-by: Bruce Richardson --- This patchset depends on the previous AVX rework. However, sending it separately as a new RFC because it effectively increases the minimum compiler versions needed for x86 builds - from GCC 5 to 6, and Clang 3.6 to 3.9. For now, I've just documented that as an additional note in the GSG that these versions are recommended, but it would be simpler if we could just set them as the required minimum baseline (at least in the docs). Feedback on these compiler version requirements welcome. --- config/x86/meson.build| 31 --- doc/guides/linux_gsg/sys_reqs.rst | 8 drivers/meson.build | 9 + lib/meson.build | 9 + 4 files changed, 30 insertions(+), 27 deletions(-) diff --git a/config/x86/meson.build b/config/x86/meson.build index c3564b0011..97f790b0d4 100644 --- a/config/x86/meson.build +++ b/config/x86/meson.build @@ -4,11 +4,13 @@ if is_ms_compiler cc_avx2_flags = ['/arch:AVX2'] else -cc_avx2_flags = ['-mavx2'] +cc_avx2_flags = [] +if cc.get_define('__AVX2__', args: machine_args) == '' +cc_avx2_flags = ['-mavx2'] +endif endif cc_has_avx512 = false -target_has_avx512 = false dpdk_conf.set('RTE_ARCH_X86', 1) if dpdk_conf.get('RTE_ARCH_64') @@ -65,26 +67,33 @@ if is_linux or cc.get_id() == 'gcc' endif endif -cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw', '-mavx512cd'] -if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags) +avx512_march_flag = '-march=skylake-avx512' +cc_avx512_flags = [] +if (binutils_ok and cc.has_argument(avx512_march_flag) and '-mno-avx512f' not in get_option('c_args')) # check if compiler is working with _mm512_extracti64x4_epi64 # Ref: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82887 code = '''#include <immintrin.h> void test(__m512i zmm){ __m256i ymm = _mm512_extracti64x4_epi64(zmm, 0);}''' -result = cc.compiles(code, args : cc_avx512_flags, name : 'AVX512 checking') +result = cc.compiles(code, args : [avx512_march_flag], name : 'AVX512 checking') if result == false machine_args += '-mno-avx512f' warning('Broken _mm512_extracti64x4_epi64, disabling AVX512 support') else cc_has_avx512 = true -target_has_avx512 = ( -cc.get_define('__AVX512F__', args: machine_args) != '' and -cc.get_define('__AVX512BW__', args: machine_args) != '' and -cc.get_define('__AVX512DQ__', args: machine_args) != '' and -cc.get_define('__AVX512VL__', args: machine_args) != '' -) +if cc.get_define('__AVX512F__', args: machine_args) == '' +cc_avx512_flags = [avx512_march_flag] +if cc.has_argument('-Wno-overriding-option') +cc_avx512_flags += '-Wno-overriding-option' +endif +endif +endif +endif +if developer_mode +message('Extra C flags needed for AVX2 output: @0@'.format(cc_avx2_flags)) +if cc_has_avx512 +message('Extra C flags needed for AVX512 output: @0@'.format(cc_avx512_flags)) endif endif diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst index 5a7d9e4a43..55e9fe4724 100644 --- a/doc/guides/linux_gsg/sys_reqs.rst +++ b/doc/guides/linux_gsg/sys_reqs.rst @@ -35,6 +35,14 @@ Compilation of the DPDK * For Ubuntu/Debian systems these can be installed using ``apt install build-essential`` * For Alpine Linux, ``apk add alpine-sdk bsd-compat-headers`` +.. note:: + + When compiling for x86 platforms, + GCC version 6.1 or higher, + or Clang version 3.9 or higher is recommended. + Earlier versions of these compilers do not support the compiler flags used by DPDK for AVX-512 code. + As such, any builds using earlier compilers will be missing AVX-512 support. + .. 
note:: pkg-config 0.27, supplied with RHEL-7, diff --git a/drivers/meson.build b/drivers/meson.build index c15319dc24..bb33b0a7a0 100644 --- a/drivers/meson.build +++ b/drivers/meson.build @@ -249,18 +249,11 @@ foreach subpath:subdirs endif if sources_avx512.length() > 0 and cc_has_avx512 cflags += '-DCC_AVX512_SUPPORT' -avx512_args = [cflags, cc_avx512_flags] -if not target_has_avx512 and cc.has_argument('-march=skylake-avx512') -avx512_args += '-march=skylake-avx512' -if cc.has_argument('-Wno-overriding-option') -avx512_args += '-Wno-overriding-option'
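The build-time test in this RFC relies on the compiler's predefined macros: if __AVX2__ or __AVX512F__ is already defined with the baseline machine_args, no extra flag needs to be emitted. The C sketch below is only an analogue of that cc.get_define() check, seen from inside a translation unit. The helper names are hypothetical; the macros __AVX2__ and __AVX512F__ and the flag strings are the ones used in the patch.

```c
#include <stdbool.h>

/* True when the flags used for this file already enable AVX2. */
static bool
baseline_has_avx2(void)
{
#ifdef __AVX2__
	return true;
#else
	return false;
#endif
}

/* True when the flags used for this file already enable AVX-512F. */
static bool
baseline_has_avx512f(void)
{
#ifdef __AVX512F__
	return true;
#else
	return false;
#endif
}

/* Extra flag needed for AVX2 output, or "" when none is needed. */
static const char *
extra_avx2_flag(void)
{
	return baseline_has_avx2() ? "" : "-mavx2";
}

/* Extra flag needed for AVX-512 output, or "" when none is needed. */
static const char *
extra_avx512_flag(void)
{
	return baseline_has_avx512f() ? "" : "-march=skylake-avx512";
}
```

Built with the default x86-64 baseline, both helpers would be expected to report that an extra flag is needed; built for a target that already includes AVX-512, both strings come back empty, which matches the intent of skipping redundant flags.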
Re: [PATCH v4 00/11] remove component-specific logic for AVX builds
On Tue, Mar 25, 2025 at 08:46:35AM +0100, David Marchand wrote:
> Hello Bruce,
>
> On Wed, Mar 19, 2025 at 7:09 PM Bruce Richardson
> wrote:
> >
> > On Wed, Mar 19, 2025 at 05:29:30PM +, Bruce Richardson wrote:
> > > A number of libs and drivers had special optimized AVX2 and AVX512 code
> > > paths for performance reasons, and these tended to have copy-pasted
> > > logic to build those files. Centralise that logic in the main
> > > drivers/ and lib/ meson.build files to avoid duplication.
> > >
> > > v4: rebase on latest main branch
> > > minor fixes following feedback
> > > limit use of -march=skylake-avx512 to when we don't already have a
> > > -march flag supporting AVX512.
> > > v3: add patch for event/dlb2 AVX512 handling.
> > > add common code for libraries as well as drivers.
> > > v2: add patch 4 to remove use of unnecessary CC_AVX2_SUPPORT flag
> > >
> > A related follow-up to this patchset. Checking with "godbolt.org", it
> > appears that both clang 3.6[1] and gcc 5[2] (the minimum called out compiler
> > versions in our docs[1]) support the set of AVX-512 compiler flags we use.
> > Therefore, it seems we can simplify our code further by removing the
> > "cc_has_avx512" variable.
>
> What about https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028 ?
>

Yep, still needs to be handled, something I only realised after sending the email.

> You'll need to send a new revision for this series in any case, since
> patch 9 broke the crc stuff in the net library.
> https://inbox.dpdk.org/dev/CAJFAV8w9wYPN+30Hv=batMvP=0m4momkzgmndfixxbd-9u8...@mail.gmail.com/
>

Yes, I saw that and just started looking at it last evening. Will hopefully get a new revision out soon.

/Bruce
Re: [PATCH] doc/bluefield: add comparison between BlueField versions
19/03/2025 13:13, Raslan Darawsheh: > Updated BlueField-3 documentation to include a detailed comparison > with BlueField-2 and added notes on compiler requirements. > > Signed-off-by: Raslan Darawsheh Applied, thanks.
Re: [PATCH 2/2] app/dma-perf: fix infinite loop
On Fri, 21 Mar 2025 12:03:16 +0800
Dengdui Huang wrote:

> When a core that is not used by the rte is specified in the config
> for testing, the problem of infinite loop occurs. The root cause
> is that the program waits for the completion of the test task when
> the test worker fails to be started on the lcore. This patch fix it.
>
> Fixes: 533d7e7f66f3 ("app/dma-perf: support config per device")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Dengdui Huang
> ---
> app/test-dma-perf/benchmark.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c
> index 6d617ea200..351c1c966e 100644
> --- a/app/test-dma-perf/benchmark.c
> +++ b/app/test-dma-perf/benchmark.c
> @@ -751,7 +751,10 @@ mem_copy_benchmark(struct test_configure *cfg)
> goto out;
> }
>
> - rte_eal_remote_launch(get_work_function(cfg), (void
> *)(lcores[i]), lcore_id);
> + if (rte_eal_remote_launch(get_work_function(cfg), (void
> *)(lcores[i]), lcore_id)) {
> + printf("Error: Fail to start the test on lcore %d\n",
> lcore_id);

Convention is to log errors on stderr, and lcore_id is an unsigned value, not a signed one, so %d is the wrong conversion here.
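A minimal sketch of what the review comment asks for, using plain stdio only: the message goes to stderr and the unsigned lcore_id is printed with %u. The helper names format_launch_error() and report_launch_error() are hypothetical and not part of app/test-dma-perf; the message text is taken from the quoted patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of app/test-dma-perf: formats the launch
 * failure message. lcore_id is unsigned, so %u is the matching conversion.
 */
static int
format_launch_error(char *buf, size_t len, unsigned int lcore_id)
{
	return snprintf(buf, len,
			"Error: Fail to start the test on lcore %u\n", lcore_id);
}

/* Report the failure on stderr rather than stdout. */
static void
report_launch_error(unsigned int lcore_id)
{
	char msg[64];

	format_launch_error(msg, sizeof(msg), lcore_id);
	fputs(msg, stderr);
}
```

In the benchmark itself this would presumably reduce to a single fprintf(stderr, ...) call in the error branch; the split into two helpers is only there to make the formatting easy to check.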