[PATCH] ci: do not dump error logs in GHA containers
On error, the build logs are displayed in GHA console and logs unless
the GITHUB_WORKFLOW env variable is set.
However, containers in GHA do not automatically inherit this variable.
We could pass this variable in the container environment, but in the
end, dumping those logs is only for Travis which we don't really care
about anymore.

Let's make the linux-build.sh more generic and dump logs from Travis
yaml itself.

Fixes: b35c4b0aa2bc ("ci: add Fedora 35 container in GHA")

Signed-off-by: David Marchand
---
 .ci/linux-build.sh | 19 ---
 .travis.yml        |  4
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 774a1441bf..6a937611fa 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -3,25 +3,6 @@
 # Builds are run as root in containers, no need for sudo
 [ "$(id -u)" != '0' ] || alias sudo=

-on_error() {
-    if [ $? = 0 ]; then
-        exit
-    fi
-    FILES_TO_PRINT="build/meson-logs/testlog.txt"
-    FILES_TO_PRINT="$FILES_TO_PRINT build/.ninja_log"
-    FILES_TO_PRINT="$FILES_TO_PRINT build/meson-logs/meson-log.txt"
-    FILES_TO_PRINT="$FILES_TO_PRINT build/gdb.log"
-
-    for pr_file in $FILES_TO_PRINT; do
-        if [ -e "$pr_file" ]; then
-            cat "$pr_file"
-        fi
-    done
-}
-# We capture the error logs as artifacts in Github Actions, no need to dump
-# them via a EXIT handler.
-[ -n "$GITHUB_WORKFLOW" ] || trap on_error EXIT
-
 install_libabigail() {
     version=$1
     instdir=$2

diff --git a/.travis.yml b/.travis.yml
index 5f46dccb54..e4e70fa560 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -38,6 +38,10 @@ _doc_packages: &doc_packages
 before_install: ./.ci/${TRAVIS_OS_NAME}-setup.sh
 script: ./.ci/${TRAVIS_OS_NAME}-build.sh

+after_failure:
+- [ ! -e build/meson-logs/testlog.txt ] || cat build/meson-logs/testlog.txt
+- [ ! -e build/.ninja_log ] || cat build/.ninja_log
+- [ ! -e build/meson-logs/meson-log.txt ] || cat build/meson-logs/meson-log.txt
 env:
   global:
--
2.23.0
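[Editor's note] The `[ ! -e file ] || cat file` construct used in the new `after_failure` steps prints a log only when it exists and always exits 0, so a missing log cannot itself fail the CI step. A minimal sketch of the idiom (file names follow the patch; the sample content is illustrative):

```shell
# Create one sample log; leave the other missing.
mkdir -p build/meson-logs
echo "sample meson log" > build/meson-logs/meson-log.txt

# Prints the file when it is present...
[ ! -e build/meson-logs/meson-log.txt ] || cat build/meson-logs/meson-log.txt
# ...and is a no-op with exit status 0 when it is absent.
[ ! -e build/.ninja_log ] || cat build/.ninja_log
echo "exit status: $?"
```

Running this prints "sample meson log" followed by "exit status: 0", which is why the missing `.ninja_log` does not break the `after_failure` sequence.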
Re: [PATCH] ci: do not dump error logs in GHA containers
On Tue, Apr 26, 2022 at 9:09 AM David Marchand wrote:
>
> On error, the build logs are displayed in GHA console and logs unless
> the GITHUB_WORKFLOW env variable is set.
> However, containers in GHA do not automatically inherit this variable.
> We could pass this variable in the container environment, but in the
> end, dumping those logs is only for Travis which we don't really care
> about anymore.
>
> Let's make the linux-build.sh more generic and dump logs from Travis
> yaml itself.
>
> Fixes: b35c4b0aa2bc ("ci: add Fedora 35 container in GHA")
>
> Signed-off-by: David Marchand

TBH, I did not test Travis for lack of interest (plus I don't want to
be bothered with their ui / credit stuff).
We could consider dropping Travis in the near future.
Opinions?


--
David Marchand
[PATCH 2/2] ci: add mingw cross compilation in GHA
Add mingw cross compilation in our public CI so that users with their own github repository have a first level of checks for Windows compilation before submitting to the mailing list. This does not replace our better checks in other entities of the CI. Only the helloworld example is compiled (same as what is tested in test-meson-builds.sh). Note: the mingw cross compilation toolchain (version 5.0) in Ubuntu 18.04 was broken (missing a ENOMSG definition). Signed-off-by: David Marchand --- .ci/linux-build.sh | 22 +- .github/workflows/build.yml | 8 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh index 30119b61ba..06dd20772d 100755 --- a/.ci/linux-build.sh +++ b/.ci/linux-build.sh @@ -37,16 +37,26 @@ catch_coredump() { return 1 } +cross_file= + if [ "$AARCH64" = "true" ]; then if [ "${CC%%clang}" != "$CC" ]; then -OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_clang_ubuntu2004" +cross_file=config/arm/arm64_armv8_linux_clang_ubuntu2004 else -OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_gcc" +cross_file=config/arm/arm64_armv8_linux_gcc fi fi +if [ "$MINGW" = "true" ]; then +cross_file=config/x86/cross-mingw +fi + if [ "$PPC64LE" = "true" ]; then -OPTS="$OPTS --cross-file config/ppc/ppc64le-power8-linux-gcc-ubuntu2004" +cross_file=config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 +fi + +if [ -n "$cross_file" ]; then +OPTS="$OPTS --cross-file $cross_file" fi if [ "$BUILD_DOCS" = "true" ]; then @@ -59,7 +69,9 @@ if [ "$BUILD_32BIT" = "true" ]; then export PKG_CONFIG_LIBDIR="/usr/lib32/pkgconfig" fi -if [ "$DEF_LIB" = "static" ]; then +if [ "$MINGW" = "true" ]; then +OPTS="$OPTS -Dexamples=helloworld" +elif [ "$DEF_LIB" = "static" ]; then OPTS="$OPTS -Dexamples=l2fwd,l3fwd" else OPTS="$OPTS -Dexamples=all" @@ -76,7 +88,7 @@ fi meson build --werror $OPTS ninja -C build -if [ "$AARCH64" != "true" ] && [ "$PPC64LE" != "true" ]; then +if [ -z "$cross_file" ]; then failed= configure_coredump devtools/test-null.sh 
|| failed="true" diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 812aa7055d..e2f94d786b 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -21,6 +21,7 @@ jobs: CC: ccache ${{ matrix.config.compiler }} DEF_LIB: ${{ matrix.config.library }} LIBABIGAIL_VERSION: libabigail-1.8 + MINGW: ${{ matrix.config.cross == 'mingw' }} MINI: ${{ matrix.config.mini != '' }} PPC64LE: ${{ matrix.config.cross == 'ppc64le' }} REF_GIT_TAG: v22.03 @@ -52,6 +53,10 @@ jobs: compiler: gcc library: static cross: i386 + - os: ubuntu-20.04 +compiler: gcc +library: static +cross: mingw - os: ubuntu-20.04 compiler: gcc library: static @@ -119,6 +124,9 @@ jobs: if: env.AARCH64 == 'true' run: sudo apt install -y gcc-aarch64-linux-gnu libc6-dev-arm64-cross pkg-config-aarch64-linux-gnu +- name: Install mingw cross compiling packages + if: env.MINGW == 'true' + run: sudo apt install -y mingw-w64 mingw-w64-tools - name: Install ppc64le cross compiling packages if: env.PPC64LE == 'true' run: sudo apt install -y gcc-powerpc64le-linux-gnu libc6-dev-ppc64el-cross -- 2.23.0
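[Editor's note] The `[ "${CC%%clang}" != "$CC" ]` test reused in the hunk above detects a clang toolchain by stripping a trailing `clang` suffix and checking whether anything was removed. A quick illustration of the parameter-expansion idiom (the sample compiler names are illustrative):

```shell
# ${CC%%clang} removes a trailing "clang" pattern from $CC, if any.
# If the result differs from $CC, the compiler name ended in "clang".
is_clang() {
    CC=$1
    if [ "${CC%%clang}" != "$CC" ]; then
        echo "$1: clang"
    else
        echo "$1: not clang"
    fi
}

is_clang clang                     # prints "clang: clang"
is_clang gcc                       # prints "gcc: not clang"
is_clang aarch64-linux-gnu-gcc     # prints "aarch64-linux-gnu-gcc: not clang"
```

This is why the CI script can pick a clang cross file without a separate environment flag: the compiler name itself carries the information.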
[PATCH 1/2] ci: switch to Ubuntu 20.04
Ubuntu 18.04 is now rather old. Besides, other entities in our CI are also testing this distribution. Switch to a newer Ubuntu release and benefit from more recent tool(chain)s: for example, net/cnxk now builds fine and can be re-enabled. Signed-off-by: David Marchand --- .ci/linux-build.sh| 7 ++ .github/workflows/build.yml | 22 +-- config/arm/arm64_armv8_linux_clang_ubuntu2004 | 1 + .../ppc/ppc64le-power8-linux-gcc-ubuntu2004 | 1 + 4 files changed, 14 insertions(+), 17 deletions(-) create mode 12 config/arm/arm64_armv8_linux_clang_ubuntu2004 create mode 12 config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh index 6a937611fa..30119b61ba 100755 --- a/.ci/linux-build.sh +++ b/.ci/linux-build.sh @@ -38,18 +38,15 @@ catch_coredump() { } if [ "$AARCH64" = "true" ]; then -# Note: common/cnxk is disabled for Ubuntu 18.04 -# https://bugs.dpdk.org/show_bug.cgi?id=697 -OPTS="$OPTS -Ddisable_drivers=common/cnxk" if [ "${CC%%clang}" != "$CC" ]; then -OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_clang_ubuntu1804" +OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_clang_ubuntu2004" else OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_gcc" fi fi if [ "$PPC64LE" = "true" ]; then -OPTS="$OPTS --cross-file config/ppc/ppc64le-power8-linux-gcc-ubuntu1804" +OPTS="$OPTS --cross-file config/ppc/ppc64le-power8-linux-gcc-ubuntu2004" fi if [ "$BUILD_DOCS" = "true" ]; then diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 22daaabb91..812aa7055d 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -30,43 +30,41 @@ jobs: fail-fast: false matrix: config: - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: static - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: shared mini: mini - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: shared checks: abi+doc+tests - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: clang library: static - - 
os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: clang library: shared checks: doc+tests - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: static cross: i386 - # Note: common/cnxk is disabled for Ubuntu 18.04 - # https://bugs.dpdk.org/show_bug.cgi?id=697 - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: static cross: aarch64 - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: shared cross: aarch64 - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: static cross: ppc64le - - os: ubuntu-18.04 + - os: ubuntu-20.04 compiler: gcc library: shared cross: ppc64le diff --git a/config/arm/arm64_armv8_linux_clang_ubuntu2004 b/config/arm/arm64_armv8_linux_clang_ubuntu2004 new file mode 12 index 00..01f5b7643e --- /dev/null +++ b/config/arm/arm64_armv8_linux_clang_ubuntu2004 @@ -0,0 +1 @@ +arm64_armv8_linux_clang_ubuntu1804 \ No newline at end of file diff --git a/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 new file mode 12 index 00..9d6139a19b --- /dev/null +++ b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 @@ -0,0 +1 @@ +ppc64le-power8-linux-gcc-ubuntu1804 \ No newline at end of file -- 2.23.0
Re: [PATCH v2] test/bpf: skip test if libpcap is unavailable
On Tue, Mar 22, 2022 at 8:12 AM Tyler Retzlaff wrote:
>
> test_bpf_convert is being conditionally registered depending on the
> presence of RTE_HAS_LIBPCAP except the UT unconditionally lists it as a
> test to run.
>
> when the UT runs test_bpf_convert test-dpdk can't find the registration
> and assumes the DPDK_TEST environment variable hasn't been defined
> resulting in test-dpdk dropping to interactive mode and subsequently
> waiting for the remainder of the UT fast-test timeout period before
> reporting the test as having timed out.
>
> * unconditionally register test_bpf_convert
> * if ! RTE_HAS_LIBPCAP provide a stub test_bpf_convert that reports the
>   test is skipped similar to that done with the test_bpf test.
>
> Fixes: 2eccf6afbea9 ("bpf: add function to convert classic BPF to DPDK BPF")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Tyler Retzlaff

Acked-by: Stephen Hemminger
Acked-by: Konstantin Ananyev

Applied, thanks.


--
David Marchand
Re: [PATCH v2] test/bpf: skip test if libpcap is unavailable
On Tue, Apr 26, 2022 at 09:41:08AM +0200, David Marchand wrote:
> On Tue, Mar 22, 2022 at 8:12 AM Tyler Retzlaff wrote:
> >
> > test_bpf_convert is being conditionally registered depending on the
> > presence of RTE_HAS_LIBPCAP except the UT unconditionally lists it as a
> > test to run.
> >
> > when the UT runs test_bpf_convert test-dpdk can't find the registration
> > and assumes the DPDK_TEST environment variable hasn't been defined
> > resulting in test-dpdk dropping to interactive mode and subsequently
> > waiting for the remainder of the UT fast-test timeout period before
> > reporting the test as having timed out.
> >
> > * unconditionally register test_bpf_convert
> > * if ! RTE_HAS_LIBPCAP provide a stub test_bpf_convert that reports the
> >   test is skipped similar to that done with the test_bpf test.
> >
> > Fixes: 2eccf6afbea9 ("bpf: add function to convert classic BPF to DPDK BPF")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Tyler Retzlaff
> Acked-by: Stephen Hemminger
> Acked-by: Konstantin Ananyev
>
> Applied, thanks.

thanks mate!

>
>
> --
> David Marchand
[PATCH v4 0/3] add eal functions for thread affinity and self
this series provides basic dependencies for additional eal thread api
additions. series includes

* basic platform error number conversion.
* function to get current thread identifier.
* functions to get and set affinity with platform agnostic thread
  identifier.
* minimal unit test of get and set affinity demonstrating usage.

note: previous series introducing these functions is now superseded by
this series.
http://patches.dpdk.org/project/dpdk/list/?series=20472&state=*

v4:
  * combine patch eal/windows: translate Windows errors to errno-style
    errors into eal: implement functions for get/set thread affinity
    patch. the former introduced static functions that were not used
    without eal: implement functions for get/set thread affinity which
    would cause a build break when applied standalone.
  * remove struct tag from rte_thread_t struct typedef.
  * remove rte_ prefix from rte_convert_cpuset_to_affinity static
    function.
v3:
  * fix memory leak on eal_create_cpu_map error paths.
v2:
  * add missing boilerplate comments warning of experimental api for
    rte_thread_{set,get}_affinity_by_id().
  * don't break literal format string to log_early to improve
    searchability.
  * fix multi-line comment style to match file.
  * return ENOTSUP instead of EINVAL from rte_convert_cpuset_to_affinity()
    if cpus in set are not part of the same processor group and note
    limitation in commit message.
  * expand series to include rte_thread_self().
  * modify unit test to remove use of implementation detail and get
    thread identifier use added rte_thread_self().
  * move literal value to rhs when using memcmp in RTE_TEST_ASSERT

Tyler Retzlaff (3):
  eal: add basic thread ID and current thread identifier API
  eal: implement functions for get/set thread affinity
  test/threads: add unit test for thread API

 app/test/meson.build             |   2 +
 app/test/test_threads.c          |  89 ++
 lib/eal/include/rte_thread.h     |  64 +
 lib/eal/unix/rte_thread.c        |  27 ++
 lib/eal/version.map              |   5 ++
 lib/eal/windows/eal_lcore.c      | 181 +++--
 lib/eal/windows/eal_windows.h    |  10 +++
 lib/eal/windows/include/rte_os.h |   2 +
 lib/eal/windows/rte_thread.c     | 190 ++-
 9 files changed, 522 insertions(+), 48 deletions(-)
 create mode 100644 app/test/test_threads.c

--
1.8.3.1
[PATCH v4 1/3] eal: add basic thread ID and current thread identifier API
Provide a portable type-safe thread identifier. Provide rte_thread_self for obtaining current thread identifier. Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff --- lib/eal/include/rte_thread.h | 22 ++ lib/eal/unix/rte_thread.c| 11 +++ lib/eal/version.map | 3 +++ lib/eal/windows/rte_thread.c | 10 ++ 4 files changed, 46 insertions(+) diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h index 8be8ed8..14478ba 100644 --- a/lib/eal/include/rte_thread.h +++ b/lib/eal/include/rte_thread.h @@ -1,7 +1,10 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2021 Mellanox Technologies, Ltd + * Copyright (C) 2022 Microsoft Corporation */ +#include + #include #include @@ -21,10 +24,29 @@ #endif /** + * Thread id descriptor. + */ +typedef struct { + uintptr_t opaque_id; /**< thread identifier */ +} rte_thread_t; + +/** * TLS key type, an opaque pointer. */ typedef struct eal_tls_key *rte_thread_key; +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Get the id of the calling thread. + * + * @return + * Return the thread id of the calling thread. 
+ */ +__rte_experimental +rte_thread_t rte_thread_self(void); + #ifdef RTE_HAS_CPUSET /** diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c index c34ede9..82e008f 100644 --- a/lib/eal/unix/rte_thread.c +++ b/lib/eal/unix/rte_thread.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright 2021 Mellanox Technologies, Ltd + * Copyright (C) 2022 Microsoft Corporation */ #include @@ -15,6 +16,16 @@ struct eal_tls_key { pthread_key_t thread_index; }; +rte_thread_t +rte_thread_self(void) +{ + rte_thread_t thread_id; + + thread_id.opaque_id = (uintptr_t)pthread_self(); + + return thread_id; +} + int rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *)) { diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb3..05ce8f9 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_thread_self; }; INTERNAL { diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c index 667287c..59fed3c 100644 --- a/lib/eal/windows/rte_thread.c +++ b/lib/eal/windows/rte_thread.c @@ -11,6 +11,16 @@ struct eal_tls_key { DWORD thread_index; }; +rte_thread_t +rte_thread_self(void) +{ + rte_thread_t thread_id; + + thread_id.opaque_id = GetCurrentThreadId(); + + return thread_id; +} + int rte_thread_key_create(rte_thread_key *key, __rte_unused void (*destructor)(void *)) -- 1.8.3.1
[PATCH v4 2/3] eal: implement functions for get/set thread affinity
Implement functions for getting/setting thread affinity. Threads can be pinned to specific cores by setting their affinity attribute. Windows error codes are translated to errno-style error codes. The possible return values are chosen so that we have as much semantic compatibility between platforms as possible. note: convert_cpuset_to_affinity has a limitation that all cpus of the set belong to the same processor group. Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff --- lib/eal/include/rte_thread.h | 42 + lib/eal/unix/rte_thread.c| 16 lib/eal/version.map | 2 + lib/eal/windows/eal_lcore.c | 181 +-- lib/eal/windows/eal_windows.h| 10 +++ lib/eal/windows/include/rte_os.h | 2 + lib/eal/windows/rte_thread.c | 180 +- 7 files changed, 385 insertions(+), 48 deletions(-) diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h index 14478ba..7888f7a 100644 --- a/lib/eal/include/rte_thread.h +++ b/lib/eal/include/rte_thread.h @@ -50,6 +50,48 @@ #ifdef RTE_HAS_CPUSET /** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Set the affinity of thread 'thread_id' to the cpu set + * specified by 'cpuset'. + * + * @param thread_id + *Id of the thread for which to set the affinity. + * + * @param cpuset + * Pointer to CPU affinity to set. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. + */ +__rte_experimental +int rte_thread_set_affinity_by_id(rte_thread_t thread_id, + const rte_cpuset_t *cpuset); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Get the affinity of thread 'thread_id' and store it + * in 'cpuset'. + * + * @param thread_id + *Id of the thread for which to get the affinity. + * + * @param cpuset + * Pointer for storing the affinity value. + * + * @return + * On success, return 0. + * On failure, return a positive errno-style error number. 
+ */ +__rte_experimental +int rte_thread_get_affinity_by_id(rte_thread_t thread_id, + rte_cpuset_t *cpuset); + +/** * Set core affinity of the current thread. * Support both EAL and non-EAL thread and update TLS. * diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c index 82e008f..c64198f 100644 --- a/lib/eal/unix/rte_thread.c +++ b/lib/eal/unix/rte_thread.c @@ -100,3 +100,19 @@ struct eal_tls_key { } return pthread_getspecific(key->thread_index); } + +int +rte_thread_set_affinity_by_id(rte_thread_t thread_id, + const rte_cpuset_t *cpuset) +{ + return pthread_setaffinity_np((pthread_t)thread_id.opaque_id, + sizeof(*cpuset), cpuset); +} + +int +rte_thread_get_affinity_by_id(rte_thread_t thread_id, + rte_cpuset_t *cpuset) +{ + return pthread_getaffinity_np((pthread_t)thread_id.opaque_id, + sizeof(*cpuset), cpuset); +} diff --git a/lib/eal/version.map b/lib/eal/version.map index 05ce8f9..d49e30b 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -422,7 +422,9 @@ EXPERIMENTAL { rte_intr_type_set; # added in 22.07 + rte_thread_get_affinity_by_id; rte_thread_self; + rte_thread_set_affinity_by_id; }; INTERNAL { diff --git a/lib/eal/windows/eal_lcore.c b/lib/eal/windows/eal_lcore.c index 476c2d2..aa2fad9 100644 --- a/lib/eal/windows/eal_lcore.c +++ b/lib/eal/windows/eal_lcore.c @@ -1,8 +1,8 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2019 Intel Corporation + * Copyright (C) 2022 Microsoft Corporation */ -#include #include #include @@ -27,13 +27,15 @@ struct socket_map { }; struct cpu_map { - unsigned int socket_count; unsigned int lcore_count; + unsigned int socket_count; + unsigned int cpu_count; struct lcore_map lcores[RTE_MAX_LCORE]; struct socket_map sockets[RTE_MAX_NUMA_NODES]; + GROUP_AFFINITY cpus[CPU_SETSIZE]; }; -static struct cpu_map cpu_map = { 0 }; +static struct cpu_map cpu_map; /* eal_create_cpu_map() is called before logging is initialized */ static void @@ -47,13 +49,115 @@ struct cpu_map { va_end(va); } +static int 
+eal_query_group_affinity(void) +{ + SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos = NULL; + unsigned int *cpu_count = &cpu_map.cpu_count; + DWORD infos_size = 0; + int ret = 0; + USHORT group_count; + KAFFINITY affinity; + USHORT group_no; + unsigned int i; + + if (!GetLogicalProcessorInformationEx(RelationGroup, NULL, + &infos_size)) { + DWORD error = GetLastError(); + if (error != ERROR_INSUFFICIENT_BUFFER) { + log_early("Cannot get group information size, error %lu\n", error); + rte_errno = EINVAL; +
[PATCH v4 3/3] test/threads: add unit test for thread API
Establish unit test for testing thread api. Initial unit tests for rte_thread_{get,set}_affinity_by_id(). Signed-off-by: Narcisa Vasile Signed-off-by: Tyler Retzlaff --- app/test/meson.build| 2 ++ app/test/test_threads.c | 89 + 2 files changed, 91 insertions(+) create mode 100644 app/test/test_threads.c diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1..5a9d69b 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -133,6 +133,7 @@ test_sources = files( 'test_tailq.c', 'test_thash.c', 'test_thash_perf.c', +'test_threads.c', 'test_timer.c', 'test_timer_perf.c', 'test_timer_racecond.c', @@ -238,6 +239,7 @@ fast_tests = [ ['reorder_autotest', true], ['service_autotest', true], ['thash_autotest', true], +['threads_autotest', true], ['trace_autotest', true], ] diff --git a/app/test/test_threads.c b/app/test/test_threads.c new file mode 100644 index 000..0ca6745 --- /dev/null +++ b/app/test/test_threads.c @@ -0,0 +1,89 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (C) 2022 Microsoft Corporation + */ + +#include +#include + +#include +#include + +#include "test.h" + +RTE_LOG_REGISTER(threads_logtype_test, test.threads, INFO); + +static uint32_t thread_id_ready; + +static void * +thread_main(void *arg) +{ + *(rte_thread_t *)arg = rte_thread_self(); + __atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE); + + return NULL; +} + +static int +test_thread_affinity(void) +{ + pthread_t id; + rte_thread_t thread_id; + + RTE_TEST_ASSERT(pthread_create(&id, NULL, thread_main, &thread_id) == 0, + "Failed to create thread"); + + while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0) + ; + + rte_cpuset_t cpuset0; + RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset0) == 0, + "Failed to get thread affinity"); + + rte_cpuset_t cpuset1; + RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset1) == 0, + "Failed to get thread affinity"); + RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0, + 
"Affinity should be stable"); + + RTE_TEST_ASSERT(rte_thread_set_affinity_by_id(thread_id, &cpuset1) == 0, + "Failed to set thread affinity"); + RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset0) == 0, + "Failed to get thread affinity"); + RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0, + "Affinity should be stable"); + + size_t i; + for (i = 1; i < CPU_SETSIZE; i++) + if (CPU_ISSET(i, &cpuset0)) { + CPU_ZERO(&cpuset0); + CPU_SET(i, &cpuset0); + + break; + } + RTE_TEST_ASSERT(rte_thread_set_affinity_by_id(thread_id, &cpuset0) == 0, + "Failed to set thread affinity"); + RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset1) == 0, + "Failed to get thread affinity"); + RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0, + "Affinity should be stable"); + + return 0; +} + +static struct unit_test_suite threads_test_suite = { + .suite_name = "threads autotest", + .setup = NULL, + .teardown = NULL, + .unit_test_cases = { + TEST_CASE(test_thread_affinity), + TEST_CASES_END() + } +}; + +static int +test_threads(void) +{ + return unit_test_suite_runner(&threads_test_suite); +} + +REGISTER_TEST_COMMAND(threads_autotest, test_threads); -- 1.8.3.1
RE: [PATCH v6 03/16] vhost: add vhost msg support
HI David, Thanks for your reply. I will send out a version to address that. > -Original Message- > From: David Marchand > Sent: Monday, April 25, 2022 9:05 PM > To: Pei, Andy > Cc: dev ; Xia, Chenbo ; Maxime > Coquelin ; Cao, Gang > ; Liu, Changpeng > Subject: Re: [PATCH v6 03/16] vhost: add vhost msg support > > On Thu, Apr 21, 2022 at 11:20 AM Andy Pei wrote: > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index > > 1d39067..3780804 100644 > > --- a/lib/vhost/vhost_user.c > > +++ b/lib/vhost/vhost_user.c > > @@ -80,6 +80,8 @@ > > [VHOST_USER_NET_SET_MTU] = "VHOST_USER_NET_SET_MTU", > > [VHOST_USER_SET_SLAVE_REQ_FD] = > "VHOST_USER_SET_SLAVE_REQ_FD", > > [VHOST_USER_IOTLB_MSG] = "VHOST_USER_IOTLB_MSG", > > + [VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG", > > + [VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG", > > [VHOST_USER_CRYPTO_CREATE_SESS] = > "VHOST_USER_CRYPTO_CREATE_SESS", > > [VHOST_USER_CRYPTO_CLOSE_SESS] = > "VHOST_USER_CRYPTO_CLOSE_SESS", > > [VHOST_USER_POSTCOPY_ADVISE] = > "VHOST_USER_POSTCOPY_ADVISE", > > @@ -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net > > *dev, } > > > > static int > > +vhost_user_get_config(struct virtio_net **pdev, > > + struct vhu_msg_context *ctx, > > + int main_fd __rte_unused) { > > + struct virtio_net *dev = *pdev; > > + struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev; > > + int ret = 0; > > You must check if there is any fd attached to this message. 
> > > > + > > + if (vdpa_dev->ops->get_config) { > > + ret = vdpa_dev->ops->get_config(dev->vid, > > + ctx->msg.payload.cfg.region, > > + ctx->msg.payload.cfg.size); > > + if (ret != 0) { > > + ctx->msg.size = 0; > > + VHOST_LOG_CONFIG(ERR, > > +"(%s) get_config() return > > error!\n", > > +dev->ifname); > > + } > > + } else { > > + VHOST_LOG_CONFIG(ERR, "(%s) get_config() not supportted!\n", > > +dev->ifname); > > + } > > + > > + return RTE_VHOST_MSG_RESULT_REPLY; } > > + > > +static int > > +vhost_user_set_config(struct virtio_net **pdev, > > + struct vhu_msg_context *ctx, > > + int main_fd __rte_unused) { > > + struct virtio_net *dev = *pdev; > > + struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev; > > + int ret = 0; > > Idem. > > > > + > > + if (ctx->msg.size != sizeof(struct vhost_user_config)) { > > + VHOST_LOG_CONFIG(ERR, > > + "(%s) invalid set config msg size: %"PRId32" != > > %d\n", > > + dev->ifname, ctx->msg.size, > > + (int)sizeof(struct vhost_user_config)); > > + goto OUT; > > + } > > > For info, I posted a series to make this kind of check more systematic. > See: > https://patchwork.dpdk.org/project/dpdk/patch/20220425125431.26464-2- > david.march...@redhat.com/ > > > > -- > David Marchand
Re: [PATCH v4 01/14] bus/vmbus: move independent code from Linux
Sure Stephen. I will change it to unix. On Tue, 19 Apr 2022, 8:19 pm Stephen Hemminger, wrote: > On Mon, 18 Apr 2022 09:59:02 +0530 > Srikanth Kaka wrote: > > > Move the OS independent code from Linux dir in-order to be used > > by FreeBSD > > > > Signed-off-by: Srikanth Kaka > > Signed-off-by: Vag Singh > > Signed-off-by: Anand Thulasiram > > --- > > drivers/bus/vmbus/linux/vmbus_bus.c | 13 + > > drivers/bus/vmbus/meson.build | 5 + > > drivers/bus/vmbus/osi/vmbus_osi.h | 11 +++ > > drivers/bus/vmbus/osi/vmbus_osi_bus.c | 20 > > 4 files changed, 37 insertions(+), 12 deletions(-) > > create mode 100644 drivers/bus/vmbus/osi/vmbus_osi.h > > create mode 100644 drivers/bus/vmbus/osi/vmbus_osi_bus.c > > > > diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c > b/drivers/bus/vmbus/linux/vmbus_bus.c > > index f502783f7a..c9a07041a7 100644 > > --- a/drivers/bus/vmbus/linux/vmbus_bus.c > > +++ b/drivers/bus/vmbus/linux/vmbus_bus.c > > @@ -21,22 +21,11 @@ > > > > #include "eal_filesystem.h" > > #include "private.h" > > +#include "vmbus_osi.h" > > > > /** Pathname of VMBUS devices directory. 
*/ > > #define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices" > > > > -/* > > - * GUID associated with network devices > > - * {f8615163-df3e-46c5-913f-f2d2f965ed0e} > > - */ > > -static const rte_uuid_t vmbus_nic_uuid = { > > - 0xf8, 0x61, 0x51, 0x63, > > - 0xdf, 0x3e, > > - 0x46, 0xc5, > > - 0x91, 0x3f, > > - 0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe > > -}; > > - > > extern struct rte_vmbus_bus rte_vmbus_bus; > > > > /* Read sysfs file to get UUID */ > > diff --git a/drivers/bus/vmbus/meson.build > b/drivers/bus/vmbus/meson.build > > index 3892cbf67f..cbcba44e16 100644 > > --- a/drivers/bus/vmbus/meson.build > > +++ b/drivers/bus/vmbus/meson.build > > @@ -16,6 +16,11 @@ sources = files( > > 'vmbus_common_uio.c', > > ) > > > > +includes += include_directories('osi') > > +sources += files( > > + 'osi/vmbus_osi_bus.c' > > +) > > + > > if is_linux > > sources += files('linux/vmbus_bus.c', > > 'linux/vmbus_uio.c') > > diff --git a/drivers/bus/vmbus/osi/vmbus_osi.h > b/drivers/bus/vmbus/osi/vmbus_osi.h > > new file mode 100644 > > index 00..2db9399181 > > --- /dev/null > > +++ b/drivers/bus/vmbus/osi/vmbus_osi.h > > Having common code is good, we are already doing it now in DPDK EAL. > But the name osi seems odd to me. > Could you use unix instead (same as EAL) > >drivers/bus/vmbus/unix/vmbus.h > > Or drivers/bus/vmbus/common >
Reuse of lcore after returning from its worker thread
Hi,

As per my understanding, rte_eal_wait_lcore() is a blocking call when
the lcore state is RUNNING.

1. Is there any direct way to reuse the lcore which we returned from a
   worker thread?
2. Technically, is there any issue in reusing the lcore by some means?
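[Editor's note] For reference, the usual EAL pattern is that once `rte_eal_wait_lcore()` returns, the worker lcore is back in the WAIT state and can be handed a new function with `rte_eal_remote_launch()`. A hedged sketch (error handling omitted; assumes EAL is already initialized and `lcore_id` is a valid worker lcore — this is illustrative, not a complete program):

```c
#include <stdint.h>

#include <rte_launch.h>
#include <rte_lcore.h>

/* trivial worker: returns its argument as the lcore's return code */
static int
job(void *arg)
{
	return (int)(uintptr_t)arg;
}

/* launch a job on a worker lcore, wait for it, then reuse the same lcore */
static void
run_twice(unsigned int lcore_id)
{
	rte_eal_remote_launch(job, (void *)1, lcore_id);
	rte_eal_wait_lcore(lcore_id); /* blocks while the lcore state is RUNNING */

	/* the lcore is back in WAIT state here, so it can be launched again */
	rte_eal_remote_launch(job, (void *)2, lcore_id);
	rte_eal_wait_lcore(lcore_id);
}
```

So reuse does not need any special mechanism: the launch/wait cycle itself returns the lcore to a launchable state.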
Re: [PATCH] net/nfp: remove unneeded header inclusion
Hi David,

Thanks for your work.

On 2022-04-08 11:41:16 +0200, David Marchand wrote:
> Looking at this driver history, there was never a need for including
> execinfo.h.
>
> Signed-off-by: David Marchand

Reviewed-by: Niklas Söderlund

> ---
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
> index bad80a5a1c..08bc4e8ef2 100644
> --- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
> +++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
> @@ -16,9 +16,6 @@
>
>  #include
>  #include
> -#if defined(RTE_BACKTRACE)
> -#include <execinfo.h>
> -#endif
>  #include
>  #include
>  #include
> --
> 2.23.0
>

--
Kind Regards,
Niklas Söderlund
[PATCH v5 00/14] add FreeBSD support to VMBUS & NetVSC PMDs
This patchset requires FreeBSD VMBus kernel changes and HV_UIO driver.
Both are currently under review at https://reviews.freebsd.org/D32184

Changelog:
v5: - renamed dir osi to unix
    - marked a newly added API as experimental
    - removed camel case variables
v4: - moved OS independent code out of Linux
v3: - split the patches into further logical parts
    - updated docs
v2: - replaced strncpy with memcpy
    - replaced malloc.h with stdlib.h
    - added comment in linux/vmbus_uio.c
v1: Initial release

Srikanth Kaka (14):
  bus/vmbus: move independent code from Linux
  bus/vmbus: move independent bus functions
  bus/vmbus: move OS independent UIO functions
  bus/vmbus: scan and get the network device on FreeBSD
  bus/vmbus: handle mapping of device resources
  bus/vmbus: get device resource values using sysctl
  net/netvsc: make event monitor OS dependent
  bus/vmbus: add sub-channel mapping support
  bus/vmbus: open subchannels
  net/netvsc: make IOCTL call to open subchannels
  bus/vmbus: get subchannel info
  net/netvsc: moving hotplug retry to OS dir
  bus/vmbus: add meson support for FreeBSD
  bus/vmbus: update MAINTAINERS and docs

 MAINTAINERS                             |   2 +
 doc/guides/nics/netvsc.rst              |  11 ++
 drivers/bus/vmbus/freebsd/vmbus_bus.c   | 286
 drivers/bus/vmbus/freebsd/vmbus_uio.c   | 256 +
 drivers/bus/vmbus/linux/vmbus_bus.c     |  28 +--
 drivers/bus/vmbus/linux/vmbus_uio.c     | 320
 drivers/bus/vmbus/meson.build           |  12 +-
 drivers/bus/vmbus/private.h             |   1 +
 drivers/bus/vmbus/rte_bus_vmbus.h       |  11 ++
 drivers/bus/vmbus/unix/vmbus_unix.h     |  27 +++
 drivers/bus/vmbus/unix/vmbus_unix_bus.c |  37
 drivers/bus/vmbus/unix/vmbus_unix_uio.c | 310 +++
 drivers/bus/vmbus/version.map           |   6 +
 drivers/bus/vmbus/vmbus_channel.c       |   5 +
 drivers/net/netvsc/freebsd/hn_os.c      |  21 +++
 drivers/net/netvsc/freebsd/meson.build  |   6 +
 drivers/net/netvsc/hn_ethdev.c          |  95 +-
 drivers/net/netvsc/hn_os.h              |   8 +
 drivers/net/netvsc/linux/hn_os.c        | 111 +++
 drivers/net/netvsc/linux/meson.build    |   6 +
 drivers/net/netvsc/meson.build          |   3 +
 21 files changed, 1164 insertions(+), 398 deletions(-)
 create mode 100644 drivers/bus/vmbus/freebsd/vmbus_bus.c
 create mode 100644 drivers/bus/vmbus/freebsd/vmbus_uio.c
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix.h
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_bus.c
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_uio.c
 create mode 100644 drivers/net/netvsc/freebsd/hn_os.c
 create mode 100644 drivers/net/netvsc/freebsd/meson.build
 create mode 100644 drivers/net/netvsc/hn_os.h
 create mode 100644 drivers/net/netvsc/linux/hn_os.c
 create mode 100644 drivers/net/netvsc/linux/meson.build

--
1.8.3.1
[PATCH v5 01/14] bus/vmbus: move independent code from Linux
Move the OS independent code from Linux dir in-order to be used by FreeBSD Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/linux/vmbus_bus.c | 13 + drivers/bus/vmbus/meson.build | 5 + drivers/bus/vmbus/unix/vmbus_unix.h | 11 +++ drivers/bus/vmbus/unix/vmbus_unix_bus.c | 20 4 files changed, 37 insertions(+), 12 deletions(-) create mode 100644 drivers/bus/vmbus/unix/vmbus_unix.h create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_bus.c diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c index f502783..e649537 100644 --- a/drivers/bus/vmbus/linux/vmbus_bus.c +++ b/drivers/bus/vmbus/linux/vmbus_bus.c @@ -21,22 +21,11 @@ #include "eal_filesystem.h" #include "private.h" +#include "vmbus_unix.h" /** Pathname of VMBUS devices directory. */ #define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices" -/* - * GUID associated with network devices - * {f8615163-df3e-46c5-913f-f2d2f965ed0e} - */ -static const rte_uuid_t vmbus_nic_uuid = { - 0xf8, 0x61, 0x51, 0x63, - 0xdf, 0x3e, - 0x46, 0xc5, - 0x91, 0x3f, - 0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe -}; - extern struct rte_vmbus_bus rte_vmbus_bus; /* Read sysfs file to get UUID */ diff --git a/drivers/bus/vmbus/meson.build b/drivers/bus/vmbus/meson.build index 3892cbf..01ef01f 100644 --- a/drivers/bus/vmbus/meson.build +++ b/drivers/bus/vmbus/meson.build @@ -16,6 +16,11 @@ sources = files( 'vmbus_common_uio.c', ) +includes += include_directories('unix') +sources += files( + 'unix/vmbus_unix_bus.c' +) + if is_linux sources += files('linux/vmbus_bus.c', 'linux/vmbus_uio.c') diff --git a/drivers/bus/vmbus/unix/vmbus_unix.h b/drivers/bus/vmbus/unix/vmbus_unix.h new file mode 100644 index 000..2db9399 --- /dev/null +++ b/drivers/bus/vmbus/unix/vmbus_unix.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2018, Microsoft Corporation. + * All Rights Reserved. 
+ */ + +#ifndef _VMBUS_BUS_UNIX_H_ +#define _VMBUS_BUS_UNIX_H_ + +extern const rte_uuid_t vmbus_nic_uuid; + +#endif /* _VMBUS_BUS_UNIX_H_ */ diff --git a/drivers/bus/vmbus/unix/vmbus_unix_bus.c b/drivers/bus/vmbus/unix/vmbus_unix_bus.c new file mode 100644 index 000..f76a361 --- /dev/null +++ b/drivers/bus/vmbus/unix/vmbus_unix_bus.c @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2018, Microsoft Corporation. + * All Rights Reserved. + */ + +#include + +#include "vmbus_unix.h" + +/* + * GUID associated with network devices + * {f8615163-df3e-46c5-913f-f2d2f965ed0e} + */ +const rte_uuid_t vmbus_nic_uuid = { + 0xf8, 0x61, 0x51, 0x63, + 0xdf, 0x3e, + 0x46, 0xc5, + 0x91, 0x3f, + 0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe +}; -- 1.8.3.1
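The GUID moved into unix/vmbus_unix_bus.c is stored in the same byte order as its textual form, i.e. the array {0xf8, 0x61, ...} spells out "f8615163-df3e-46c5-913f-f2d2f965ed0e" directly. The sketch below illustrates that correspondence with a minimal, stdlib-only parser; it is a stand-in for DPDK's rte_uuid_parse(), not the real implementation, and uuid_parse_simple/vmbus_nic_uuid_expect are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for DPDK's rte_uuid_t: 16 bytes in text order. */
typedef uint8_t uuid_bytes[16];

/* Expected bytes for the NIC class GUID from the patch,
 * {f8615163-df3e-46c5-913f-f2d2f965ed0e}. */
static const uint8_t vmbus_nic_uuid_expect[16] = {
	0xf8, 0x61, 0x51, 0x63,
	0xdf, 0x3e,
	0x46, 0xc5,
	0x91, 0x3f,
	0xf2, 0xd2, 0xf9, 0x65, 0xed, 0x0e
};

/* Parse "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into 16 bytes.
 * Returns 0 on success, -1 on malformed input. */
static int uuid_parse_simple(const char *in, uuid_bytes uu)
{
	static const int dash_pos[] = { 8, 13, 18, 23 };
	int i, b = 0;

	if (strlen(in) != 36)
		return -1;
	for (i = 0; i < 4; i++)
		if (in[dash_pos[i]] != '-')
			return -1;
	for (i = 0; i < 36; i++) {
		unsigned int byte;

		if (in[i] == '-')
			continue;
		if (sscanf(&in[i], "%2x", &byte) != 1)
			return -1;
		uu[b++] = (uint8_t)byte;
		i++; /* two hex digits consumed per byte */
	}
	return b == 16 ? 0 : -1;
}
```

Because the byte array matches the text order, rte_uuid_compare() against a parsed sysfs/sysctl string works without any endianness shuffling.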
[PATCH v5 02/14] bus/vmbus: move independent bus functions
move independent Linux bus functions to OS independent file Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/linux/vmbus_bus.c | 15 --- drivers/bus/vmbus/unix/vmbus_unix_bus.c | 17 + 2 files changed, 17 insertions(+), 15 deletions(-) diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c index e649537..18233a5 100644 --- a/drivers/bus/vmbus/linux/vmbus_bus.c +++ b/drivers/bus/vmbus/linux/vmbus_bus.c @@ -358,18 +358,3 @@ closedir(dir); return -1; } - -void rte_vmbus_irq_mask(struct rte_vmbus_device *device) -{ - vmbus_uio_irq_control(device, 1); -} - -void rte_vmbus_irq_unmask(struct rte_vmbus_device *device) -{ - vmbus_uio_irq_control(device, 0); -} - -int rte_vmbus_irq_read(struct rte_vmbus_device *device) -{ - return vmbus_uio_irq_read(device); -} diff --git a/drivers/bus/vmbus/unix/vmbus_unix_bus.c b/drivers/bus/vmbus/unix/vmbus_unix_bus.c index f76a361..96cb968 100644 --- a/drivers/bus/vmbus/unix/vmbus_unix_bus.c +++ b/drivers/bus/vmbus/unix/vmbus_unix_bus.c @@ -3,8 +3,10 @@ * All Rights Reserved. */ +#include #include +#include "private.h" #include "vmbus_unix.h" /* @@ -18,3 +20,18 @@ 0x91, 0x3f, 0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe }; + +void rte_vmbus_irq_mask(struct rte_vmbus_device *device) +{ + vmbus_uio_irq_control(device, 1); +} + +void rte_vmbus_irq_unmask(struct rte_vmbus_device *device) +{ + vmbus_uio_irq_control(device, 0); +} + +int rte_vmbus_irq_read(struct rte_vmbus_device *device) +{ + return vmbus_uio_irq_read(device); +} -- 1.8.3.1
[PATCH v5 03/14] bus/vmbus: move OS independent UIO functions
Moved all Linux independent UIO functions to unix dir. Split the vmbus_uio_map_subchan() by keeping OS dependent code in vmbus_uio_map_subchan_os() function Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/linux/vmbus_uio.c | 292 ++ drivers/bus/vmbus/meson.build | 3 +- drivers/bus/vmbus/unix/vmbus_unix.h | 12 ++ drivers/bus/vmbus/unix/vmbus_unix_uio.c | 306 4 files changed, 330 insertions(+), 283 deletions(-) create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_uio.c diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c b/drivers/bus/vmbus/linux/vmbus_uio.c index 5db70f8..b5d15c9 100644 --- a/drivers/bus/vmbus/linux/vmbus_uio.c +++ b/drivers/bus/vmbus/linux/vmbus_uio.c @@ -21,233 +21,18 @@ #include #include "private.h" +#include "vmbus_unix.h" /** Pathname of VMBUS devices directory. */ #define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices" -static void *vmbus_map_addr; - -/* Control interrupts */ -void vmbus_uio_irq_control(struct rte_vmbus_device *dev, int32_t onoff) -{ - if ((rte_intr_fd_get(dev->intr_handle) < 0) || - write(rte_intr_fd_get(dev->intr_handle), &onoff, - sizeof(onoff)) < 0) { - VMBUS_LOG(ERR, "cannot write to %d:%s", - rte_intr_fd_get(dev->intr_handle), - strerror(errno)); - } -} - -int vmbus_uio_irq_read(struct rte_vmbus_device *dev) -{ - int32_t count; - int cc; - - if (rte_intr_fd_get(dev->intr_handle) < 0) - return -1; - - cc = read(rte_intr_fd_get(dev->intr_handle), &count, - sizeof(count)); - if (cc < (int)sizeof(count)) { - if (cc < 0) { - VMBUS_LOG(ERR, "IRQ read failed %s", - strerror(errno)); - return -errno; - } - VMBUS_LOG(ERR, "can't read IRQ count"); - return -EINVAL; - } - - return count; -} - -void -vmbus_uio_free_resource(struct rte_vmbus_device *dev, - struct mapped_vmbus_resource *uio_res) -{ - rte_free(uio_res); - - if (rte_intr_dev_fd_get(dev->intr_handle) >= 0) { - close(rte_intr_dev_fd_get(dev->intr_handle)); - rte_intr_dev_fd_set(dev->intr_handle, -1); - } - - if 
(rte_intr_fd_get(dev->intr_handle) >= 0) { - close(rte_intr_fd_get(dev->intr_handle)); - rte_intr_fd_set(dev->intr_handle, -1); - rte_intr_type_set(dev->intr_handle, RTE_INTR_HANDLE_UNKNOWN); - } -} - -int -vmbus_uio_alloc_resource(struct rte_vmbus_device *dev, -struct mapped_vmbus_resource **uio_res) -{ - char devname[PATH_MAX]; /* contains the /dev/uioX */ - int fd; - - /* save fd if in primary process */ - snprintf(devname, sizeof(devname), "/dev/uio%u", dev->uio_num); - fd = open(devname, O_RDWR); - if (fd < 0) { - VMBUS_LOG(ERR, "Cannot open %s: %s", - devname, strerror(errno)); - goto error; - } - - if (rte_intr_fd_set(dev->intr_handle, fd)) - goto error; - - if (rte_intr_type_set(dev->intr_handle, RTE_INTR_HANDLE_UIO_INTX)) - goto error; - - /* allocate the mapping details for secondary processes*/ - *uio_res = rte_zmalloc("UIO_RES", sizeof(**uio_res), 0); - if (*uio_res == NULL) { - VMBUS_LOG(ERR, "cannot store uio mmap details"); - goto error; - } - - strlcpy((*uio_res)->path, devname, PATH_MAX); - rte_uuid_copy((*uio_res)->id, dev->device_id); - - return 0; - -error: - vmbus_uio_free_resource(dev, *uio_res); - return -1; -} - -static int -find_max_end_va(const struct rte_memseg_list *msl, void *arg) -{ - size_t sz = msl->memseg_arr.len * msl->page_sz; - void *end_va = RTE_PTR_ADD(msl->base_va, sz); - void **max_va = arg; - - if (*max_va < end_va) - *max_va = end_va; - return 0; -} - -/* - * TODO: this should be part of memseg api. - * code is duplicated from PCI. 
- */ -static void * -vmbus_find_max_end_va(void) -{ - void *va = NULL; - - rte_memseg_list_walk(find_max_end_va, &va); - return va; -} - -int -vmbus_uio_map_resource_by_index(struct rte_vmbus_device *dev, int idx, - struct mapped_vmbus_resource *uio_res, - int flags) -{ - size_t size = dev->resource[idx].len; - struct vmbus_map *maps = uio_res->maps; - void *mapaddr; - off_t offset; - int fd; - - /* devname for mmap */ - fd = open(uio_res->path, O_RDWR); - if (fd < 0) { - VMBUS_LOG(ERR, "Cannot open %s: %s", - uio_res->path, stre
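The vmbus_find_max_end_va() helper moved above (noted in the code as duplicated from PCI) walks all memseg lists and keeps the highest end address, so device rings can be mapped above existing DPDK memory. A simplified model of that walk, with a hypothetical seg_list struct standing in for rte_memseg_list:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_memseg_list: a base VA plus the list's total length
 * (msl->memseg_arr.len * msl->page_sz in the real structure). */
struct seg_list {
	uintptr_t base_va;
	size_t len;
};

/* Return the highest end address over all lists, mirroring the
 * rte_memseg_list_walk() + find_max_end_va() callback pattern. */
static uintptr_t find_max_end_va(const struct seg_list *lists, size_t n)
{
	uintptr_t max_va = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uintptr_t end_va = lists[i].base_va + lists[i].len;

		if (end_va > max_va)
			max_va = end_va;
	}
	return max_va;
}
```

The real code expresses the same loop as a callback passed to rte_memseg_list_walk(), accumulating into a pointer argument.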
[PATCH v5 04/14] bus/vmbus: scan and get the network device on FreeBSD
Using sysctl, all the devices on the VMBUS are identified by the PMD. On finding the Network device's device id, it is added to VMBUS dev list. Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/freebsd/vmbus_bus.c | 268 ++ 1 file changed, 268 insertions(+) create mode 100644 drivers/bus/vmbus/freebsd/vmbus_bus.c diff --git a/drivers/bus/vmbus/freebsd/vmbus_bus.c b/drivers/bus/vmbus/freebsd/vmbus_bus.c new file mode 100644 index 000..c1a3a5f --- /dev/null +++ b/drivers/bus/vmbus/freebsd/vmbus_bus.c @@ -0,0 +1,268 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2018, Microsoft Corporation. + * All Rights Reserved. + */ + +#include +#include + +#include "private.h" +#include "vmbus_unix.h" + +#include +#include +#include + +/* Parse UUID. Caller must pass NULL terminated string */ +static int +parse_sysfs_uuid(const char *filename, rte_uuid_t uu) +{ + char in[BUFSIZ]; + + memcpy(in, filename, BUFSIZ); + if (rte_uuid_parse(in, uu) < 0) { + VMBUS_LOG(ERR, "%s not a valid UUID", in); + return -1; + } + + return 0; +} + +/* Scan one vmbus entry, and fill the devices list from it. 
*/ +static int +vmbus_scan_one(const char *name, unsigned int unit_num) +{ + struct rte_vmbus_device *dev, *dev2; + char sysctl_buf[PATH_MAX], sysctl_var[PATH_MAX]; + size_t guid_len = 36, len = PATH_MAX; + char classid[guid_len + 1], deviceid[guid_len + 1]; + + dev = calloc(1, sizeof(*dev)); + if (dev == NULL) + return -1; + + /* get class id and device id */ + snprintf(sysctl_var, len, "dev.%s.%u.%%pnpinfo", name, unit_num); + if (sysctlbyname(sysctl_var, &sysctl_buf, &len, NULL, 0) < 0) + goto error; + + /* pnpinfo: classid=f912ad6d-2b17-48ea-bd65-f927a61c7684 +* deviceid=d34b2567-b9b6-42b9-8778-0a4ec0b955bf +*/ + if (sysctl_buf[0] == 'c' && sysctl_buf[1] == 'l' && + sysctl_buf[7] == '=') { + memcpy(classid, &sysctl_buf[8], guid_len); + classid[guid_len] = '\0'; + } + if (parse_sysfs_uuid(classid, dev->class_id) < 0) + goto error; + + /* skip non-network devices */ + if (rte_uuid_compare(dev->class_id, vmbus_nic_uuid) != 0) { + free(dev); + return 0; + } + + if (sysctl_buf[45] == 'd' && sysctl_buf[46] == 'e' && + sysctl_buf[47] == 'v' && sysctl_buf[53] == '=') { + memcpy(deviceid, &sysctl_buf[54], guid_len); + deviceid[guid_len] = '\0'; + } + if (parse_sysfs_uuid(deviceid, dev->device_id) < 0) + goto error; + + if (!strcmp(name, "hv_uio")) + dev->uio_num = unit_num; + else + dev->uio_num = -1; + dev->device.bus = &rte_vmbus_bus.bus; + dev->device.numa_node = 0; + dev->device.name = strdup(deviceid); + if (!dev->device.name) + goto error; + + dev->device.devargs = vmbus_devargs_lookup(dev); + + dev->intr_handle = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_PRIVATE); + if (dev->intr_handle == NULL) + goto error; + + /* device is valid, add in list (sorted) */ + VMBUS_LOG(DEBUG, "Adding vmbus device %s", name); + + TAILQ_FOREACH(dev2, &rte_vmbus_bus.device_list, next) { + int ret; + + ret = rte_uuid_compare(dev->device_id, dev2->device_id); + if (ret > 0) + continue; + + if (ret < 0) { + vmbus_insert_device(dev2, dev); + } else { /* already registered */ + 
VMBUS_LOG(NOTICE, + "%s already registered", name); + free(dev); + } + return 0; + } + + vmbus_add_device(dev); + return 0; +error: + VMBUS_LOG(DEBUG, "failed"); + + free(dev); + return -1; +} + +static int +vmbus_unpack(char *walker, char *ep, char **str) +{ + int ret = 0; + + *str = strdup(walker); + if (*str == NULL) { + ret = -ENOMEM; + goto exit; + } + + if (walker + strnlen(walker, ep - walker) >= ep) { + ret = -EINVAL; + goto exit; + } +exit: + return ret; +} + +/* + * Scan the content of the vmbus, and the devices in the devices list + */ +int +rte_vmbus_scan(void) +{ + struct u_device udev; + struct u_businfo ubus; + int dev_idx, dev_ptr, name2oid[2], oid[CTL_MAXNAME + 12], error; + size_t oidlen, rlen, ub_size; + uintptr_t vmbus_handle = 0; + char *walker, *ep; + char name[16] = "hw.bus.devices"; + char *dd_name, *dd_desc, *dd_drivername, *dd_pnpinfo, *dd_location; + + /* +* devinfo F
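The scan code above pulls classid and deviceid out of the pnpinfo string at fixed character offsets (sysctl_buf[8], sysctl_buf[54]). A more position-independent way to extract the same values is a key=value lookup; the helper below is an illustrative sketch (pnpinfo_get is not part of the patch) showing that approach against the exact pnpinfo format quoted in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define GUID_STR_LEN 36

/* Extract the value of "key=" from a FreeBSD pnpinfo string such as
 * "classid=f912ad6d-... deviceid=d34b2567-...". Copies GUID_STR_LEN
 * characters into out (which must hold GUID_STR_LEN + 1 bytes).
 * Returns 0 on success, -1 if the key is absent or truncated. */
static int pnpinfo_get(const char *pnpinfo, const char *key, char *out)
{
	char pattern[32];
	const char *p;

	snprintf(pattern, sizeof(pattern), "%s=", key);
	p = strstr(pnpinfo, pattern);
	if (p == NULL)
		return -1;
	p += strlen(pattern);
	if (strlen(p) < GUID_STR_LEN)
		return -1;
	memcpy(out, p, GUID_STR_LEN);
	out[GUID_STR_LEN] = '\0';
	return 0;
}
```

Either way, the extracted strings feed straight into UUID parsing and the vmbus_nic_uuid comparison that filters out non-network devices.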
[PATCH v5 05/14] bus/vmbus: handle mapping of device resources
All resource values are published by HV_UIO driver as sysctl key value pairs and they are read at a later point of the code flow Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/freebsd/vmbus_bus.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/bus/vmbus/freebsd/vmbus_bus.c b/drivers/bus/vmbus/freebsd/vmbus_bus.c index c1a3a5f..28f5ff4 100644 --- a/drivers/bus/vmbus/freebsd/vmbus_bus.c +++ b/drivers/bus/vmbus/freebsd/vmbus_bus.c @@ -28,6 +28,24 @@ return 0; } +/* map the resources of a vmbus device in virtual memory */ +int +rte_vmbus_map_device(struct rte_vmbus_device *dev) +{ + if (dev->uio_num < 0) { + VMBUS_LOG(DEBUG, "Not managed by UIO driver, skipped"); + return 1; + } + + return vmbus_uio_map_resource(dev); +} + +void +rte_vmbus_unmap_device(struct rte_vmbus_device *dev) +{ + vmbus_uio_unmap_resource(dev); +} + /* Scan one vmbus entry, and fill the devices list from it. */ static int vmbus_scan_one(const char *name, unsigned int unit_num) -- 1.8.3.1
[PATCH v5 06/14] bus/vmbus: get device resource values using sysctl
The UIO device's attribute (relid, monitor id, etc) values are retrieved using following sysctl variables: $ sysctl dev.hv_uio.0 dev.hv_uio.0.send_buf.gpadl: 925241 dev.hv_uio.0.send_buf.size: 16777216 dev.hv_uio.0.recv_buf.gpadl: 925240 dev.hv_uio.0.recv_buf.size: 32505856 dev.hv_uio.0.monitor_page.size: 4096 dev.hv_uio.0.int_page.size: 4096 Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/freebsd/vmbus_uio.c | 105 drivers/bus/vmbus/linux/vmbus_uio.c | 16 + drivers/bus/vmbus/unix/vmbus_unix.h | 4 ++ drivers/bus/vmbus/unix/vmbus_unix_uio.c | 6 +- 4 files changed, 130 insertions(+), 1 deletion(-) create mode 100644 drivers/bus/vmbus/freebsd/vmbus_uio.c diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c b/drivers/bus/vmbus/freebsd/vmbus_uio.c new file mode 100644 index 000..0544371 --- /dev/null +++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c @@ -0,0 +1,105 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2018, Microsoft Corporation. + * All Rights Reserved. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "private.h" +#include "vmbus_unix.h" + +const char *driver_name = "hv_uio"; + +/* Check map names with kernel names */ +static const char *map_names[VMBUS_MAX_RESOURCE] = { + [HV_TXRX_RING_MAP] = "txrx_rings", + [HV_INT_PAGE_MAP] = "int_page", + [HV_MON_PAGE_MAP] = "monitor_page", + [HV_RECV_BUF_MAP] = "recv_buf", + [HV_SEND_BUF_MAP] = "send_buf", +}; + +static int +sysctl_get_vmbus_device_info(struct rte_vmbus_device *dev) +{ + char sysctl_buf[PATH_MAX]; + char sysctl_var[PATH_MAX]; + size_t len = PATH_MAX, sysctl_len; + unsigned long tmp; + int i; + + snprintf(sysctl_buf, len, "dev.%s.%d", driver_name, dev->uio_num); + + sysctl_len = sizeof(unsigned long); + /* get relid */ + snprintf(sysctl_var, len, "%s.channel.ch_id", sysctl_buf); + if (sysctlbyname(sysctl_var, &tmp, &sysctl_len, NULL, 0) < 0) { + VMBUS_LOG(ERR, "could not read %s", sysctl_var); + goto error; + } + dev->relid = tmp; + + /* get monitor id */ + snprintf(sysctl_var, len, "%s.channel.%u.monitor_id", sysctl_buf, +dev->relid); + if (sysctlbyname(sysctl_var, &tmp, &sysctl_len, NULL, 0) < 0) { + VMBUS_LOG(ERR, "could not read %s", sysctl_var); + goto error; + } + dev->monitor_id = tmp; + + /* Extract resource value */ + for (i = 0; i < VMBUS_MAX_RESOURCE; i++) { + struct rte_mem_resource *res = &dev->resource[i]; + unsigned long size, gpad = 0; + size_t sizelen = sizeof(len); + + snprintf(sysctl_var, sizeof(sysctl_var), "%s.%s.size", +sysctl_buf, map_names[i]); + if (sysctlbyname(sysctl_var, &size, &sizelen, NULL, 0) < 0) { + VMBUS_LOG(ERR, + "could not read %s", sysctl_var); + goto error; + } + res->len = size; + + if (i == HV_RECV_BUF_MAP || i == HV_SEND_BUF_MAP) { + snprintf(sysctl_var, sizeof(sysctl_var), "%s.%s.gpadl", +sysctl_buf, map_names[i]); + if (sysctlbyname(sysctl_var, &gpad, &sizelen, NULL, 0) < 0) { + VMBUS_LOG(ERR, + "could not read %s", sysctl_var); + goto error; + 
} + /* put the GPAD value in physical address */ + res->phys_addr = gpad; + } + } + return 0; +error: + return -1; +} + +/* + * On FreeBSD, the device is opened first to ensure kernel UIO driver + * is properly initialized before reading device attributes + */ +int vmbus_get_device_info_os(struct rte_vmbus_device *dev) +{ + return sysctl_get_vmbus_device_info(dev); +} + +const char *get_devname_os(void) +{ + return "/dev/hv_uio"; +} diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c b/drivers/bus/vmbus/linux/vmbus_uio.c index b5d15c9..69f0b26 100644 --- a/drivers/bus/vmbus/linux/vmbus_uio.c +++ b/drivers/bus/vmbus/linux/vmbus_uio.c @@ -199,3 +199,19 @@ int vmbus_uio_get_subchan(struct vmbus_channel *primary, closedir(chan_dir); return err; } + +/* + * In Linux the device info is fetched from SYSFS and doesn't need + * opening of the device before reading its attributes + * This is a stub function and it should always succeed. + */ +int vmbus_get_device_info_os(struct rte_vmbus_device *dev) +{ + RTE_SET_USED(dev); + return 0; +} + +const char *get_devname_os(void) +{ + return "/dev/uio"; +} diff --git a/drivers/bus/v
[PATCH v5 07/14] net/netvsc: make event monitor OS dependent
- Event monitoring is not yet supported on FreeBSD, hence moving it to the OS specific files - Add meson support to OS environment Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/net/netvsc/freebsd/hn_os.c | 16 drivers/net/netvsc/freebsd/meson.build | 6 ++ drivers/net/netvsc/hn_ethdev.c | 7 +++ drivers/net/netvsc/hn_os.h | 6 ++ drivers/net/netvsc/linux/hn_os.c | 21 + drivers/net/netvsc/linux/meson.build | 6 ++ drivers/net/netvsc/meson.build | 3 +++ 7 files changed, 61 insertions(+), 4 deletions(-) create mode 100644 drivers/net/netvsc/freebsd/hn_os.c create mode 100644 drivers/net/netvsc/freebsd/meson.build create mode 100644 drivers/net/netvsc/hn_os.h create mode 100644 drivers/net/netvsc/linux/hn_os.c create mode 100644 drivers/net/netvsc/linux/meson.build diff --git a/drivers/net/netvsc/freebsd/hn_os.c b/drivers/net/netvsc/freebsd/hn_os.c new file mode 100644 index 000..4c6a798 --- /dev/null +++ b/drivers/net/netvsc/freebsd/hn_os.c @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2016-2021 Microsoft Corporation + */ + +#include + +#include + +#include "hn_logs.h" +#include "hn_os.h" + +int eth_hn_os_dev_event(void) +{ + PMD_DRV_LOG(DEBUG, "rte_dev_event_monitor_start not supported on FreeBSD"); + return 0; +} diff --git a/drivers/net/netvsc/freebsd/meson.build b/drivers/net/netvsc/freebsd/meson.build new file mode 100644 index 000..78f824f --- /dev/null +++ b/drivers/net/netvsc/freebsd/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Microsoft Corporation + +sources += files( + 'hn_os.c', +) diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c index 8a95040..8b1e07b 100644 --- a/drivers/net/netvsc/hn_ethdev.c +++ b/drivers/net/netvsc/hn_ethdev.c @@ -39,6 +39,7 @@ #include "hn_rndis.h" #include "hn_nvs.h" #include "ndis.h" +#include "hn_os.h" #define HN_TX_OFFLOAD_CAPS (RTE_ETH_TX_OFFLOAD_IPV4_CKSUM | \ 
RTE_ETH_TX_OFFLOAD_TCP_CKSUM | \ @@ -1240,11 +1241,9 @@ static int eth_hn_probe(struct rte_vmbus_driver *drv __rte_unused, PMD_INIT_FUNC_TRACE(); - ret = rte_dev_event_monitor_start(); - if (ret) { - PMD_DRV_LOG(ERR, "Failed to start device event monitoring"); + ret = eth_hn_os_dev_event(); + if (ret) return ret; - } eth_dev = eth_dev_vmbus_allocate(dev, sizeof(struct hn_data)); if (!eth_dev) diff --git a/drivers/net/netvsc/hn_os.h b/drivers/net/netvsc/hn_os.h new file mode 100644 index 000..618c53c --- /dev/null +++ b/drivers/net/netvsc/hn_os.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2009-2021 Microsoft Corp. + * All rights reserved. + */ + +int eth_hn_os_dev_event(void); diff --git a/drivers/net/netvsc/linux/hn_os.c b/drivers/net/netvsc/linux/hn_os.c new file mode 100644 index 000..1ea12ce --- /dev/null +++ b/drivers/net/netvsc/linux/hn_os.c @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2016-2021 Microsoft Corporation + */ + +#include + +#include + +#include "hn_logs.h" +#include "hn_os.h" + +int eth_hn_os_dev_event(void) +{ + int ret; + + ret = rte_dev_event_monitor_start(); + if (ret) + PMD_DRV_LOG(ERR, "Failed to start device event monitoring"); + + return ret; +} diff --git a/drivers/net/netvsc/linux/meson.build b/drivers/net/netvsc/linux/meson.build new file mode 100644 index 000..78f824f --- /dev/null +++ b/drivers/net/netvsc/linux/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Microsoft Corporation + +sources += files( + 'hn_os.c', +) diff --git a/drivers/net/netvsc/meson.build b/drivers/net/netvsc/meson.build index bb6225d..c50414d 100644 --- a/drivers/net/netvsc/meson.build +++ b/drivers/net/netvsc/meson.build @@ -8,6 +8,7 @@ if is_windows endif deps += 'bus_vmbus' +includes += include_directories(exec_env) sources = files( 'hn_ethdev.c', 'hn_nvs.c', @@ -15,3 +16,5 @@ sources = files( 'hn_rxtx.c', 'hn_vf.c', ) + +subdir(exec_env) -- 1.8.3.1
[PATCH v5 08/14] bus/vmbus: add sub-channel mapping support
To map the subchannels, an mmap request is directly made after determining the subchan memory offset Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/freebsd/vmbus_uio.c | 48 +++ 1 file changed, 48 insertions(+) diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c b/drivers/bus/vmbus/freebsd/vmbus_uio.c index 0544371..55b8f18 100644 --- a/drivers/bus/vmbus/freebsd/vmbus_uio.c +++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c @@ -18,6 +18,13 @@ #include "private.h" #include "vmbus_unix.h" +/* + * Macros to distinguish mmap request + * [7-0] - Device memory region + * [15-8]- Sub-channel id + */ +#define UH_SUBCHAN_MASK_SHIFT 8 + const char *driver_name = "hv_uio"; /* Check map names with kernel names */ @@ -99,6 +106,47 @@ int vmbus_get_device_info_os(struct rte_vmbus_device *dev) return sysctl_get_vmbus_device_info(dev); } +int vmbus_uio_map_subchan_os(const struct rte_vmbus_device *dev, +const struct vmbus_channel *chan, +void **mapaddr, size_t *size) +{ + char ring_path[PATH_MAX]; + off_t offset; + int fd; + + snprintf(ring_path, sizeof(ring_path), +"/dev/hv_uio%d", dev->uio_num); + + fd = open(ring_path, O_RDWR); + if (fd < 0) { + VMBUS_LOG(ERR, "Cannot open %s: %s", + ring_path, strerror(errno)); + return -errno; + } + + /* subchannel rings are of the same size as primary */ + *size = dev->resource[HV_TXRX_RING_MAP].len; + offset = (chan->relid << UH_SUBCHAN_MASK_SHIFT) * PAGE_SIZE; + + *mapaddr = vmbus_map_resource(vmbus_map_addr, fd, + offset, *size, 0); + close(fd); + + if (*mapaddr == MAP_FAILED) + return -EIO; + + return 0; +} + +/* This function should always succeed */ +bool vmbus_uio_subchannels_supported(const struct rte_vmbus_device *dev, +const struct vmbus_channel *chan) +{ + RTE_SET_USED(dev); + RTE_SET_USED(chan); + return true; +} + const char *get_devname_os(void) { return "/dev/hv_uio"; -- 1.8.3.1
[PATCH v5 09/14] bus/vmbus: open subchannels
In FreeBSD, unlike Linux there is no sub-channel open callback that could be called by HV_UIO driver upon their grant by the hypervisor. Thus the PMD makes an IOCTL to the HV_UIO to open the sub-channels On Linux, the vmbus_uio_subchan_open() will always return success as the Linux HV_UIO opens them implicitly. Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/freebsd/vmbus_uio.c | 30 ++ drivers/bus/vmbus/linux/vmbus_uio.c | 12 drivers/bus/vmbus/private.h | 1 + drivers/bus/vmbus/rte_bus_vmbus.h | 11 +++ drivers/bus/vmbus/version.map | 6 ++ drivers/bus/vmbus/vmbus_channel.c | 5 + 6 files changed, 65 insertions(+) diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c b/drivers/bus/vmbus/freebsd/vmbus_uio.c index 55b8f18..438db41 100644 --- a/drivers/bus/vmbus/freebsd/vmbus_uio.c +++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c @@ -25,6 +25,9 @@ */ #define UH_SUBCHAN_MASK_SHIFT 8 +/* ioctl */ +#define HVIOOPENSUBCHAN _IOW('h', 14, uint32_t) + const char *driver_name = "hv_uio"; /* Check map names with kernel names */ @@ -151,3 +154,30 @@ const char *get_devname_os(void) { return "/dev/hv_uio"; } + +int vmbus_uio_subchan_open(struct rte_vmbus_device *dev, uint32_t subchan) +{ + struct mapped_vmbus_resource *uio_res; + int fd, err = 0; + + uio_res = vmbus_uio_find_resource(dev); + if (!uio_res) { + VMBUS_LOG(ERR, "cannot find uio resource"); + return -EINVAL; + } + + fd = open(uio_res->path, O_RDWR); + if (fd < 0) { + VMBUS_LOG(ERR, "Cannot open %s: %s", + uio_res->path, strerror(errno)); + return -1; + } + + if (ioctl(fd, HVIOOPENSUBCHAN, &subchan)) { + VMBUS_LOG(ERR, "open subchan ioctl failed %s: %s", + uio_res->path, strerror(errno)); + err = -1; + } + close(fd); + return err; +} diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c b/drivers/bus/vmbus/linux/vmbus_uio.c index 69f0b26..b9616bd 100644 --- a/drivers/bus/vmbus/linux/vmbus_uio.c +++ b/drivers/bus/vmbus/linux/vmbus_uio.c @@ -215,3 +215,15 @@ const char 
*get_devname_os(void) { return "/dev/uio"; } + +/* + * This is a stub function and it should always succeed. + * The Linux UIO kernel driver opens the subchannels implicitly. + */ +int vmbus_uio_subchan_open(struct rte_vmbus_device *dev, + uint32_t subchan) +{ + RTE_SET_USED(dev); + RTE_SET_USED(subchan); + return 0; +} diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h index 1bca147..ea0276a 100644 --- a/drivers/bus/vmbus/private.h +++ b/drivers/bus/vmbus/private.h @@ -116,6 +116,7 @@ bool vmbus_uio_subchannels_supported(const struct rte_vmbus_device *dev, int vmbus_uio_get_subchan(struct vmbus_channel *primary, struct vmbus_channel **subchan); int vmbus_uio_map_rings(struct vmbus_channel *chan); +int vmbus_uio_subchan_open(struct rte_vmbus_device *device, uint32_t subchan); void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen); diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h index a24bad8..06b2ffc 100644 --- a/drivers/bus/vmbus/rte_bus_vmbus.h +++ b/drivers/bus/vmbus/rte_bus_vmbus.h @@ -404,6 +404,17 @@ void rte_vmbus_set_latency(const struct rte_vmbus_device *dev, */ void rte_vmbus_unregister(struct rte_vmbus_driver *driver); +/** + * Perform IOCTL to VMBUS device + * + * @param device + * A pointer to a rte_vmbus_device structure + * @param subchan + * Count of subchannels to open + */ +__rte_experimental +int rte_vmbus_ioctl(struct rte_vmbus_device *device, uint32_t subchan); + /** Helper for VMBUS device registration from driver instance */ #define RTE_PMD_REGISTER_VMBUS(nm, vmbus_drv) \ RTE_INIT(vmbusinitfn_ ##nm) \ diff --git a/drivers/bus/vmbus/version.map b/drivers/bus/vmbus/version.map index 3cadec7..e5b7218 100644 --- a/drivers/bus/vmbus/version.map +++ b/drivers/bus/vmbus/version.map @@ -26,3 +26,9 @@ DPDK_22 { local: *; }; + +EXPERIMENTAL { + global: + + rte_vmbus_ioctl; +}; diff --git a/drivers/bus/vmbus/vmbus_channel.c b/drivers/bus/vmbus/vmbus_channel.c index 119b9b3..9a8f6e3 
100644 --- a/drivers/bus/vmbus/vmbus_channel.c +++ b/drivers/bus/vmbus/vmbus_channel.c @@ -365,6 +365,11 @@ int rte_vmbus_max_channels(const struct rte_vmbus_device *device) return 1; } +int rte_vmbus_ioctl(struct rte_vmbus_device *device, uint32_t subchan) +{ + return vmbus_uio_subchan_open(device, subchan); +} + /* Setup secondary channel */ int rte_vmbus_subchan_open(struct vmbus_channel *primary, struct vmbus_channel **new_chan) -- 1.8.3.
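The HVIOOPENSUBCHAN request above is defined as _IOW('h', 14, uint32_t). Assuming the stock FreeBSD sys/ioccom.h encoding (direction flag in the top bits, 13-bit parameter length, group, then number), the value can be re-derived and checked on any platform; the BSD_-prefixed macros below are local reconstructions for this sketch, not the system headers.

```c
#include <assert.h>
#include <stdint.h>

/* Reconstruction of FreeBSD's _IOW() layout (assumed from sys/ioccom.h):
 * bit 31 set for "copy in", len in bits [28:16] (13-bit IOCPARM_MASK),
 * group character in bits [15:8], command number in bits [7:0]. */
#define BSD_IOCPARM_MASK	0x1fffUL
#define BSD_IOC_IN		0x80000000UL
#define BSD_IOW(g, n, len) \
	(BSD_IOC_IN | (((unsigned long)(len) & BSD_IOCPARM_MASK) << 16) | \
	 ((unsigned long)(g) << 8) | (unsigned long)(n))

/* HVIOOPENSUBCHAN = _IOW('h', 14, uint32_t) from the patch. */
static unsigned long hviosubchan_request(void)
{
	return BSD_IOW('h', 14, sizeof(uint32_t));
}
```

The payload is simply the count of sub-channels to open, which is why rte_vmbus_ioctl() takes a plain uint32_t rather than a structure.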
[PATCH v5 10/14] net/netvsc: make IOCTL call to open subchannels
Make an IOCTL call to the HV_UIO driver to open the subchannels.

Signed-off-by: Srikanth Kaka
Signed-off-by: Vag Singh
Signed-off-by: Anand Thulasiram
---
 drivers/net/netvsc/hn_ethdev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 8b1e07b..104c7ae 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -516,6 +516,10 @@ static int hn_subchan_configure(struct hn_data *hv,
 	if (err)
 		return err;
 
+	err = rte_vmbus_ioctl(hv->vmbus, subchan);
+	if (err)
+		return err;
+
 	while (subchan > 0) {
 		struct vmbus_channel *new_sc;
 		uint16_t chn_index;
-- 
1.8.3.1
[PATCH v5 11/14] bus/vmbus: get subchannel info
Using sysctl, all the subchannel's attributes are fetched Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/freebsd/vmbus_uio.c | 73 +++ 1 file changed, 73 insertions(+) diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c b/drivers/bus/vmbus/freebsd/vmbus_uio.c index 438db41..6a9a196 100644 --- a/drivers/bus/vmbus/freebsd/vmbus_uio.c +++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c @@ -155,6 +155,79 @@ const char *get_devname_os(void) return "/dev/hv_uio"; } +int vmbus_uio_get_subchan(struct vmbus_channel *primary, + struct vmbus_channel **subchan) +{ + const struct rte_vmbus_device *dev = primary->device; + char sysctl_buf[PATH_MAX], sysctl_var[PATH_MAX]; + size_t len = PATH_MAX, sysctl_len; + /* nr_schan, relid, subid & monid datatype must match kernel's for sysctl */ + uint32_t relid, subid, nr_schan, i; + uint8_t monid; + int err; + + /* get no. of sub-channels opened by hv_uio +* dev.hv_uio.0.subchan_cnt +*/ + snprintf(sysctl_var, len, "dev.%s.%d.subchan_cnt", driver_name, +dev->uio_num); + sysctl_len = sizeof(nr_schan); + if (sysctlbyname(sysctl_var, &nr_schan, &sysctl_len, NULL, 0) < 0) { + VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var, + strerror(errno)); + return -1; + } + + /* dev.hv_uio.0.channel.14.sub */ + snprintf(sysctl_buf, len, "dev.%s.%d.channel.%u.sub", driver_name, +dev->uio_num, primary->relid); + for (i = 1; i <= nr_schan; i++) { + /* get relid */ + snprintf(sysctl_var, len, "%s.%u.chanid", sysctl_buf, i); + sysctl_len = sizeof(relid); + if (sysctlbyname(sysctl_var, &relid, &sysctl_len, NULL, 0) < 0) { + VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var, + strerror(errno)); + goto error; + } + + if (!vmbus_isnew_subchannel(primary, (uint16_t)relid)) { + VMBUS_LOG(DEBUG, "skip already found channel: %u", + relid); + continue; + } + + /* get sub-channel id */ + snprintf(sysctl_var, len, "%s.%u.ch_subidx", sysctl_buf, i); + sysctl_len = sizeof(subid); + if (sysctlbyname(sysctl_var, 
&subid, &sysctl_len, NULL, 0) < 0) { + VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var, + strerror(errno)); + goto error; + } + + /* get monitor id */ + snprintf(sysctl_var, len, "%s.%u.monitor_id", sysctl_buf, i); + sysctl_len = sizeof(monid); + if (sysctlbyname(sysctl_var, &monid, &sysctl_len, NULL, 0) < 0) { + VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var, + strerror(errno)); + goto error; + } + + err = vmbus_chan_create(dev, (uint16_t)relid, (uint16_t)subid, + monid, subchan); + if (err) { + VMBUS_LOG(ERR, "subchannel setup failed"); + return err; + } + break; + } + return 0; +error: + return -1; +} + int vmbus_uio_subchan_open(struct rte_vmbus_device *dev, uint32_t subchan) { struct mapped_vmbus_resource *uio_res; -- 1.8.3.1
[PATCH v5 12/14] net/netvsc: moving hotplug retry to OS dir
Moved netvsc_hotplug_retry to respective OS dir as it contains OS dependent code. For Linux, it is copied as is and for FreeBSD it is not supported yet. Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/net/netvsc/freebsd/hn_os.c | 5 +++ drivers/net/netvsc/hn_ethdev.c | 84 --- drivers/net/netvsc/hn_os.h | 2 + drivers/net/netvsc/linux/hn_os.c | 90 ++ 4 files changed, 97 insertions(+), 84 deletions(-) diff --git a/drivers/net/netvsc/freebsd/hn_os.c b/drivers/net/netvsc/freebsd/hn_os.c index 4c6a798..fece1be 100644 --- a/drivers/net/netvsc/freebsd/hn_os.c +++ b/drivers/net/netvsc/freebsd/hn_os.c @@ -14,3 +14,8 @@ int eth_hn_os_dev_event(void) PMD_DRV_LOG(DEBUG, "rte_dev_event_monitor_start not supported on FreeBSD"); return 0; } + +void netvsc_hotplug_retry(void *args) +{ + RTE_SET_USED(args); +} diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c index 104c7ae..dd4b872 100644 --- a/drivers/net/netvsc/hn_ethdev.c +++ b/drivers/net/netvsc/hn_ethdev.c @@ -57,9 +57,6 @@ #define NETVSC_ARG_TXBREAK "tx_copybreak" #define NETVSC_ARG_RX_EXTMBUF_ENABLE "rx_extmbuf_enable" -/* The max number of retry when hot adding a VF device */ -#define NETVSC_MAX_HOTADD_RETRY 10 - struct hn_xstats_name_off { char name[RTE_ETH_XSTATS_NAME_SIZE]; unsigned int offset; @@ -556,87 +553,6 @@ static int hn_subchan_configure(struct hn_data *hv, return err; } -static void netvsc_hotplug_retry(void *args) -{ - int ret; - struct hn_data *hv = args; - struct rte_eth_dev *dev = &rte_eth_devices[hv->port_id]; - struct rte_devargs *d = &hv->devargs; - char buf[256]; - - DIR *di; - struct dirent *dir; - struct ifreq req; - struct rte_ether_addr eth_addr; - int s; - - PMD_DRV_LOG(DEBUG, "%s: retry count %d", - __func__, hv->eal_hot_plug_retry); - - if (hv->eal_hot_plug_retry++ > NETVSC_MAX_HOTADD_RETRY) - return; - - snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s/net", d->name); - di = opendir(buf); - if (!di) { - 
PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, " - "retrying in 1 second", __func__, buf); - goto retry; - } - - while ((dir = readdir(di))) { - /* Skip . and .. directories */ - if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, "..")) - continue; - - /* trying to get mac address if this is a network device*/ - s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); - if (s == -1) { - PMD_DRV_LOG(ERR, "Failed to create socket errno %d", - errno); - break; - } - strlcpy(req.ifr_name, dir->d_name, sizeof(req.ifr_name)); - ret = ioctl(s, SIOCGIFHWADDR, &req); - close(s); - if (ret == -1) { - PMD_DRV_LOG(ERR, - "Failed to send SIOCGIFHWADDR for device %s", - dir->d_name); - break; - } - if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) { - closedir(di); - return; - } - memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, - RTE_DIM(eth_addr.addr_bytes)); - - if (rte_is_same_ether_addr(&eth_addr, dev->data->mac_addrs)) { - PMD_DRV_LOG(NOTICE, - "Found matching MAC address, adding device %s network name %s", - d->name, dir->d_name); - ret = rte_eal_hotplug_add(d->bus->name, d->name, - d->args); - if (ret) { - PMD_DRV_LOG(ERR, - "Failed to add PCI device %s", - d->name); - break; - } - } - /* When the code reaches here, we either have already added -* the device, or its MAC address did not match. -*/ - closedir(di); - return; - } - closedir(di); -retry: - /* The device is still being initialized, retry after 1 second */ - rte_eal_alarm_set(100, netvsc_hotplug_retry, hv); -} - static void netvsc_hotadd_callback(const char *device_name, enum rte_dev_event_type type, void *arg) diff --git a/drivers/net/netvsc/hn_os.h b/drivers/net/netvsc/hn_os.h index 618c53c..1fb7292 100644 --- a/drivers/net/
[PATCH v5 13/14] bus/vmbus: add meson support for FreeBSD
add meson support for FreeBSD OS Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- drivers/bus/vmbus/meson.build | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/bus/vmbus/meson.build b/drivers/bus/vmbus/meson.build index 60913d0..77f18ce 100644 --- a/drivers/bus/vmbus/meson.build +++ b/drivers/bus/vmbus/meson.build @@ -26,7 +26,11 @@ if is_linux sources += files('linux/vmbus_bus.c', 'linux/vmbus_uio.c') includes += include_directories('linux') +elif is_freebsd +sources += files('freebsd/vmbus_bus.c', + 'freebsd/vmbus_uio.c') +includes += include_directories('freebsd') else build = false -reason = 'only supported on Linux' +reason = 'only supported on Linux & FreeBSD' endif -- 1.8.3.1
[PATCH v5 14/14] bus/vmbus: update MAINTAINERS and docs
updated MAINTAINERS and doc files for FreeBSD support Signed-off-by: Srikanth Kaka Signed-off-by: Vag Singh Signed-off-by: Anand Thulasiram --- MAINTAINERS| 2 ++ doc/guides/nics/netvsc.rst | 11 +++ 2 files changed, 13 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 7c4f541..01a494e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -567,6 +567,7 @@ F: app/test/test_vdev.c VMBUS bus driver M: Stephen Hemminger M: Long Li +M: Srikanth Kaka F: drivers/bus/vmbus/ @@ -823,6 +824,7 @@ F: doc/guides/nics/vdev_netvsc.rst Microsoft Hyper-V netvsc M: Stephen Hemminger M: Long Li +M: Srikanth Kaka F: drivers/net/netvsc/ F: doc/guides/nics/netvsc.rst F: doc/guides/nics/features/netvsc.ini diff --git a/doc/guides/nics/netvsc.rst b/doc/guides/nics/netvsc.rst index 77efe1d..12d1702 100644 --- a/doc/guides/nics/netvsc.rst +++ b/doc/guides/nics/netvsc.rst @@ -91,6 +91,12 @@ operations: The dpdk-devbind.py script can not be used since it only handles PCI devices. +On FreeBSD, with hv_uio kernel driver loaded, do the following: + +.. code-block:: console + +devctl set driver -f hn1 hv_uio + Prerequisites - @@ -101,6 +107,11 @@ The following prerequisites apply: Full support of multiple queues requires the 4.17 kernel. It is possible to use the netvsc PMD with 4.16 kernel but it is limited to a single queue. +* FreeBSD support for UIO on vmbus is done with hv_uio driver and it is still +in `review`_ + +.. _`review`: https://reviews.freebsd.org/D32184 + Netvsc PMD arguments -- 1.8.3.1
Re: [dpdk-dev] [PATCH v3 6/7] app/proc-info: provide way to request info on owned ports
Hi Stephen, We were going through the patch set: https://inbox.dpdk.org/dev/20200715212228.28010-7-step...@networkplumber.org/ and hoping to get clarification on the behaviour if the port mask is not specified in the input to the `dpdk-proc-info` tool. Specifically, in PATCH v3 6/7, we see this: + /* If no port mask was specified, one will be provided */ + if (enabled_port_mask == 0) { + RTE_ETH_FOREACH_DEV(i) { + enabled_port_mask |= 1u << i; However, in PATCH v4 8/8, we see this: + /* If no port mask was specified, then show non-owned ports */ + if (enabled_port_mask == 0) { + RTE_ETH_FOREACH_DEV(i) + enabled_port_mask = 1ul << i; + } Was there any specific reason to show just the last non-owned port in case the port mask was not specified? Should we show all non-owned ports in case the user doesn’t specify any port mask? Regards, Subendu.
librte_bpf: roadmap or any specific plans for this library
Hi all, I hope this is the correct mailing list for this topic. DPDK provides the nice library `librte_bpf` to load and execute eBPF bytecode and we would like to broaden our usage of this library. Today there are hints that this library might have been purpose-built to enable inspection or modification of packets; for example, the eBPF program is expected to only use a single input argument, pointing to data of some sort. We believe it would be beneficial to be able to use this library to run generic eBPF programs as well, as an alternative to running them as RX-/TX-port/queue callbacks (i.e. generic programs which only use supported features). I have seen some discussions regarding moving towards using a common library with the kernel implementation of bpf, but I couldn't figure out the outcome. My question is whether there are any plans to evolve this library, and whether improvements would possibly be accepted? Here are some improvements we are interested in looking into: * Add an additional API for loading eBPF code. Today it's possible to load eBPF code from an ELF file, but having an API to load code from an ELF image in memory would open up other ways to manage eBPF code. Example of the new API: struct rte_bpf * rte_bpf_elf_image_load(const struct rte_bpf_prm *prm, char *image, size_t size, const char *sname); * Add support for more than a single input argument. There are cases when additional information is needed. Being able to use more than a single input argument would help when running generic eBPF programs. Example of change: struct rte_bpf_prm { ... -struct rte_bpf_arg prog_arg; /**< eBPF program input arg description */ +uint32_t nb_args; +struct rte_bpf_arg prog_args[EBPF_FUNC_MAX_ARGS]; /**< eBPF program input args */ }; Any feedback regarding this is welcome. Best regards, Bjorn
RE: [PATCH v6 03/16] vhost: add vhost msg support
HI Chenbo, Thanks for your reply. My reply is inline. > -Original Message- > From: Xia, Chenbo > Sent: Monday, April 25, 2022 8:42 PM > To: Pei, Andy ; dev@dpdk.org > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > Changpeng > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support > > Hi Andy, > > > -Original Message- > > From: Pei, Andy > > Sent: Thursday, April 21, 2022 4:34 PM > > To: dev@dpdk.org > > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; > > Cao, Gang ; Liu, Changpeng > > > > Subject: [PATCH v6 03/16] vhost: add vhost msg support > > > > Add support for VHOST_USER_GET_CONFIG and > VHOST_USER_SET_CONFIG. > > VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG message is only > > supported by virtio blk VDPA device. > > > > Signed-off-by: Andy Pei > > --- > > lib/vhost/vhost_user.c | 69 > > ++ > > lib/vhost/vhost_user.h | 13 ++ > > 2 files changed, 82 insertions(+) > > > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index > > 1d39067..3780804 100644 > > --- a/lib/vhost/vhost_user.c > > +++ b/lib/vhost/vhost_user.c > > @@ -80,6 +80,8 @@ > > [VHOST_USER_NET_SET_MTU] = "VHOST_USER_NET_SET_MTU", > > [VHOST_USER_SET_SLAVE_REQ_FD] = > "VHOST_USER_SET_SLAVE_REQ_FD", > > [VHOST_USER_IOTLB_MSG] = "VHOST_USER_IOTLB_MSG", > > + [VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG", > > + [VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG", > > [VHOST_USER_CRYPTO_CREATE_SESS] = > "VHOST_USER_CRYPTO_CREATE_SESS", > > [VHOST_USER_CRYPTO_CLOSE_SESS] = > "VHOST_USER_CRYPTO_CLOSE_SESS", > > [VHOST_USER_POSTCOPY_ADVISE] = > "VHOST_USER_POSTCOPY_ADVISE", @@ > > -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net *dev, > > } > > > > static int > > +vhost_user_get_config(struct virtio_net **pdev, > > + struct vhu_msg_context *ctx, > > + int main_fd __rte_unused) > > +{ > > + struct virtio_net *dev = *pdev; > > + struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev; > > + int ret = 0; > > + > > + if (vdpa_dev->ops->get_config) { > > + ret = 
vdpa_dev->ops->get_config(dev->vid, > > + ctx->msg.payload.cfg.region, > > + ctx->msg.payload.cfg.size); > > + if (ret != 0) { > > + ctx->msg.size = 0; > > + VHOST_LOG_CONFIG(ERR, > > +"(%s) get_config() return error!\n", > > +dev->ifname); > > + } > > + } else { > > + VHOST_LOG_CONFIG(ERR, "(%s) get_config() not > supportted!\n", > > Supported > I will send out a new version to fix this. > > +dev->ifname); > > + } > > + > > + return RTE_VHOST_MSG_RESULT_REPLY; > > +} > > + > > +static int > > +vhost_user_set_config(struct virtio_net **pdev, > > + struct vhu_msg_context *ctx, > > + int main_fd __rte_unused) > > +{ > > + struct virtio_net *dev = *pdev; > > + struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev; > > + int ret = 0; > > + > > + if (ctx->msg.size != sizeof(struct vhost_user_config)) { > > I think you should do sanity check on payload.cfg.size and make sure it's > smaller than VHOST_USER_MAX_CONFIG_SIZE > > and same check for offset > I think payload.cfg.size can be smaller than or equal to VHOST_USER_MAX_CONFIG_SIZE. payload.cfg.ofset can be smaller than or equal to VHOST_USER_MAX_CONFIG_SIZE as well > > + VHOST_LOG_CONFIG(ERR, > > + "(%s) invalid set config msg size: %"PRId32" != %d\n", > > + dev->ifname, ctx->msg.size, > > Based on you will change the log too, payload.cfg.size is uint32_t, so PRId32 > -> > PRIu32 > > > + (int)sizeof(struct vhost_user_config)); > > So this can be %u > Sure. > > + goto OUT; > > + } > > + > > + if (vdpa_dev->ops->set_config) { > > + ret = vdpa_dev->ops->set_config(dev->vid, > > + ctx->msg.payload.cfg.region, > > + ctx->msg.payload.cfg.offset, > > + ctx->msg.payload.cfg.size, > > + ctx->msg.payload.cfg.flags); > > + if (ret) > > + VHOST_LOG_CONFIG(ERR, > > +"(%s) set_config() return error!\n", > > +dev->ifname); > > + } else { > > + VHOST_LOG_CONFIG(ERR, "(%s) set_config() not > supportted!\n", > > Supported > I will send out a new version to fix this. 
> > +dev->ifname); > > + } > > + > > + return RTE_VHOST_MSG_RESULT_OK; > > + > > +OUT: > > Lower case looks better > OK. I will send out a new version to fix this. > > + return RTE_VHOST_MSG_RESULT_ERR; > > +} > > Almost all handlers need check on expected fd num (this case is 0), so the > above new
RE: [PATCH] doc: fix support table for ETH and VLAN flow items
>-Original Message- >From: Ferruh Yigit >Sent: Wednesday, April 20, 2022 8:52 PM >To: Ilya Maximets ; dev@dpdk.org; Asaf Penso > >Cc: Ajit Khaparde ; Rahul Lakkireddy >; Hemant Agrawal >; Haiyue Wang ; John >Daley ; Guoyang Zhou ; >Min Hu (Connor) ; Beilei Xing >; Jingjing Wu ; Qi Zhang >; Rosen Xu ; Matan Azrad >; Slava Ovsiienko ; Liron Himi >; Jiawen Wu ; Ori Kam >; Dekel Peled ; NBU-Contact- >Thomas Monjalon (EXTERNAL) ; sta...@dpdk.org; >NBU-Contact-Thomas Monjalon (EXTERNAL) >Subject: Re: [PATCH] doc: fix support table for ETH and VLAN flow items > >On 3/16/2022 12:01 PM, Ilya Maximets wrote: >> 'has_vlan' attribute is only supported by sfc, mlx5 and cnxk. >> Other drivers doesn't support it. Most of them (like i40e) just >> ignore it silently. Some drivers (like mlx4) never had a full support >> of the eth item even before introduction of 'has_vlan' >> (mlx4 allows to match on the destination MAC only). >> >> Same for the 'has_more_vlan' flag of the vlan item. >> >> Changing the support level to 'partial' for all such drivers. >> This doesn't solve the issue, but at least marks the problematic >> drivers. >> > >Hi Asaf, > >This was the kind of maintanance issue I was referring to have this kind of >capability documentation for flow API. > Are you referring to the fact that fields like has_vlan are not part of the table? If so, you are right, but IMHO having the high level items still allows the users to understand what is supported quickly. We can have another level of tables per each relevant item to address this specific issue. In this case, we'll have a table for ETH that elaborates the different fields' support, like has_vlan. If you are referring to a different issue, please elaborate. >All below drivers are using 'RTE_FLOW_ITEM_TYPE_VLAN', the script verifies >this, but are they actually supporting VLAN filter and in which case? > >We need comment from driver maintainers about the support level. @Ori Kam, please comment for mlx driver. 
> >> Some details are available in: >>https://bugs.dpdk.org/show_bug.cgi?id=958 >> >> Fixes: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN >> items") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Ilya Maximets >> --- >> >> I added the stable in CC, but the patch should be extended while >> backporting. For 21.11 the cnxk driver should be also updated, for >> 20.11, sfc driver should also be included. >> >> doc/guides/nics/features/bnxt.ini | 4 ++-- >> doc/guides/nics/features/cxgbe.ini | 4 ++-- >> doc/guides/nics/features/dpaa2.ini | 4 ++-- >> doc/guides/nics/features/e1000.ini | 2 +- >> doc/guides/nics/features/enic.ini | 4 ++-- >> doc/guides/nics/features/hinic.ini | 2 +- >> doc/guides/nics/features/hns3.ini | 4 ++-- >> doc/guides/nics/features/i40e.ini | 4 ++-- >> doc/guides/nics/features/iavf.ini | 4 ++-- >> doc/guides/nics/features/ice.ini| 4 ++-- >> doc/guides/nics/features/igc.ini| 2 +- >> doc/guides/nics/features/ipn3ke.ini | 4 ++-- >> doc/guides/nics/features/ixgbe.ini | 4 ++-- >> doc/guides/nics/features/mlx4.ini | 4 ++-- >> doc/guides/nics/features/mvpp2.ini | 4 ++-- >> doc/guides/nics/features/tap.ini| 4 ++-- >> doc/guides/nics/features/txgbe.ini | 4 ++-- >> 17 files changed, 31 insertions(+), 31 deletions(-) >> >> diff --git a/doc/guides/nics/features/bnxt.ini >> b/doc/guides/nics/features/bnxt.ini >> index afb5414b49..ac682c5779 100644 >> --- a/doc/guides/nics/features/bnxt.ini >> +++ b/doc/guides/nics/features/bnxt.ini >> @@ -57,7 +57,7 @@ Perf doc = Y >> >> [rte_flow items] >> any = Y >> -eth = Y >> +eth = P >> ipv4 = Y >> ipv6 = Y >> gre = Y >> @@ -71,7 +71,7 @@ represented_port = Y >> tcp = Y >> udp = Y >> vf = Y >> -vlan = Y >> +vlan = P >> vxlan= Y >> >> [rte_flow actions] >> diff --git a/doc/guides/nics/features/cxgbe.ini >> b/doc/guides/nics/features/cxgbe.ini >> index f674803ec4..f9912390fb 100644 >> --- a/doc/guides/nics/features/cxgbe.ini >> +++ b/doc/guides/nics/features/cxgbe.ini >> @@ -36,7 +36,7 @@ x86-64 = Y >> Usage 
doc= Y >> >> [rte_flow items] >> -eth = Y >> +eth = P >> ipv4 = Y >> ipv6 = Y >> pf = Y >> @@ -44,7 +44,7 @@ phy_port = Y >> tcp = Y >> udp = Y >> vf = Y >> -vlan = Y >> +vlan = P >> >> [rte_flow actions] >> count= Y >> diff --git a/doc/guides/nics/features/dpaa2.ini >> b/doc/guides/nics/features/dpaa2.ini >> index 4c06841a87..09ce66c788 100644 >> --- a/doc/guides/nics/features/dpaa2.ini >> +++ b/doc/guides/nics/features/dpaa2.ini >> @@ -31,7
RE: [PATCH 1/2] ci: switch to Ubuntu 20.04
> -Original Message- > From: David Marchand > Sent: Tuesday, April 26, 2022 3:18 PM > To: dev@dpdk.org > Cc: Aaron Conole ; Michael Santana > ; Ruifeng Wang ; > Jan Viktorin ; Bruce Richardson > ; David Christensen > Subject: [PATCH 1/2] ci: switch to Ubuntu 20.04 > > Ubuntu 18.04 is now rather old. > Besides, other entities in our CI are also testing this distribution. > > Switch to a newer Ubuntu release and benefit from more recent > tool(chain)s: for example, net/cnxk now builds fine and can be re-enabled. > > Signed-off-by: David Marchand > --- > diff --git a/config/arm/arm64_armv8_linux_clang_ubuntu2004 > b/config/arm/arm64_armv8_linux_clang_ubuntu2004 > new file mode 12 > index 00..01f5b7643e > --- /dev/null > +++ b/config/arm/arm64_armv8_linux_clang_ubuntu2004 How about naming it without '2004'? It is a link to the ubuntu1804 crossfile because the distribution-dependent paths in the file don't change. And I believe the consistency will be kept across distribution releases. So we can use a file name without a distribution release number for the latest/default Ubuntu environment. This removes the need for a new crossfile for each Ubuntu LTS release. Thanks. > @@ -0,0 +1 @@ > +arm64_armv8_linux_clang_ubuntu1804 > \ No newline at end of file > diff --git a/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 > b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 > new file mode 12 > index 00..9d6139a19b > --- /dev/null > +++ b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 > @@ -0,0 +1 @@ > +ppc64le-power8-linux-gcc-ubuntu1804 > \ No newline at end of file > -- > 2.23.0
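The release-agnostic naming suggested above can be sketched as follows; the commands use a scratch directory as a stand-in for `config/arm`, and the file names mirror the ones in the patch:

```shell
# Sketch of the suggestion: keep one cross file per target and point a
# symlink without a release number at it, since the distribution-dependent
# paths inside the file stay the same across Ubuntu LTS releases.
dir=$(mktemp -d)
touch "$dir/arm64_armv8_linux_clang_ubuntu1804"
ln -s arm64_armv8_linux_clang_ubuntu1804 "$dir/arm64_armv8_linux_clang_ubuntu"
readlink "$dir/arm64_armv8_linux_clang_ubuntu"
```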
RE: [PATCH v6 03/16] vhost: add vhost msg support
> -Original Message- > From: Pei, Andy > Sent: Tuesday, April 26, 2022 4:56 PM > To: Xia, Chenbo ; dev@dpdk.org > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > Changpeng > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support > > HI Chenbo, > > Thanks for your reply. > My reply is inline. > > > -Original Message- > > From: Xia, Chenbo > > Sent: Monday, April 25, 2022 8:42 PM > > To: Pei, Andy ; dev@dpdk.org > > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > > Changpeng > > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support > > > > Hi Andy, > > > > > -Original Message- > > > From: Pei, Andy > > > Sent: Thursday, April 21, 2022 4:34 PM > > > To: dev@dpdk.org > > > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; > > > Cao, Gang ; Liu, Changpeng > > > > > > Subject: [PATCH v6 03/16] vhost: add vhost msg support > > > > > > Add support for VHOST_USER_GET_CONFIG and > > VHOST_USER_SET_CONFIG. > > > VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG message is only > > > supported by virtio blk VDPA device. 
> > > > > > Signed-off-by: Andy Pei > > > --- > > > lib/vhost/vhost_user.c | 69 > > > ++ > > > lib/vhost/vhost_user.h | 13 ++ > > > 2 files changed, 82 insertions(+) > > > > > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index > > > 1d39067..3780804 100644 > > > --- a/lib/vhost/vhost_user.c > > > +++ b/lib/vhost/vhost_user.c > > > @@ -80,6 +80,8 @@ > > > [VHOST_USER_NET_SET_MTU] = "VHOST_USER_NET_SET_MTU", > > > [VHOST_USER_SET_SLAVE_REQ_FD] = > > "VHOST_USER_SET_SLAVE_REQ_FD", > > > [VHOST_USER_IOTLB_MSG] = "VHOST_USER_IOTLB_MSG", > > > +[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG", > > > +[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG", > > > [VHOST_USER_CRYPTO_CREATE_SESS] = > > "VHOST_USER_CRYPTO_CREATE_SESS", > > > [VHOST_USER_CRYPTO_CLOSE_SESS] = > > "VHOST_USER_CRYPTO_CLOSE_SESS", > > > [VHOST_USER_POSTCOPY_ADVISE] = > > "VHOST_USER_POSTCOPY_ADVISE", @@ > > > -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net *dev, > > > } > > > > > > static int > > > +vhost_user_get_config(struct virtio_net **pdev, > > > +struct vhu_msg_context *ctx, > > > +int main_fd __rte_unused) > > > +{ > > > +struct virtio_net *dev = *pdev; > > > +struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev; > > > +int ret = 0; > > > + > > > +if (vdpa_dev->ops->get_config) { > > > +ret = vdpa_dev->ops->get_config(dev->vid, > > > + ctx->msg.payload.cfg.region, > > > + ctx->msg.payload.cfg.size); > > > +if (ret != 0) { > > > +ctx->msg.size = 0; > > > +VHOST_LOG_CONFIG(ERR, > > > + "(%s) get_config() return error!\n", > > > + dev->ifname); > > > +} > > > +} else { > > > +VHOST_LOG_CONFIG(ERR, "(%s) get_config() not > > supportted!\n", > > > > Supported > > > I will send out a new version to fix this. 
> > > + dev->ifname); > > > +} > > > + > > > +return RTE_VHOST_MSG_RESULT_REPLY; > > > +} > > > + > > > +static int > > > +vhost_user_set_config(struct virtio_net **pdev, > > > +struct vhu_msg_context *ctx, > > > +int main_fd __rte_unused) > > > +{ > > > +struct virtio_net *dev = *pdev; > > > +struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev; > > > +int ret = 0; > > > + > > > +if (ctx->msg.size != sizeof(struct vhost_user_config)) { > > > > I think you should do sanity check on payload.cfg.size and make sure > it's > > smaller than VHOST_USER_MAX_CONFIG_SIZE > > > > and same check for offset > > > I think payload.cfg.size can be smaller than or equal to > VHOST_USER_MAX_CONFIG_SIZE. > payload.cfg.ofset can be smaller than or equal to > VHOST_USER_MAX_CONFIG_SIZE as well After double check: offset is the config space offset, so this should be checked in vdpa driver. Size check on vhost lib layer should be just <= MAX_you_defined Thanks, Chenbo > > > > +VHOST_LOG_CONFIG(ERR, > > > +"(%s) invalid set config msg size: %"PRId32" != %d\n", > > > +dev->ifname, ctx->msg.size, > > > > Based on you will change the log too, payload.cfg.size is uint32_t, so > PRId32 -> > > PRIu32 > > > > > +(int)sizeof(struct vhost_user_config)); > > > > So this can be %u > > > Sure. > > > +goto OUT; > > > +} > > > + > > > +if (vdpa_dev->ops->set_config) { > > > +ret = vdpa_dev->ops->set_config(dev->vid, > > > +ctx->msg.payload.cfg.region, > > > +ctx->msg.payload.cfg.offset, > > > +ctx->msg.payload.cfg.size, > > > +ctx->msg.payload.cfg.flags); > > > +if (ret) > > > +VHOST_LOG_CONFIG(ERR, > > > + "(%s) set_config() return error!\n", > > > + dev->ifname); > > > +} else { > > > +VHOST_LOG_CONFIG(ERR, "(%s) set_config() not > > supportted!\n", > > > > Supported > > > I will send out a new version to fix this. > > > + dev->ifname); > > > +} > > > + > > > +return RTE_VHOST_MSG_RESULT_OK; > > > + > > > +OUT: > > > > Lower case looks better > > > OK. I will send out a new version to fix this. 
> > > +return RTE_VHOST_MSG_RESULT_ERR; > > > +} > > > > Almost all handlers need check on expected fd num (this case is 0), so
RE: [PATCH] net/mlx5: fix rxq/txq stats memory access sync
Hi, > -Original Message- > From: Raja Zidane > Sent: Wednesday, April 20, 2022 6:32 PM > To: dev@dpdk.org > Cc: Matan Azrad ; Slava Ovsiienko > ; sta...@dpdk.org > Subject: [PATCH] net/mlx5: fix rxq/txq stats memory access sync > > Queue statistics are being continuously updated in Rx/Tx burst > routines while handling traffic. In addition to that, statistics > can be reset (written with zeroes) on statistics reset in other > threads, causing a race condition, which in turn could result in > wrong stats. > > The patch provides an approach with reference values, allowing > the actual counters to be writable within Rx/Tx burst threads > only, and updating reference values on stats reset. > > Fixes: 87011737b715 ("mlx5: add software counters") > Cc: sta...@dpdk.org > > Signed-off-by: Raja Zidane > Acked-by: Slava Ovsiienko > --- Patch applied to next-net-mlx, Kindest regards, Raslan Darawsheh
RE: [PATCH v6 05/16] vdpa/ifc: add vDPA interrupt for blk device
Hi Chenbo, Thanks for your reply. My reply is inline. > -Original Message- > From: Xia, Chenbo > Sent: Monday, April 25, 2022 8:58 PM > To: Pei, Andy ; dev@dpdk.org > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > Changpeng > Subject: RE: [PATCH v6 05/16] vdpa/ifc: add vDPA interrupt for blk device > > Hi Andy, > > > -Original Message- > > From: Pei, Andy > > Sent: Thursday, April 21, 2022 4:34 PM > > To: dev@dpdk.org > > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; > > Cao, Gang ; Liu, Changpeng > > > > Subject: [PATCH v6 05/16] vdpa/ifc: add vDPA interrupt for blk device > > > > For the block device type, we have to relay the commands on all > > queues. > > It's a bit short... although I can understand, please add some background on > current implementation for others to easily understand. > Sure, I will send a new patch set to address this. > > > > Signed-off-by: Andy Pei > > --- > > drivers/vdpa/ifc/ifcvf_vdpa.c | 46 > > -- > > - > > 1 file changed, 35 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c > > b/drivers/vdpa/ifc/ifcvf_vdpa.c index 8ee041f..8d104b7 100644 > > --- a/drivers/vdpa/ifc/ifcvf_vdpa.c > > +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c > > @@ -370,24 +370,48 @@ struct rte_vdpa_dev_info { > > irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; > > irq_set->start = 0; > > fd_ptr = (int *)&irq_set->data; > > + /* The first interrupt is for the configure space change > > notification */ > > fd_ptr[RTE_INTR_VEC_ZERO_OFFSET] = > > rte_intr_fd_get(internal->pdev->intr_handle); > > > > for (i = 0; i < nr_vring; i++) > > internal->intr_fd[i] = -1; > > > > - for (i = 0; i < nr_vring; i++) { > > - rte_vhost_get_vhost_vring(internal->vid, i, &vring); > > - fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd; > > - if ((i & 1) == 0 && m_rx == true) { > > - fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > > - if (fd < 0) { > > - DRV_LOG(ERR, "can't setup eventfd: %s", > > - strerror(errno)); > > - return -1; > > + if (internal->device_type 
== IFCVF_NET) { > > + for (i = 0; i < nr_vring; i++) { > > + rte_vhost_get_vhost_vring(internal->vid, i, &vring); > > + fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd; > > + if ((i & 1) == 0 && m_rx == true) { > > + /* For the net we only need to relay rx queue, > > +* which will change the mem of VM. > > +*/ > > + fd = eventfd(0, EFD_NONBLOCK | > EFD_CLOEXEC); > > + if (fd < 0) { > > + DRV_LOG(ERR, "can't setup > eventfd: %s", > > + strerror(errno)); > > + return -1; > > + } > > + internal->intr_fd[i] = fd; > > + fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd; > > + } > > + } > > + } else if (internal->device_type == IFCVF_BLK) { > > + for (i = 0; i < nr_vring; i++) { > > + rte_vhost_get_vhost_vring(internal->vid, i, &vring); > > + fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd; > > + if (m_rx == true) { > > + /* For the blk we need to relay all the read > cmd > > +* of each queue > > +*/ > > + fd = eventfd(0, EFD_NONBLOCK | > EFD_CLOEXEC); > > + if (fd < 0) { > > + DRV_LOG(ERR, "can't setup > eventfd: %s", > > + strerror(errno)); > > + return -1; > > + } > > + internal->intr_fd[i] = fd; > > + fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd; > > Many duplicated code here for blk and net. What if we use this condition to > know creating eventfd or not: > > if (m_rx == true && (is_blk_dev || (i & 1) == 0)) { > /* create eventfd and save now */ > } > Sure, I will send a new patch set to address this. > Thanks, > Chenbo > > > } > > - internal->intr_fd[i] = fd; > > - fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd; > > } > > } > > > > -- > > 1.8.3.1 >
RE: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration
Hi Chenbo, Thanks for your reply. My reply is inline. > -Original Message- > From: Xia, Chenbo > Sent: Monday, April 25, 2022 9:10 PM > To: Pei, Andy ; dev@dpdk.org > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > Changpeng > Subject: RE: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration > > > -Original Message- > > From: Pei, Andy > > Sent: Thursday, April 21, 2022 4:34 PM > > To: dev@dpdk.org > > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; > > Cao, Gang ; Liu, Changpeng > > > > Subject: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration > > > > Add SW live-migration support to block device. > > Add dirty page logging to block device. > > Add SW live-migration support including dirty page logging for block device. > Sure, I will remove " Add dirty page logging to block device." In next version. > > > > Signed-off-by: Andy Pei > > --- > > drivers/vdpa/ifc/base/ifcvf.c | 4 +- > > drivers/vdpa/ifc/base/ifcvf.h | 6 ++ > > drivers/vdpa/ifc/ifcvf_vdpa.c | 128 > > +++-- > > - > > 3 files changed, 115 insertions(+), 23 deletions(-) > > > > diff --git a/drivers/vdpa/ifc/base/ifcvf.c > > b/drivers/vdpa/ifc/base/ifcvf.c index d10c1fd..e417c50 100644 > > --- a/drivers/vdpa/ifc/base/ifcvf.c > > +++ b/drivers/vdpa/ifc/base/ifcvf.c > > @@ -191,7 +191,7 @@ > > IFCVF_WRITE_REG32(val >> 32, hi); > > } > > > > -STATIC int > > +int > > ifcvf_hw_enable(struct ifcvf_hw *hw) > > { > > struct ifcvf_pci_common_cfg *cfg; > > @@ -240,7 +240,7 @@ > > return 0; > > } > > > > -STATIC void > > +void > > ifcvf_hw_disable(struct ifcvf_hw *hw) { > > u32 i; > > diff --git a/drivers/vdpa/ifc/base/ifcvf.h > > b/drivers/vdpa/ifc/base/ifcvf.h index 769c603..6dd7925 100644 > > --- a/drivers/vdpa/ifc/base/ifcvf.h > > +++ b/drivers/vdpa/ifc/base/ifcvf.h > > @@ -179,4 +179,10 @@ struct ifcvf_hw { > > u64 > > ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid); > > > > +int > > +ifcvf_hw_enable(struct ifcvf_hw *hw); > > + > > +void > > +ifcvf_hw_disable(struct ifcvf_hw 
*hw); > > + > > #endif /* _IFCVF_H_ */ > > diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c > > b/drivers/vdpa/ifc/ifcvf_vdpa.c index 8d104b7..a23dc2d 100644 > > --- a/drivers/vdpa/ifc/ifcvf_vdpa.c > > +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c > > @@ -345,6 +345,56 @@ struct rte_vdpa_dev_info { > > } > > } > > > > +static void > > +vdpa_ifcvf_blk_pause(struct ifcvf_internal *internal) { > > + struct ifcvf_hw *hw = &internal->hw; > > + struct rte_vhost_vring vq; > > + int i, vid; > > + uint64_t features = 0; > > + uint64_t log_base = 0, log_size = 0; > > + uint64_t len; > > + > > + vid = internal->vid; > > + > > + if (internal->device_type == IFCVF_BLK) { > > + for (i = 0; i < hw->nr_vring; i++) { > > + rte_vhost_get_vhost_vring(internal->vid, i, &vq); > > + while (vq.avail->idx != vq.used->idx) { > > + ifcvf_notify_queue(hw, i); > > + usleep(10); > > + } > > + hw->vring[i].last_avail_idx = vq.avail->idx; > > + hw->vring[i].last_used_idx = vq.used->idx; > > + } > > + } > > + > > + ifcvf_hw_disable(hw); > > + > > + for (i = 0; i < hw->nr_vring; i++) > > + rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, > > + hw->vring[i].last_used_idx); > > + > > + if (internal->sw_lm) > > + return; > > + > > + rte_vhost_get_negotiated_features(vid, &features); > > + if (RTE_VHOST_NEED_LOG(features)) { > > + ifcvf_disable_logging(hw); > > + rte_vhost_get_log_base(internal->vid, &log_base, &log_size); > > + rte_vfio_container_dma_unmap(internal->vfio_container_fd, > > + log_base, IFCVF_LOG_BASE, log_size); > > + /* > > +* IFCVF marks dirty memory pages for only packet buffer, > > +* SW helps to mark the used ring as dirty after device stops. > > +*/ > > + for (i = 0; i < hw->nr_vring; i++) { > > + len = IFCVF_USED_RING_LEN(hw->vring[i].size); > > + rte_vhost_log_used_vring(vid, i, 0, len); > > + } > > + } > > +} > > Can we consider combining vdpa_ifcvf_blk_pause and vdpa_ifcvf_stop to > one function and check device type internally to do different things? 
Because > as I see, most logic is the same. > OK, I will address it in next version. > > + > > #define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \ > > sizeof(int) * (IFCVF_MAX_QUEUES * 2 + 1)) static int @@ - > 659,15 > > +709,22 @@ struct rte_vdpa_dev_info { > > } > > hw->vring[i].avail = gpa; > > > > - /* Direct I/O for Tx queue, relay for Rx queue */ > > - if (i & 1) { > > - gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used); > > -
Re: [DPDK v4] net/ixgbe: promote MDIO API
Zeng, ZhichaoX writes: > Hi, Ray, David: > > What is your opinion on this patch? > > Regards, > Zhichao > > -Original Message- > From: Zeng, ZhichaoX > Sent: Tuesday, April 19, 2022 7:06 PM > To: dev@dpdk.org > Cc: Yang, Qiming ; Wang, Haiyue > ; m...@ashroe.eu; Zeng, ZhichaoX > > Subject: [DPDK v4] net/ixgbe: promote MDIO API > > From: Zhichao Zeng > > Promote the MDIO APIs to be stable. > > Signed-off-by: Zhichao Zeng > --- > drivers/net/ixgbe/rte_pmd_ixgbe.h | 5 - > drivers/net/ixgbe/version.map | 10 +- > 2 files changed, 5 insertions(+), 10 deletions(-) > Acked-by: Ray Kinsella
Re: [PATCH v2 04/28] common/cnxk: support to configure the ts pkind in CPT
Nithin Dabilpuram writes: > From: Vidya Sagar Velumuri > > Add new API to configure the SA table entries with new CPT PKIND > when timestamp is enabled. > > Signed-off-by: Vidya Sagar Velumuri > --- > drivers/common/cnxk/roc_nix_inl.c | 59 > ++ > drivers/common/cnxk/roc_nix_inl.h | 2 ++ > drivers/common/cnxk/roc_nix_inl_priv.h | 1 + > drivers/common/cnxk/version.map| 1 + > 4 files changed, 63 insertions(+) > Acked-by: Ray Kinsella
Re: [dpdk-dev][PATCH 3/3] net/cnxk: adding cnxk support to configure custom sa index
kirankum...@marvell.com writes: > From: Kiran Kumar K > > Adding cnxk device driver support to configure custom sa index. > Custom sa index can be configured as part of the session create > as SPI, and later original SPI can be updated using session update. > > Signed-off-by: Kiran Kumar K > --- > doc/api/doxy-api-index.md | 3 +- > doc/api/doxy-api.conf.in| 1 + > drivers/net/cnxk/cn10k_ethdev_sec.c | 107 +++- > drivers/net/cnxk/cn9k_ethdev.c | 6 ++ > drivers/net/cnxk/cn9k_ethdev_sec.c | 2 +- > drivers/net/cnxk/cnxk_ethdev.h | 3 +- > drivers/net/cnxk/cnxk_ethdev_sec.c | 30 +--- > drivers/net/cnxk/cnxk_flow.c| 1 + > drivers/net/cnxk/meson.build| 2 + > drivers/net/cnxk/rte_pmd_cnxk.h | 94 > drivers/net/cnxk/version.map| 6 ++ > 11 files changed, 240 insertions(+), 15 deletions(-) > create mode 100644 drivers/net/cnxk/rte_pmd_cnxk.h > > diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md > index 4245b9635c..8f9564ee84 100644 > --- a/doc/api/doxy-api-index.md > +++ b/doc/api/doxy-api-index.md > @@ -56,7 +56,8 @@ The public API headers are grouped by topics: >[dpaa2_qdma] (@ref rte_pmd_dpaa2_qdma.h), >[crypto_scheduler] (@ref rte_cryptodev_scheduler.h), >[dlb2] (@ref rte_pmd_dlb2.h), > - [ifpga] (@ref rte_pmd_ifpga.h) > + [ifpga] (@ref rte_pmd_ifpga.h), > + [cnxk] (@ref rte_pmd_cnxk.h) > > - **memory**: >[memseg] (@ref rte_memory.h), > diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in > index db2ca9b6ed..b49942412d 100644 > --- a/doc/api/doxy-api.conf.in > +++ b/doc/api/doxy-api.conf.in > @@ -12,6 +12,7 @@ INPUT = > @TOPDIR@/doc/api/doxy-api-index.md \ >@TOPDIR@/drivers/net/ark \ >@TOPDIR@/drivers/net/bnxt \ >@TOPDIR@/drivers/net/bonding \ > + @TOPDIR@/drivers/net/cnxk \ >@TOPDIR@/drivers/net/dpaa \ >@TOPDIR@/drivers/net/dpaa2 \ >@TOPDIR@/drivers/net/i40e \ > diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c > b/drivers/net/cnxk/cn10k_ethdev_sec.c > index 87bb691ab4..60ae5d7d99 100644 > --- a/drivers/net/cnxk/cn10k_ethdev_sec.c > +++ 
b/drivers/net/cnxk/cn10k_ethdev_sec.c > @@ -6,6 +6,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -502,7 +503,7 @@ cn10k_eth_sec_session_create(void *device, > ROC_NIX_INL_OT_IPSEC_OUTB_SW_RSVD); > > /* Alloc an sa index */ > - rc = cnxk_eth_outb_sa_idx_get(dev, &sa_idx); > + rc = cnxk_eth_outb_sa_idx_get(dev, &sa_idx, ipsec->spi); > if (rc) > goto mempool_put; > > @@ -657,6 +658,109 @@ cn10k_eth_sec_capabilities_get(void *device > __rte_unused) > return cn10k_eth_sec_capabilities; > } > > +static int > +cn10k_eth_sec_session_update(void *device, struct rte_security_session *sess, > + struct rte_security_session_conf *conf) > +{ > + struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device; > + struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev); > + struct roc_ot_ipsec_inb_sa *inb_sa_dptr; > + struct rte_security_ipsec_xform *ipsec; > + struct rte_crypto_sym_xform *crypto; > + struct cnxk_eth_sec_sess *eth_sec; > + bool inbound; > + int rc; > + > + if (conf->action_type != RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL || > + conf->protocol != RTE_SECURITY_PROTOCOL_IPSEC) > + return -ENOENT; > + > + ipsec = &conf->ipsec; > + crypto = conf->crypto_xform; > + inbound = !!(ipsec->direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS); > + > + eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess); > + if (!eth_sec) > + return -ENOENT; > + > + eth_sec->spi = conf->ipsec.spi; > + > + if (inbound) { > + inb_sa_dptr = (struct roc_ot_ipsec_inb_sa *)dev->inb.sa_dptr; > + memset(inb_sa_dptr, 0, sizeof(struct roc_ot_ipsec_inb_sa)); > + > + rc = cnxk_ot_ipsec_inb_sa_fill(inb_sa_dptr, ipsec, crypto, > +true); > + if (rc) > + return -EINVAL; > + > + rc = roc_nix_inl_ctx_write(&dev->nix, inb_sa_dptr, eth_sec->sa, > +eth_sec->inb, > +sizeof(struct roc_ot_ipsec_inb_sa)); > + if (rc) > + return -EINVAL; > + } else { > + struct roc_ot_ipsec_outb_sa *outb_sa_dptr; > + > + outb_sa_dptr = (struct roc_ot_ipsec_outb_sa *)dev->outb.sa_dptr; > + memset(outb_sa_dptr, 0, 
sizeof(struct roc_ot
Re: [dpdk-dev] [PATCH v4] ethdev: mtr: support protocol based input color selection
jer...@marvell.com writes: > From: Jerin Jacob > > Currently, the meter object supports only a DSCP based input color table. > This patch enhances that to support a VLAN based input color table, > a color table based on inner fields for the tunnel use case, and > a fallback color per meter for packets that match none of the configured fields. > > All of the above features are exposed through capabilities, including an > additional capability to specify whether the implementation supports > more than one input color table per ethdev port. > > Suggested-by: Cristian Dumitrescu > Signed-off-by: Jerin Jacob > --- > v4..v3: > > - Aligned with the community meeting call, which is documented in > https://patches.dpdk.org/project/dpdk/patch/20220301085824.1041009-1-sk...@marvell.com/ > as the last message, with the following exception: > - Used RTE_MTR_COLOR_IN_*_DSCP instead of RTE_MTR_COLOR_IN_*_IP, as > there are already the dscp_table and rte_mtr_meter_dscp_table_update() API. > Changing the above symbols would break existing applications for no good reason. > - Updated 22.07 release notes > - Removed testpmd changes from the series to finalize the API spec first; > the testpmd changes can be sent afterwards.
> > v3..v2: > > - Fix input color flags as a bitmask > - Add definitions for newly added API > > v2..v1: > - Fix seperate typo > > v1..RFC: > > Address the review comments by Cristian at > https://patches.dpdk.org/project/dpdk/patch/20210820082401.3778736-1-jer...@marvell.com/ > > - Moved to v22.07 release > - Updated rte_mtr_input_color_method to support all VLAN, DSCP, Inner > cases > - Added input_color_method > - Removed union between vlan_table and dscp_table > - Kept VLAN instead of PCP as HW coloring based on DEI(1bit), PCP(3 > bits) > > .../traffic_metering_and_policing.rst | 33 > doc/guides/rel_notes/release_22_07.rst| 10 + > lib/ethdev/rte_mtr.c | 23 +++ > lib/ethdev/rte_mtr.h | 186 +- > lib/ethdev/rte_mtr_driver.h | 19 ++ > lib/ethdev/version.map| 4 + > 6 files changed, 265 insertions(+), 10 deletions(-) > > diff --git a/doc/guides/prog_guide/traffic_metering_and_policing.rst > b/doc/guides/prog_guide/traffic_metering_and_policing.rst > index ceb5a96488..75deabbaf1 100644 > --- a/doc/guides/prog_guide/traffic_metering_and_policing.rst > +++ b/doc/guides/prog_guide/traffic_metering_and_policing.rst > @@ -21,6 +21,7 @@ The main features are: > * Policer actions (per meter output color): recolor, drop > * Statistics (per policer output color) > * Chaining multiple meter objects > +* Protocol based input color selection > > Configuration steps > --- > @@ -105,3 +106,35 @@ traffic meter and policing library. > * Adding one (or multiple) actions of the type > ``RTE_FLOW_ACTION_TYPE_METER`` > to the list of meter actions (``struct > rte_mtr_meter_policy_params::actions``) > specified per color as show in :numref:`figure_rte_mtr_chaining`. > + > +Protocol based input color selection > + > + > +The API supports selecting the input color based on the packet content. > +Following is the API usage model for the same. > + > +#. Probe the protocol based input color selection device capabilities using > + following parameter using ``rte_mtr_capabilities_get()`` API. 
> + > + * ``struct rte_mtr_capabilities::input_color_proto_mask;`` > + * ``struct rte_mtr_capabilities::separate_input_color_table_per_port`` > + > +#. When creating the meter object using ``rte_mtr_create()``, configure > + relevant input color selection parameters such as > + > + * Input color protocols with ``struct > rte_mtr_params::input_color_proto_mask`` > + > + * If ``struct rte_mtr_params::input_color_proto_mask`` has multiple bits > set then > + ``rte_mtr_color_in_protocol_priority_set()`` shall be used to set the > priority, > + in the order, in which protocol to be used to find the input color. > + > + * Fill the tables ``struct rte_mtr_params::dscp_table``, > + ``struct rte_mtr_params::vlan_table`` based on input color selected. > + > + * Update the ``struct rte_mtr_params::default_input_color`` to determine > + the default input color in case the input packet does not match > + the input color method. > + > + * If needed, update the input color table at runtime using > + ``rte_mtr_meter_vlan_table_update()`` and > ``rte_mtr_meter_dscp_table_update()`` > + APIs. > diff --git a/doc/guides/rel_notes/release_22_07.rst > b/doc/guides/rel_notes/release_22_07.rst > index 42a5f2d990..746622f9b3 100644 > --- a/doc/guides/rel_notes/release_22_07.rst > +++ b/doc/guides/rel_notes/release_22_07.rst > @@ -55,6 +55,13 @@ New Features > Also, make sure to start the actual text at the margin. > === > > +* **Added protocol based input color for meter.** > + > + Added new APIs ``rte_mt
Re: [PATCH] doc: fix support table for ETH and VLAN flow items
On 4/26/2022 9:55 AM, Asaf Penso wrote: -Original Message- From: Ferruh Yigit Sent: Wednesday, April 20, 2022 8:52 PM To: Ilya Maximets ; dev@dpdk.org; Asaf Penso Cc: Ajit Khaparde ; Rahul Lakkireddy ; Hemant Agrawal ; Haiyue Wang ; John Daley ; Guoyang Zhou ; Min Hu (Connor) ; Beilei Xing ; Jingjing Wu ; Qi Zhang ; Rosen Xu ; Matan Azrad ; Slava Ovsiienko ; Liron Himi ; Jiawen Wu ; Ori Kam ; Dekel Peled ; NBU-Contact- Thomas Monjalon (EXTERNAL) ; sta...@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL) Subject: Re: [PATCH] doc: fix support table for ETH and VLAN flow items On 3/16/2022 12:01 PM, Ilya Maximets wrote: 'has_vlan' attribute is only supported by sfc, mlx5 and cnxk. Other drivers doesn't support it. Most of them (like i40e) just ignore it silently. Some drivers (like mlx4) never had a full support of the eth item even before introduction of 'has_vlan' (mlx4 allows to match on the destination MAC only). Same for the 'has_more_vlan' flag of the vlan item. Changing the support level to 'partial' for all such drivers. This doesn't solve the issue, but at least marks the problematic drivers. Hi Asaf, This was the kind of maintanance issue I was referring to have this kind of capability documentation for flow API. Are you referring to the fact that fields like has_vlan are not part of the table? If so, you are right, but IMHO having the high level items still allows the users to understand what is supported quickly. We can have another level of tables per each relevant item to address this specific issue. In this case, we'll have a table for ETH that elaborates the different fields' support, like has_vlan. If you are referring to a different issue, please elaborate. 'vlan' in the .ini file is already to document the flow API VLAN support, so I am not suggesting adding more to the table. My point was it is hard to make this kind documentation correct. 
All below drivers are using 'RTE_FLOW_ITEM_TYPE_VLAN', the script verifies this, but are they actually supporting VLAN filter and in which case? We need comment from driver maintainers about the support level. @Ori Kam, please comment for mlx driver. Some details are available in: https://bugs.dpdk.org/show_bug.cgi?id=958 Fixes: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN items") Cc: sta...@dpdk.org Signed-off-by: Ilya Maximets --- I added the stable in CC, but the patch should be extended while backporting. For 21.11 the cnxk driver should be also updated, for 20.11, sfc driver should also be included. doc/guides/nics/features/bnxt.ini | 4 ++-- doc/guides/nics/features/cxgbe.ini | 4 ++-- doc/guides/nics/features/dpaa2.ini | 4 ++-- doc/guides/nics/features/e1000.ini | 2 +- doc/guides/nics/features/enic.ini | 4 ++-- doc/guides/nics/features/hinic.ini | 2 +- doc/guides/nics/features/hns3.ini | 4 ++-- doc/guides/nics/features/i40e.ini | 4 ++-- doc/guides/nics/features/iavf.ini | 4 ++-- doc/guides/nics/features/ice.ini| 4 ++-- doc/guides/nics/features/igc.ini| 2 +- doc/guides/nics/features/ipn3ke.ini | 4 ++-- doc/guides/nics/features/ixgbe.ini | 4 ++-- doc/guides/nics/features/mlx4.ini | 4 ++-- doc/guides/nics/features/mvpp2.ini | 4 ++-- doc/guides/nics/features/tap.ini| 4 ++-- doc/guides/nics/features/txgbe.ini | 4 ++-- 17 files changed, 31 insertions(+), 31 deletions(-) diff --git a/doc/guides/nics/features/bnxt.ini b/doc/guides/nics/features/bnxt.ini index afb5414b49..ac682c5779 100644 --- a/doc/guides/nics/features/bnxt.ini +++ b/doc/guides/nics/features/bnxt.ini @@ -57,7 +57,7 @@ Perf doc = Y [rte_flow items] any = Y -eth = Y +eth = P ipv4 = Y ipv6 = Y gre = Y @@ -71,7 +71,7 @@ represented_port = Y tcp = Y udp = Y vf = Y -vlan = Y +vlan = P vxlan= Y [rte_flow actions] diff --git a/doc/guides/nics/features/cxgbe.ini b/doc/guides/nics/features/cxgbe.ini index f674803ec4..f9912390fb 100644 --- a/doc/guides/nics/features/cxgbe.ini +++ 
b/doc/guides/nics/features/cxgbe.ini @@ -36,7 +36,7 @@ x86-64 = Y Usage doc= Y [rte_flow items] -eth = Y +eth = P ipv4 = Y ipv6 = Y pf = Y @@ -44,7 +44,7 @@ phy_port = Y tcp = Y udp = Y vf = Y -vlan = Y +vlan = P [rte_flow actions] count= Y diff --git a/doc/guides/nics/features/dpaa2.ini b/doc/guides/nics/features/dpaa2.ini index 4c06841a87..09ce66c788 100644 --- a/doc/guides/nics/features/dpaa2.ini +++ b/doc/guides/nics/features/dpaa2.ini @@ -31,7 +31,7 @@ ARMv8
[PATCH v5 0/3] ethdev: introduce protocol based buffer split
From: Wenxuan Wu Protocol based buffer split consists of splitting a received packet into two separate regions based on the packet content. It is useful in some scenarios, such as GPU acceleration. The splitting will help to enable true zero copy and hence improve the performance significantly. This patchset aims to support protocol split based on current buffer split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload and corresponding protocol, packets received will be directly split into two different mempools. v4->v5: * Use protocol and mbuf_offset based buffer split instead of header split. * Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type. * Improve the description of rte_eth_rxseg_split.proto. v3->v4: * Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0. v2->v3: * Fix a PMD bug. * Add rx queue header split check. * Revise the log and doc. v1->v2: * Add support for all header split protocol types. Wenxuan Wu (3): ethdev: introduce protocol type based buffer split app/testpmd: add proto based buffer split config net/ice: support proto based buf split in Rx path app/test-pmd/cmdline.c| 118 ++ app/test-pmd/testpmd.c| 7 +- app/test-pmd/testpmd.h| 2 + drivers/net/ice/ice_ethdev.c | 10 +- drivers/net/ice/ice_rxtx.c| 217 ++ drivers/net/ice/ice_rxtx.h| 16 ++ drivers/net/ice/ice_rxtx_vec_common.h | 3 + lib/ethdev/rte_ethdev.c | 36 - lib/ethdev/rte_ethdev.h | 21 ++- 9 files changed, 388 insertions(+), 42 deletions(-) -- 2.25.1
[PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer split
From: Wenxuan Wu Protocol based buffer split consists of splitting a received packet into two separate regions based on its content. The split happens after the packet protocol header and before the packet payload: the protocol header can be posted to a dedicated buffer, and the packet payload to a different buffer. Currently, Rx buffer split supports length and offset based packet split. Protocol based split builds on buffer split, but configuring a buffer split length is not suitable for NICs that split based on protocol types, because tunneling makes the conversion from length to protocol type impossible. This patch extends the current buffer split to support protocol and offset based buffer split. A new proto field is introduced in the rte_eth_rxseg_split structure reserved field to specify the header protocol type. With the Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and the corresponding protocol type configured, the PMD will split the ingress packets into two separate regions. Currently, both inner and outer L2/L3/L4 level protocol based buffer split can be supported. For example, let's suppose we configured the Rx queue with the following segments: seg0 - pool0, off0=2B seg1 - pool1, off1=128B With the protocol split type configured to RTE_PTYPE_L4_UDP, a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows: seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0 seg1 - payload @ 128 in mbuf from pool1 The memory attributes of the split parts may differ - for example, mempool0 and mempool1 may belong to DPDK memory and external memory, respectively.
Signed-off-by: Xuan Ding Signed-off-by: Yuan Wang Signed-off-by: Wenxuan Wu Reviewed-by: Qi Zhang --- lib/ethdev/rte_ethdev.c | 36 +--- lib/ethdev/rte_ethdev.h | 15 ++- 2 files changed, 43 insertions(+), 8 deletions(-) diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index 29a3d80466..1a2bc172ab 100644 --- a/lib/ethdev/rte_ethdev.c +++ b/lib/ethdev/rte_ethdev.c @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg, struct rte_mempool *mpl = rx_seg[seg_idx].mp; uint32_t length = rx_seg[seg_idx].length; uint32_t offset = rx_seg[seg_idx].offset; + uint32_t proto = rx_seg[seg_idx].proto; if (mpl == NULL) { RTE_ETHDEV_LOG(ERR, "null mempool pointer\n"); @@ -1694,13 +1695,34 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg, } offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM; *mbp_buf_size = rte_pktmbuf_data_room_size(mpl); - length = length != 0 ? length : *mbp_buf_size; - if (*mbp_buf_size < length + offset) { - RTE_ETHDEV_LOG(ERR, - "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n", - mpl->name, *mbp_buf_size, - length + offset, length, offset); - return -EINVAL; + if (proto == 0) { + length = length != 0 ? length : *mbp_buf_size; + if (*mbp_buf_size < length + offset) { + RTE_ETHDEV_LOG(ERR, + "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n", + mpl->name, *mbp_buf_size, + length + offset, length, offset); + return -EINVAL; + } + } else { + /* Ensure n_seg is 2 in protocol based buffer split. */ + if (n_seg != 2) { + RTE_ETHDEV_LOG(ERR, "number of buffer split protocol segments should be 2.\n"); + return -EINVAL; + } + /* Length and protocol are exclusive here, so make sure length is 0 in protocol + based buffer split. 
*/ + if (length != 0) { + RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in buffer split\n"); + return -EINVAL; + } + if (*mbp_buf_size < offset) { + RTE_ETHDEV_LOG(ERR, + "%s mbuf_data_room_size %u < %u segment offset)\n", + mpl->name, *mbp_buf_size, + offset); + return -EINVAL; + } } } return 0; diff --git a/li
[PATCH v5 2/4] app/testpmd: add proto based buffer split config
From: Wenxuan Wu This patch adds protocol based buffer split configuration in testpmd. The protocol split feature is off by default. To enable protocol split, you need: 1. Start testpmd with two mempools. e.g. --mbuf-size=2048,2048 2. Configure Rx queue with rx_offload buffer split on. 3. Set the protocol type of buffer split. Testpmd View: testpmd>port config rx_offload buffer_split on testpmd>port config buffer_split mac|ipv4|ipv6|l3|tcp|udp|sctp| l4|inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp| inner_udp|inner_sctp|inner_l4 Signed-off-by: Xuan Ding Signed-off-by: Yuan Wang Signed-off-by: Wenxuan Wu Reviewed-by: Qi Zhang --- app/test-pmd/cmdline.c | 118 + app/test-pmd/testpmd.c | 7 +-- app/test-pmd/testpmd.h | 2 + 3 files changed, 124 insertions(+), 3 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 6ffea8e21a..5cd4beca95 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -866,6 +866,12 @@ static void cmd_help_long_parsed(void *parsed_result, " Enable or disable a per port Rx offloading" " on all Rx queues of a port\n\n" + "port config buffer_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|" + "inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|" + "inner_udp|inner_sctp|inner_l4\n" + " Configure protocol type for buffer split" + " on all Rx queues of a port\n\n" + "port (port_id) rxq (queue_id) rx_offload vlan_strip|" "ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|" "outer_ipv4_cksum|macsec_strip|header_split|" @@ -16353,6 +16359,117 @@ cmdline_parse_inst_t cmd_config_per_port_rx_offload = { } }; +/* config a per port buffer split protocol */ +struct cmd_config_per_port_buffer_split_protocol_result { + cmdline_fixed_string_t port; + cmdline_fixed_string_t config; + uint16_t port_id; + cmdline_fixed_string_t buffer_split; + cmdline_fixed_string_t protocol; +}; + +cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_port = + TOKEN_STRING_INITIALIZER + (struct 
cmd_config_per_port_buffer_split_protocol_result, +port, "port"); +cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_config = + TOKEN_STRING_INITIALIZER + (struct cmd_config_per_port_buffer_split_protocol_result, +config, "config"); +cmdline_parse_token_num_t cmd_config_per_port_buffer_split_protocol_result_port_id = + TOKEN_NUM_INITIALIZER + (struct cmd_config_per_port_buffer_split_protocol_result, +port_id, RTE_UINT16); +cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_buffer_split = + TOKEN_STRING_INITIALIZER + (struct cmd_config_per_port_buffer_split_protocol_result, +buffer_split, "buffer_split"); +cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_protocol = + TOKEN_STRING_INITIALIZER + (struct cmd_config_per_port_buffer_split_protocol_result, +protocol, "mac#ipv4#ipv6#l3#tcp#udp#sctp#l4#" + "inner_mac#inner_ipv4#inner_ipv6#inner_l3#inner_tcp#" + "inner_udp#inner_sctp#inner_l4"); + +static void +cmd_config_per_port_buffer_split_protocol_parsed(void *parsed_result, + __rte_unused struct cmdline *cl, + __rte_unused void *data) +{ + struct cmd_config_per_port_buffer_split_protocol_result *res = parsed_result; + portid_t port_id = res->port_id; + struct rte_port *port = &ports[port_id]; + uint32_t protocol; + + if (port_id_is_invalid(port_id, ENABLED_WARN)) + return; + + if (port->port_status != RTE_PORT_STOPPED) { + fprintf(stderr, + "Error: Can't config offload when Port %d is not stopped\n", + port_id); + return; + } + + if (!strcmp(res->protocol, "mac")) + protocol = RTE_PTYPE_L2_ETHER; + else if (!strcmp(res->protocol, "ipv4")) + protocol = RTE_PTYPE_L3_IPV4; + else if (!strcmp(res->protocol, "ipv6")) + protocol = RTE_PTYPE_L3_IPV6; + else if (!strcmp(res->protocol, "l3")) + protocol = RTE_PTYPE_L3_IPV4|RTE_PTYPE_L3_IPV6; + else if (!strcmp(res->protocol, "tcp")) + protocol = RTE_PTYPE_L4_TCP; + else if (!strcmp(res->protocol, "udp")) + protocol = RTE_PTYPE_L4_UDP; + else if 
(!strcmp(res->protocol, "sctp")) + protocol = RTE_PTYPE_L4_SCTP; + else if (!str
[PATCH v5 3/4] net/ice: support proto based buf split in Rx path
From: Wenxuan Wu This patch adds support for proto based buffer split in normal Rx data paths. When the Rx queue is configured with a specific protocol type, received packets will be directly split into protocol header and payload parts. The two parts will be put into different mempools. Currently, protocol based buffer split is not supported in vectorized paths. Signed-off-by: Xuan Ding Signed-off-by: Yuan Wang Signed-off-by: Wenxuan Wu Reviewed-by: Qi Zhang --- drivers/net/ice/ice_ethdev.c | 10 +- drivers/net/ice/ice_rxtx.c| 219 ++ drivers/net/ice/ice_rxtx.h| 16 ++ drivers/net/ice/ice_rxtx_vec_common.h | 3 + 4 files changed, 216 insertions(+), 32 deletions(-) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index 73e550f5fb..ce3f49c863 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM | RTE_ETH_RX_OFFLOAD_VLAN_EXTEND | RTE_ETH_RX_OFFLOAD_RSS_HASH | - RTE_ETH_RX_OFFLOAD_TIMESTAMP; + RTE_ETH_RX_OFFLOAD_TIMESTAMP | + RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT; dev_info->tx_offload_capa |= RTE_ETH_TX_OFFLOAD_QINQ_INSERT | RTE_ETH_TX_OFFLOAD_IPV4_CKSUM | @@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL; } - dev_info->rx_queue_offload_capa = 0; + dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT; dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE; dev_info->reta_size = pf->hash_lut_size; @@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN; dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN; + dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG; + dev_info->rx_seg_capa.multi_pools = 1; + dev_info->rx_seg_capa.offset_allowed = 0; +
dev_info->rx_seg_capa.offset_align_log2 = 0; + return 0; } diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 2dd2637fbb..8cbcee3543 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq) /* Set buffer size as the head split is disabled. */ buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) - RTE_PKTMBUF_HEADROOM); - rxq->rx_hdr_len = 0; rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S)); rxq->max_pkt_len = RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len, @@ -311,11 +310,52 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq) memset(&rx_ctx, 0, sizeof(rx_ctx)); + if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) { + switch (rxq->rxseg[0].proto) { + case RTE_PTYPE_L2_ETHER: + rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT; + rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2; + break; + case RTE_PTYPE_INNER_L2_ETHER: + rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT; + rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2; + break; + case RTE_PTYPE_L3_IPV4: + case RTE_PTYPE_L3_IPV6: + case RTE_PTYPE_INNER_L3_IPV4: + case RTE_PTYPE_INNER_L3_IPV6: + rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT; + rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP; + break; + case RTE_PTYPE_L4_TCP: + case RTE_PTYPE_L4_UDP: + case RTE_PTYPE_INNER_L4_TCP: + case RTE_PTYPE_INNER_L4_UDP: + rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT; + rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP; + break; + case RTE_PTYPE_L4_SCTP: + case RTE_PTYPE_INNER_L4_SCTP: + rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT; + rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP; + break; + case 0: + PMD_DRV_LOG(ERR, "Buffer split protocol must be configured"); + return -EINVAL; + default: + PMD_DRV_LOG(ERR, "Buffer split protocol is not supported"); + return -EINVAL; + } + rxq->
RE: [dpdk-dev] [PATCH v4] ethdev: mtr: support protocol based input color selection
Hi Jerin, Thank you for implementing according to our agreement, I am happy to see that we are converging. Here are some comments below: > diff --git a/lib/ethdev/rte_mtr.h b/lib/ethdev/rte_mtr.h > index 40df0888c8..76ffbcf724 100644 > --- a/lib/ethdev/rte_mtr.h > +++ b/lib/ethdev/rte_mtr.h > @@ -213,6 +213,52 @@ struct rte_mtr_meter_policy_params { > const struct rte_flow_action *actions[RTE_COLORS]; > }; > > +/** > + * Input color protocol method I suggest adding some more explanations here: More than one method can be enabled for a given meter. Even if enabled, a method might not be applicable to each input packet, in case the associated protocol header is not present in the packet. The highest priority method that is both enabled for the meter and also applicable for the current input packet wins; if none is both enabled and applicable, the default input color is used. @see function rte_mtr_color_in_protocol_priority_set() > + */ > +enum rte_mtr_color_in_protocol { > + /** > + * If the input packet has at least one VLAN label, its input color is > + * detected by the outermost VLAN DEI(1bit), PCP(3 bits) > + * indexing into the struct rte_mtr_params::vlan_table. > + * Otherwise, the *default_input_color* is applied. > + * The statement "Otherwise, the *default_input_color* is applied" is incorrect IMO and should be removed, as multiple methods might be enabled and also applicable to a specific input packet, in which case the highest priority method wins, as opposed to the default input color. I suggest a simplification "Enable the detection of the packet input color based on the outermost VLAN header fields DEI (1 bit) and PCP (3 bits). 
These fields are used as index into the VLAN table" > + * @see struct rte_mtr_params::default_input_color > + * @see struct rte_mtr_params::vlan_table > + */ > + RTE_MTR_COLOR_IN_PROTO_OUTER_VLAN = RTE_BIT64(0), > + /** > + * If the input packet has at least one VLAN label, its input color is > + * detected by the innermost VLAN DEI(1bit), PCP(3 bits) > + * indexing into the struct rte_mtr_params::vlan_table. > + * Otherwise, the *default_input_color* is applied. > + * > + * @see struct rte_mtr_params::default_input_color > + * @see struct rte_mtr_params::vlan_table > + */ Same simplification suggested here. > + RTE_MTR_COLOR_IN_PROTO_INNER_VLAN = RTE_BIT64(1), > + /** > + * If the input packet is IPv4 or IPv6, its input color is detected by > + * the outermost DSCP field indexing into the > + * struct rte_mtr_params::dscp_table. > + * Otherwise, the *default_input_color* is applied. > + * > + * @see struct rte_mtr_params::default_input_color > + * @see struct rte_mtr_params::dscp_table > + */ Same simplification suggested here. > + RTE_MTR_COLOR_IN_PROTO_OUTER_DSCP = RTE_BIT64(2), I am OK to keep DSCP for the name of the table instead of renaming the table, as you suggested, but this method name should reflect the protocol, not the field: RTE_MTR_COLOR_IN_PROTO_OUTER_IP. > + /** > + * If the input packet is IPv4 or IPv6, its input color is detected by > + * the innermost DSCP field indexing into the > + * struct rte_mtr_params::dscp_table. > + * Otherwise, the *default_input_color* is applied. > + * > + * @see struct rte_mtr_params::default_input_color > + * @see struct rte_mtr_params::dscp_table > + */ Same simplification suggested here. > + RTE_MTR_COLOR_IN_PROTO_INNER_DSCP = RTE_BIT64(3), I am OK to keep DSCP for the name of the table instead of renaming the table, as you suggested, but this method name should reflect the protocol, not the field: RTE_MTR_COLOR_IN_PROTO_INNER_IP. 
> + > /** > * Parameters for each traffic metering & policing object > * > @@ -233,20 +279,58 @@ struct rte_mtr_params { >*/ > int use_prev_mtr_color; > > - /** Meter input color. When non-NULL: it points to a pre-allocated and > + /** Meter input color based on IP DSCP protocol field. > + * > + * Valid when *input_color_proto_mask* set to any of the following > + * RTE_MTR_COLOR_IN_PROTO_OUTER_DSCP, > + * RTE_MTR_COLOR_IN_PROTO_INNER_DSCP > + * > + * When non-NULL: it points to a pre-allocated and >* pre-populated table with exactly 64 elements providing the input >* color for each value of the IPv4/IPv6 Differentiated Services Code > - * Point (DSCP) input packet field. When NULL: it is equivalent to > - * setting this parameter to an all-green populated table (i.e. table > - * with all the 64 elements set to green color). The color blind mode > - * is configured by setting *use_prev_mtr_color* to 0 and *dscp_table* > - * to either NULL or to an all-green populated table. When > - * *use_prev_mtr_color* is non-zero value or when *dscp_table* > contains > -
[PATCH] net/iavf: make reset wait time longer
From: Wenxuan Wu On 810 CA series devices, the reset takes longer before the kernel returns a value. This patch doubles the reset wait count so that the kernel reset operation can finish. Signed-off-by: Wenxuan Wu --- drivers/net/iavf/iavf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h index a01d18e61b..6183fc40d6 100644 --- a/drivers/net/iavf/iavf.h +++ b/drivers/net/iavf/iavf.h @@ -18,7 +18,7 @@ #define IAVF_AQ_LEN 32 #define IAVF_AQ_BUF_SZ 4096 -#define IAVF_RESET_WAIT_CNT 50 +#define IAVF_RESET_WAIT_CNT 100 #define IAVF_BUF_SIZE_MIN 1024 #define IAVF_FRAME_SIZE_MAX 9728 #define IAVF_QUEUE_BASE_ADDR_UNIT 128 -- 2.25.1
[RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
Add support for using hugepages for worker lcore stack memory. The intent is to improve performance by reducing stack memory related TLB misses and also by using memory local to the NUMA node of each lcore. Platforms desiring to make use of this capability must enable the associated option flag and stack size settings in platform config files. --- lib/eal/linux/eal.c | 39 +++ 1 file changed, 39 insertions(+) diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index 1ef263434a..4e1e5b6915 100644 --- a/lib/eal/linux/eal.c +++ b/lib/eal/linux/eal.c @@ -1143,9 +1143,48 @@ rte_eal_init(int argc, char **argv) lcore_config[i].state = WAIT; +#ifdef RTE_EAL_NUMA_AWARE_LCORE_STACK + /* Allocate NUMA aware stack memory and set pthread attributes */ + pthread_attr_t attr; + void *stack_ptr = + rte_zmalloc_socket("lcore_stack", + RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE, + RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE, + rte_lcore_to_socket_id(i)); + + if (stack_ptr == NULL) { + rte_eal_init_alert("Cannot allocate stack memory"); + rte_errno = ENOMEM; + return -1; + } + + if (pthread_attr_init(&attr) != 0) { + rte_eal_init_alert("Cannot init pthread attributes"); + rte_errno = EINVAL; + return -1; + } + if (pthread_attr_setstack(&attr, + stack_ptr, + RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE) != 0) { + rte_eal_init_alert("Cannot set pthread stack attributes"); + rte_errno = ENOTSUP; + return -1; + } + + /* create a thread for each lcore */ + ret = pthread_create(&lcore_config[i].thread_id, &attr, +eal_thread_loop, (void *)(uintptr_t)i); + + if (pthread_attr_destroy(&attr) != 0) { + rte_eal_init_alert("Cannot destroy pthread attributes"); + rte_errno = EFAULT; + return -1; + } +#else /* create a thread for each lcore */ ret = pthread_create(&lcore_config[i].thread_id, NULL, eal_thread_loop, (void *)(uintptr_t)i); +#endif if (ret != 0) rte_panic("Cannot create thread\n"); -- 2.17.1
Re: [PATCH 2/3] mem: fix ASan shadow for remapped memory segments
On 21-Apr-22 2:18 PM, Burakov, Anatoly wrote: On 21-Apr-22 10:37 AM, David Marchand wrote: On Wed, Apr 20, 2022 at 4:47 PM Burakov, Anatoly wrote: On 15-Apr-22 6:31 PM, David Marchand wrote: When releasing some memory, the allocator can choose to return some pages to the OS. At the same time, this memory was poisoned in ASAn shadow. Doing the latter made it impossible to remap this same page later. On the other hand, without this poison, the OS would pagefault in any case for this page. Remove the poisoning for unmapped pages. Bugzilla ID: 994 Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan") Cc: sta...@dpdk.org Signed-off-by: David Marchand --- lib/eal/common/malloc_elem.h | 4 lib/eal/common/malloc_heap.c | 12 +++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/lib/eal/common/malloc_elem.h b/lib/eal/common/malloc_elem.h index 228f178418..b859003722 100644 --- a/lib/eal/common/malloc_elem.h +++ b/lib/eal/common/malloc_elem.h @@ -272,6 +272,10 @@ old_malloc_size(struct malloc_elem *elem) #else /* !RTE_MALLOC_ASAN */ +static inline void +asan_set_zone(void *ptr __rte_unused, size_t len __rte_unused, + uint32_t val __rte_unused) { } + static inline void asan_set_freezone(void *ptr __rte_unused, size_t size __rte_unused) { } diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c index 6c572b6f2c..5913d9f862 100644 --- a/lib/eal/common/malloc_heap.c +++ b/lib/eal/common/malloc_heap.c @@ -860,6 +860,7 @@ malloc_heap_free(struct malloc_elem *elem) size_t len, aligned_len, page_sz; struct rte_memseg_list *msl; unsigned int i, n_segs, before_space, after_space; + bool unmapped_pages = false; int ret; const struct internal_config *internal_conf = eal_get_internal_configuration(); @@ -999,6 +1000,13 @@ malloc_heap_free(struct malloc_elem *elem) /* don't care if any of this fails */ malloc_heap_free_pages(aligned_start, aligned_len); + /* + * Clear any poisoning in ASan for the associated pages so that + * next time EAL maps those 
pages, the allocator can access + * them. + */ + asan_set_zone(aligned_start, aligned_len, 0x00); + unmapped_pages = true; request_sync(); } else { @@ -1032,7 +1040,9 @@ malloc_heap_free(struct malloc_elem *elem) rte_mcfg_mem_write_unlock(); free_unlock: - asan_set_freezone(asan_ptr, asan_data_len); + /* Poison memory range if belonging to some still mapped pages. */ + if (!unmapped_pages) + asan_set_freezone(asan_ptr, asan_data_len); rte_spinlock_unlock(&(heap->lock)); return ret; I suspect the patch should be a little more complicated than that. When we unmap pages, we don't necessarily unmap the entire malloc element, it could be that we have a freed allocation like so: | malloc header | free space | unmapped space | free space | next malloc header | So, i think the freezone should be set from asan_ptr till aligned_start, and then from (aligned_start + aligned_len) till (asan_ptr + asan_data_len). Does that make sense? (btw, I get a bounce for Zhihong mail address, is he not working at Intel anymore?) To be honest, I don't understand if we can get to this situation :-) (especially the free space after the unmapped region). But I guess you mean something like (on top of current patch): @@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem) rte_mcfg_mem_write_unlock(); free_unlock: - /* Poison memory range if belonging to some still mapped pages. 
*/ - if (!unmapped_pages) + if (!unmapped_pages) { asan_set_freezone(asan_ptr, asan_data_len); + } else { + /* + * We may be in a situation where we unmapped pages like this: + * malloc header | free space | unmapped space | free space | malloc header + */ + void *free1_start = asan_ptr; + void *free1_end = aligned_start; + void *free2_start = RTE_PTR_ADD(aligned_start, aligned_len); + void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len); + + if (free1_start < free1_end) + asan_set_freezone(free1_start, + RTE_PTR_DIFF(free1_end, free1_start)); + if (free2_start < free2_end) + asan_set_freezone(free2_start, + RTE_PTR_DIFF(free2_end, free2_start)); + } rte_spinlock_unlock(&(heap->lock)); return ret; Something like that, yes. I will have to think through this a bit more, especially in light of your func_reentrancy splat :) So, the reason splat in func_reentrancy test happens is as follows: the above patch is sorta correct (i have a different one but does
RE: [PATCH v2] net/ice: optimize max queue number calculation
> -Original Message- > From: Zhang, Qi Z > Sent: Friday, April 8, 2022 7:24 PM > To: Yang, Qiming ; Wu, Wenjun1 > > Cc: dev@dpdk.org; Zhang, Qi Z > Subject: [PATCH v2] net/ice: optimize max queue number calculation > > Remove the limitation that max queue pair number must be 2^n. > With this patch, even on a 8 ports device, the max queue pair number > increased from 128 to 254. > > Signed-off-by: Qi Zhang > --- > > v2: > - fix check patch warning > > drivers/net/ice/ice_ethdev.c | 24 > 1 file changed, 20 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index > 73e550f5fb..ff2b3e45d9 100644 > --- a/drivers/net/ice/ice_ethdev.c > +++ b/drivers/net/ice/ice_ethdev.c > @@ -819,10 +819,26 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi > *vsi, > return -ENOTSUP; > } > > - vsi->nb_qps = RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC); > - fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps) - 1; > - /* Adjust the queue number to actual queues that can be applied */ > - vsi->nb_qps = (vsi->nb_qps == 0) ? 0 : 0x1 << fls; > + /* vector 0 is reserved and 1 vector for ctrl vsi */ > + if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2) > + vsi->nb_qps = 0; > + else > + vsi->nb_qps = RTE_MIN > + ((uint16_t)vsi->adapter- > >hw.func_caps.common_cap.num_msix_vectors - 2, > + RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC)); > + > + /* nb_qps(hex) -> fls */ > + /* -> 0 */ > + /* 0001 -> 0 */ > + /* 0002 -> 1 */ > + /* 0003 ~ 0004 -> 2 */ > + /* 0005 ~ 0008 -> 3 */ > + /* 0009 ~ 0010 -> 4 */ > + /* 0011 ~ 0020 -> 5 */ > + /* 0021 ~ 0040 -> 6 */ > + /* 0041 ~ 0080 -> 7 */ > + /* 0081 ~ 0100 -> 8 */ > + fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps - 1); > > qp_idx = 0; > /* Set tc and queue mapping with VSI */ > -- > 2.26.2 Acked-by: Wenjun Wu < wenjun1...@intel.com> Thanks Wenjun
RE: [RFC] ethdev: datapath-focused meter actions
Hi Alexander, After reviewing this RFC, I have to say that your proposal is very unclear to me. I don't understand what is the problem you're trying to solve and what exactly is that you cannot do with the current meter and flow APIs. I suggest we get together for a community call with all the interested folks invited in order to get more clarity on your proposal, thank you! > -Original Message- > From: Jerin Jacob > Sent: Friday, April 8, 2022 9:21 AM > To: Alexander Kozyrev ; Dumitrescu, Cristian > > Cc: dpdk-dev ; Ori Kam ; Thomas > Monjalon ; Ivan Malov ; > Andrew Rybchenko ; Yigit, Ferruh > ; Awal, Mohammad Abdul > ; Zhang, Qi Z ; > Jerin Jacob ; Ajit Khaparde > ; Richardson, Bruce > > Subject: Re: [RFC] ethdev: datapath-focused meter actions > > + @Cristian Dumitrescu meter maintainer. > > > On Fri, Apr 8, 2022 at 8:17 AM Alexander Kozyrev > wrote: > > > > The introduction of asynchronous flow rules operations allowed users > > to create/destroy flow rules as part of the datapath without blocking > > on Flow API and slowing the packet processing down. > > > > That applies to every possible action that has no preparation steps. > > Unfortunately, one notable exception is the meter action. > > There is a separate API to prepare a meter profile and a meter policy > > before any meter object can be used as a flow rule action. I disagree. Creation of meter policies and meter objects is decoupled from the flow creation. Meter policies and meter objects can all be created at initialization or on-the-fly, and their creation does not directly require the data plane to be stopped. Please explain what problem are you trying to fix here. I suggest you provide the sequence diagram and tell us where the problem is. > > > > The application logic is the following: > > 1. rte_mtr_meter_profile_add() is called to create the meter profile > > first to define how to classify incoming packets and to assign an > > appropriate color to them. > > 2. 
rte_mtr_meter_policy_add() is invoked to define the fate of a packet, > > based on its color (practically creating flow rules, matching colors). Nope, the policy add does not create any flows. In fact, it does not create any meter objects either. It simply defines a configuration pattern that can be reused many times when meter objects are created afterwards. > > 3. rte_mtr_create() is then needed to search (with locks) for previously > > created profile and policy in order to create the meter object. The rte_mtr_create() is not created at the time the flow is created, but at a prior decoupled moment. I don't see any issue here. > > 4. rte_flow_create() is now finally can be used to specify the created > > meter as an action. > > > > This approach doesn't fit into the asynchronous rule creation model > > and can be improved with the following proposal: Again, the creation of meter policies and objects is decoupled from the flow creation; in fact, the meter policies and objects must be created before the flows using them are created. > > 1. Creating a policy may be replaced with the creation of a group with > > up to 3 different rules for every color using asynchronous Flow API. > > That requires the introduction of a new pattern item - meter color. > > Then creation a flow rule with the meter means a simple jump to a group: > > rte_flow_async_create(group=1, pattern=color, actions=...); > > rte_flow_async_create(group=0, pattern=5-tuple, > > actions=meter,jump group 1); > > This allows to classify packets and act upon their color classifications. > > The Meter action assigns a color to a packet and an appropriate action > > is selected based on the Meter color in group 1. > > The meter objects requires a relatively complex configuration procedure. This is one of the reasons meters have their own API, so we can keep that complexity away from the flow API. 
You seem to indicate that your desired behavior is to create the meter objects when the flow is created rather than in advance. Did I get it correctly? This is possible with the current API as well by simply creating the meter object immediately before the flow gets created. Stitching the creation of new meter object to the flow creation (if I understand your approach right) doe not allow for some important features, such as: -reusing meter objects that were previously created by reassigning them to a different flow -having multiple flows use the same shared meter. > > 2. Preparing a meter object should be the part of flow rule creation Why?? Please take some time to clearly explain this, your entire proposal seems to be predicated on this assertion being true. > > and use the same flow queue to benefit from asynchronous operations: > > rte_flow_async_create(group=0, pattern=5-tuple, > > actions=meter id 1 profile rfc2697, jump group 1); > > Creation of the meter object takes time and flow creation must wait > > until it is ready be
RE: [RFC] ethdev: datapath-focused meter actions
I forgot to mention: besides the my statement at the top of my reply, there are many comments inline below :) > -Original Message- > From: Dumitrescu, Cristian > Sent: Tuesday, April 26, 2022 2:44 PM > To: Jerin Jacob ; Alexander Kozyrev > > Cc: dpdk-dev ; Ori Kam ; Thomas > Monjalon ; Ivan Malov ; > Andrew Rybchenko ; Yigit, Ferruh > ; Awal, Mohammad Abdul > ; Zhang, Qi Z ; > Jerin Jacob ; Ajit Khaparde > ; Richardson, Bruce > > Subject: RE: [RFC] ethdev: datapath-focused meter actions > > Hi Alexander, > > After reviewing this RFC, I have to say that your proposal is very unclear to > me. > I don't understand what is the problem you're trying to solve and what exactly > is that you cannot do with the current meter and flow APIs. > > I suggest we get together for a community call with all the interested folks > invited in order to get more clarity on your proposal, thank you! > > > -Original Message- > > From: Jerin Jacob > > Sent: Friday, April 8, 2022 9:21 AM > > To: Alexander Kozyrev ; Dumitrescu, Cristian > > > > Cc: dpdk-dev ; Ori Kam ; Thomas > > Monjalon ; Ivan Malov ; > > Andrew Rybchenko ; Yigit, Ferruh > > ; Awal, Mohammad Abdul > > ; Zhang, Qi Z ; > > Jerin Jacob ; Ajit Khaparde > > ; Richardson, Bruce > > > > Subject: Re: [RFC] ethdev: datapath-focused meter actions > > > > + @Cristian Dumitrescu meter maintainer. > > > > > > On Fri, Apr 8, 2022 at 8:17 AM Alexander Kozyrev > > wrote: > > > > > > The introduction of asynchronous flow rules operations allowed users > > > to create/destroy flow rules as part of the datapath without blocking > > > on Flow API and slowing the packet processing down. > > > > > > That applies to every possible action that has no preparation steps. > > > Unfortunately, one notable exception is the meter action. > > > There is a separate API to prepare a meter profile and a meter policy > > > before any meter object can be used as a flow rule action. > > I disagree. 
Creation of meter policies and meter objects is decoupled from the > flow creation. Meter policies and meter objects can all be created at > initialization or on-the-fly, and their creation does not directly require > the data > plane to be stopped. > > Please explain what problem are you trying to fix here. I suggest you provide > the sequence diagram and tell us where the problem is. > > > > > > > The application logic is the following: > > > 1. rte_mtr_meter_profile_add() is called to create the meter profile > > > first to define how to classify incoming packets and to assign an > > > appropriate color to them. > > > 2. rte_mtr_meter_policy_add() is invoked to define the fate of a packet, > > > based on its color (practically creating flow rules, matching colors). > > Nope, the policy add does not create any flows. In fact, it does not create > any > meter objects either. It simply defines a configuration pattern that can be > reused many times when meter objects are created afterwards. > > > > 3. rte_mtr_create() is then needed to search (with locks) for previously > > > created profile and policy in order to create the meter object. > > The rte_mtr_create() is not created at the time the flow is created, but at a > prior decoupled moment. I don't see any issue here. > > > > 4. rte_flow_create() is now finally can be used to specify the created > > > meter as an action. > > > > > > This approach doesn't fit into the asynchronous rule creation model > > > and can be improved with the following proposal: > > Again, the creation of meter policies and objects is decoupled from the flow > creation; in fact, the meter policies and objects must be created before the > flows using them are created. > > > > 1. Creating a policy may be replaced with the creation of a group with > > > up to 3 different rules for every color using asynchronous Flow API. > > > That requires the introduction of a new pattern item - meter color. 
> > > Then creation a flow rule with the meter means a simple jump to a group: > > > rte_flow_async_create(group=1, pattern=color, actions=...); > > > rte_flow_async_create(group=0, pattern=5-tuple, > > > actions=meter,jump group 1); > > > This allows to classify packets and act upon their color classifications. > > > The Meter action assigns a color to a packet and an appropriate action > > > is selected based on the Meter color in group 1. > > > > > The meter objects requires a relatively complex configuration procedure. This > is one of the reasons meters have their own API, so we can keep that > complexity away from the flow API. > > You seem to indicate that your desired behavior is to create the meter objects > when the flow is created rather than in advance. Did I get it correctly? This > is > possible with the current API as well by simply creating the meter object > immediately before the flow gets created. > > Stitching the creation of new meter object to the flow creation (if I > understand > your approach right) d
DPDK 21.11.1 released
Hi all, Here is a new stable release: https://fast.dpdk.org/rel/dpdk-21.11.1.tar.xz The git tree is at: https://dpdk.org/browse/dpdk-stable/?h=21.11 This is the first stable release of 21.11 LTS and contains ~400 fixes. See the release notes for details: http://doc.dpdk.org/guides-21.11/rel_notes/release_21_11.html#fixes Thanks to the authors who helped with backports and to the following who helped with validation: Nvidia, Intel, Canonical and Red Hat. Kevin --- MAINTAINERS| 2 + VERSION| 2 +- app/dumpcap/main.c | 9 +- app/pdump/main.c | 16 +- app/proc-info/main.c | 6 +- app/test-acl/main.c| 6 +- app/test-compress-perf/comp_perf_test_cyclecount.c | 9 +- app/test-compress-perf/comp_perf_test_throughput.c | 2 +- app/test-compress-perf/comp_perf_test_verify.c | 2 +- app/test-compress-perf/main.c | 5 +- app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 2 +- app/test-eventdev/evt_options.c| 2 +- app/test-eventdev/test_order_common.c | 2 +- app/test-fib/main.c| 12 +- app/test-flow-perf/config.h| 2 +- app/test-flow-perf/main.c | 2 +- app/test-pmd/cmd_flex_item.c | 3 +- app/test-pmd/cmdline.c | 18 +- app/test-pmd/cmdline_flow.c| 13 +- app/test-pmd/cmdline_tm.c | 4 +- app/test-pmd/config.c | 22 +- app/test-pmd/csumonly.c| 24 +- app/test-pmd/parameters.c | 2 +- app/test-pmd/testpmd.c | 28 +- app/test-pmd/testpmd.h | 1 + app/test-pmd/txonly.c | 24 +- app/test-regex/main.c | 38 +- app/test/meson.build | 2 +- app/test/test_barrier.c| 2 +- app/test/test_bpf.c| 10 +- app/test/test_compressdev.c| 2 +- app/test/test_cryptodev.c | 13 +- app/test/test_cryptodev_asym.c | 2 +- app/test/test_cryptodev_rsa_test_vectors.h | 2 +- app/test/test_dmadev.c | 8 +- app/test/test_efd.c| 2 +- app/test/test_fib_perf.c | 2 +- app/test/test_kni.c| 4 +- app/test/test_kvargs.c | 16 +- app/test/test_link_bonding.c | 4 + app/test/test_link_bonding_rssconf.c | 4 + app/test/test_lpm6_data.h | 2 +- app/test/test_mbuf.c | 4 - app/test/test_member.c | 2 +- app/test/test_memory.c | 2 +- 
app/test/test_mempool.c| 4 +- app/test/test_memzone.c| 6 +- app/test/test_metrics.c| 2 +- app/test/test_pcapng.c | 2 +- app/test/test_power_cpufreq.c | 2 +- app/test/test_rcu_qsbr.c | 4 +- app/test/test_red.c| 8 +- app/test/test_security.c | 2 +- app/test/test_table_pipeline.c | 2 +- app/test/test_thash.c | 2 +- buildtools/binutils-avx512-check.py| 4 +- buildtools/call-sphinx-build.py| 4 +- buildtools/meson.build | 5 +- config/arm/meson.build | 10 +- config/meson.build | 5 +- config/x86/meson.build | 2 +- devtools/check-abi.sh | 4 - devtools/check-forbidden-tokens.awk| 3 + devtools/check-symbol-change.sh| 6 +- devtools/check-symbol-maps.sh | 7 + devtools/libabigail.abignore | 20 + doc/api/generate_examples.sh | 14 +- doc/api/meson.build| 10 +- doc/guides/compressdevs/mlx5.rst | 6 +- doc/guides/conf.py | 6 +- doc/guides/cryptodevs/mlx5.rst | 6 +- doc/guides/dmadevs/hisilicon.rst | 4 +- doc/guides/dmadevs/idxd.rst| 29 +- doc/guides/eventdevs/dlb2.rst | 1
RE: [PATCH] app/test: fix buffer overflow in table unit tests
> -Original Message- > From: Medvedkin, Vladimir > Sent: Thursday, April 21, 2022 6:35 PM > To: dev@dpdk.org > Cc: Dumitrescu, Cristian ; sta...@dpdk.org > Subject: [PATCH] app/test: fix buffer overflow in table unit tests > > This patch fixes stack buffer overflow reported by ASAN. > > Bugzilla ID: 820 > Fixes: 5205954791cb ("app/test: packet framework unit tests") > Cc: cristian.dumitre...@intel.com > Cc: sta...@dpdk.org > > Signed-off-by: Vladimir Medvedkin > --- Acked-by: Cristian Dumitrescu
Re: [PATCH 2/3] mem: fix ASan shadow for remapped memory segments
On Tue, Apr 26, 2022 at 2:54 PM Burakov, Anatoly wrote: > >> @@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem) > >> > >> rte_mcfg_mem_write_unlock(); > >> free_unlock: > >> - /* Poison memory range if belonging to some still mapped > >> pages. */ > >> - if (!unmapped_pages) > >> + if (!unmapped_pages) { > >> asan_set_freezone(asan_ptr, asan_data_len); > >> + } else { > >> + /* > >> +* We may be in a situation where we unmapped pages > >> like this: > >> +* malloc header | free space | unmapped space | free > >> space | malloc header > >> +*/ > >> + void *free1_start = asan_ptr; > >> + void *free1_end = aligned_start; > >> + void *free2_start = RTE_PTR_ADD(aligned_start, > >> aligned_len); > >> + void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len); > >> + > >> + if (free1_start < free1_end) > >> + asan_set_freezone(free1_start, > >> + RTE_PTR_DIFF(free1_end, free1_start)); > >> + if (free2_start < free2_end) > >> + asan_set_freezone(free2_start, > >> + RTE_PTR_DIFF(free2_end, free2_start)); > >> + } > >> > >> rte_spinlock_unlock(&(heap->lock)); > >> return ret; > >> > > > > Something like that, yes. I will have to think through this a bit more, > > especially in light of your func_reentrancy splat :) > > > > So, the reason splat in func_reentrancy test happens is as follows: the > above patch is sorta correct (i have a different one but does the same > thing), but incomplete. What happens then is when we add new memory, we > are integrating it into our existing malloc heap, which triggers > `malloc_elem_join_adjacent_free()` which will trigger a write into old > header space being merged, which may be marked as "freed". So, again we > are hit with our internal allocator messing with ASan. I ended up with the same conclusion. Thanks for confirming. > > To properly fix this is to answer the following question: what is the > goal of having ASan support in DPDK? 
Is it there to catch bugs *in the > allocator*, or can we just trust that our allocator code is correct, and > only concern ourselves with user-allocated areas of the code? Because it The best would be to handle both. I don't think clang disables ASan for the instrumentations on malloc. > seems like the best way to address this issue would be to just avoid > triggering ASan checks for certain allocator-internal actions: this way, > we don't need to care what allocator itself does, just what user code > does. As in, IIRC there was a compiler attribute that disables ASan > checks for a specific function: perhaps we could just wrap certain > access in that and be done with it? > > What do you think? It is tempting because it is the easiest way to avoid the issue. Though, by waiving those checks in the allocator, does it leave the ASan shadow in a consistent state? -- David Marchand
Re: [PATCH 1/2] rib: mark error checks with unlikely
On 13/04/2022 03:09, Stephen Hemminger wrote: Also mark some conditional functions as const. Signed-off-by: Stephen Hemminger --- lib/rib/rte_rib.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/lib/rib/rte_rib.c b/lib/rib/rte_rib.c index cd9e823068d2..2a3de5065a31 100644 --- a/lib/rib/rte_rib.c +++ b/lib/rib/rte_rib.c @@ -48,13 +48,13 @@ struct rte_rib { }; static inline bool -is_valid_node(struct rte_rib_node *node) +is_valid_node(const struct rte_rib_node *node) { return (node->flag & RTE_RIB_VALID_NODE) == RTE_RIB_VALID_NODE; } static inline bool -is_right_node(struct rte_rib_node *node) +is_right_node(const struct rte_rib_node *node) { return node->parent->right == node; } @@ -99,7 +99,7 @@ rte_rib_lookup(struct rte_rib *rib, uint32_t ip) { struct rte_rib_node *cur, *prev = NULL; - if (rib == NULL) { + if (unlikely(rib == NULL)) { rte_errno = EINVAL; return NULL; } @@ -147,7 +147,7 @@ __rib_lookup_exact(struct rte_rib *rib, uint32_t ip, uint8_t depth) struct rte_rib_node * rte_rib_lookup_exact(struct rte_rib *rib, uint32_t ip, uint8_t depth) { - if ((rib == NULL) || (depth > RIB_MAXDEPTH)) { + if (unlikely(rib == NULL || depth > RIB_MAXDEPTH)) { rte_errno = EINVAL; return NULL; } @@ -167,7 +167,7 @@ rte_rib_get_nxt(struct rte_rib *rib, uint32_t ip, { struct rte_rib_node *tmp, *prev = NULL; - if ((rib == NULL) || (depth > RIB_MAXDEPTH)) { + if (unlikely(rib == NULL || depth > RIB_MAXDEPTH)) { rte_errno = EINVAL; return NULL; } @@ -244,7 +244,7 @@ rte_rib_insert(struct rte_rib *rib, uint32_t ip, uint8_t depth) uint32_t common_prefix; uint8_t common_depth; - if ((rib == NULL) || (depth > RIB_MAXDEPTH)) { + if (unlikely(rib == NULL || depth > RIB_MAXDEPTH)) { rte_errno = EINVAL; return NULL; } @@ -342,7 +342,7 @@ rte_rib_insert(struct rte_rib *rib, uint32_t ip, uint8_t depth) int rte_rib_get_ip(const struct rte_rib_node *node, uint32_t *ip) { - if ((node == NULL) || (ip == NULL)) { + if (unlikely(node == NULL || ip == NULL)) { rte_errno = 
EINVAL; return -1; } @@ -353,7 +353,7 @@ rte_rib_get_ip(const struct rte_rib_node *node, uint32_t *ip) int rte_rib_get_depth(const struct rte_rib_node *node, uint8_t *depth) { - if ((node == NULL) || (depth == NULL)) { + if (unlikely(node == NULL || depth == NULL)) { rte_errno = EINVAL; return -1; } @@ -370,7 +370,7 @@ rte_rib_get_ext(struct rte_rib_node *node) int rte_rib_get_nh(const struct rte_rib_node *node, uint64_t *nh) { - if ((node == NULL) || (nh == NULL)) { + if (unlikely(node == NULL || nh == NULL)) { rte_errno = EINVAL; return -1; } @@ -381,7 +381,7 @@ rte_rib_get_nh(const struct rte_rib_node *node, uint64_t *nh) int rte_rib_set_nh(struct rte_rib_node *node, uint64_t nh) { - if (node == NULL) { + if (unlikely(node == NULL)) { rte_errno = EINVAL; return -1; } @@ -399,7 +399,7 @@ rte_rib_create(const char *name, int socket_id, const struct rte_rib_conf *conf) struct rte_mempool *node_pool; /* Check user arguments. */ - if (name == NULL || conf == NULL || conf->max_nodes <= 0) { + if (unlikely(name == NULL || conf == NULL || conf->max_nodes <= 0)) { rte_errno = EINVAL; return NULL; } @@ -434,7 +434,7 @@ rte_rib_create(const char *name, int socket_id, const struct rte_rib_conf *conf) /* allocate tailq entry */ te = rte_zmalloc("RIB_TAILQ_ENTRY", sizeof(*te), 0); - if (te == NULL) { + if (unlikely(te == NULL)) { RTE_LOG(ERR, LPM, "Can not allocate tailq entry for RIB %s\n", name); rte_errno = ENOMEM; @@ -444,7 +444,7 @@ rte_rib_create(const char *name, int socket_id, const struct rte_rib_conf *conf) /* Allocate memory to store the RIB data structures. */ rib = rte_zmalloc_socket(mem_name, sizeof(struct rte_rib), RTE_CACHE_LINE_SIZE, socket_id); - if (rib == NULL) { + if (unlikely(rib == NULL)) { RTE_LOG(ERR, LPM, "RIB %s memory allocation failed\n", name); rte_errno = ENOMEM; goto free_te; Acked-by: Vladimir Medvedkin -- Regards, Vladimir
Re: [PATCH 2/2] rib6: mark error tests with unlikely
On 13/04/2022 03:09, Stephen Hemminger wrote: Also mark some conditional functions as const. Signed-off-by: Stephen Hemminger --- lib/rib/rte_rib6.c | 25 - 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/lib/rib/rte_rib6.c b/lib/rib/rte_rib6.c index 042ac1f090bf..650bf1b8f681 100644 --- a/lib/rib/rte_rib6.c +++ b/lib/rib/rte_rib6.c @@ -47,13 +47,13 @@ struct rte_rib6 { }; static inline bool -is_valid_node(struct rte_rib6_node *node) +is_valid_node(const struct rte_rib6_node *node) { return (node->flag & RTE_RIB_VALID_NODE) == RTE_RIB_VALID_NODE; } static inline bool -is_right_node(struct rte_rib6_node *node) +is_right_node(const struct rte_rib6_node *node) { return node->parent->right == node; } @@ -171,7 +171,7 @@ rte_rib6_lookup_exact(struct rte_rib6 *rib, uint8_t tmp_ip[RTE_RIB6_IPV6_ADDR_SIZE]; int i; - if ((rib == NULL) || (ip == NULL) || (depth > RIB6_MAXDEPTH)) { + if (unlikely(rib == NULL || ip == NULL || depth > RIB6_MAXDEPTH)) { rte_errno = EINVAL; return NULL; } @@ -210,7 +210,7 @@ rte_rib6_get_nxt(struct rte_rib6 *rib, uint8_t tmp_ip[RTE_RIB6_IPV6_ADDR_SIZE]; int i; - if ((rib == NULL) || (ip == NULL) || (depth > RIB6_MAXDEPTH)) { + if (unlikely(rib == NULL || ip == NULL || depth > RIB6_MAXDEPTH)) { rte_errno = EINVAL; return NULL; } @@ -293,8 +293,7 @@ rte_rib6_insert(struct rte_rib6 *rib, int i, d; uint8_t common_depth, ip_xor; - if (unlikely((rib == NULL) || (ip == NULL) || - (depth > RIB6_MAXDEPTH))) { + if (unlikely((rib == NULL || ip == NULL || depth > RIB6_MAXDEPTH))) { rte_errno = EINVAL; return NULL; } @@ -413,7 +412,7 @@ int rte_rib6_get_ip(const struct rte_rib6_node *node, uint8_t ip[RTE_RIB6_IPV6_ADDR_SIZE]) { - if ((node == NULL) || (ip == NULL)) { + if (unlikely(node == NULL || ip == NULL)) { rte_errno = EINVAL; return -1; } @@ -424,7 +423,7 @@ rte_rib6_get_ip(const struct rte_rib6_node *node, int rte_rib6_get_depth(const struct rte_rib6_node *node, uint8_t *depth) { - if ((node == NULL) || (depth == NULL)) { + if 
(unlikely(node == NULL || depth == NULL)) { rte_errno = EINVAL; return -1; } @@ -441,7 +440,7 @@ rte_rib6_get_ext(struct rte_rib6_node *node) int rte_rib6_get_nh(const struct rte_rib6_node *node, uint64_t *nh) { - if ((node == NULL) || (nh == NULL)) { + if (unlikely(node == NULL || nh == NULL)) { rte_errno = EINVAL; return -1; } @@ -452,7 +451,7 @@ rte_rib6_get_nh(const struct rte_rib6_node *node, uint64_t *nh) int rte_rib6_set_nh(struct rte_rib6_node *node, uint64_t nh) { - if (node == NULL) { + if (unlikely(node == NULL)) { rte_errno = EINVAL; return -1; } @@ -471,7 +470,7 @@ rte_rib6_create(const char *name, int socket_id, struct rte_mempool *node_pool; /* Check user arguments. */ - if (name == NULL || conf == NULL || conf->max_nodes <= 0) { + if (unlikely(name == NULL || conf == NULL || conf->max_nodes <= 0)) { rte_errno = EINVAL; return NULL; } @@ -506,7 +505,7 @@ rte_rib6_create(const char *name, int socket_id, /* allocate tailq entry */ te = rte_zmalloc("RIB6_TAILQ_ENTRY", sizeof(*te), 0); - if (te == NULL) { + if (unlikely(te == NULL)) { RTE_LOG(ERR, LPM, "Can not allocate tailq entry for RIB6 %s\n", name); rte_errno = ENOMEM; @@ -516,7 +515,7 @@ rte_rib6_create(const char *name, int socket_id, /* Allocate memory to store the RIB6 data structures. */ rib = rte_zmalloc_socket(mem_name, sizeof(struct rte_rib6), RTE_CACHE_LINE_SIZE, socket_id); - if (rib == NULL) { + if (unlikely(rib == NULL)) { RTE_LOG(ERR, LPM, "RIB6 %s memory allocation failed\n", name); rte_errno = ENOMEM; goto free_te; Acked-by: Vladimir Medvedkin -- Regards, Vladimir
Re: [PATCH] rib: fix traversal with /32 route
+Cc:sta...@dpdk.org On 14/04/2022 21:01, Stephen Hemminger wrote: If a /32 route is entered in the RIB the code to traverse will not see that as the end of the tree. This is due to trying to do a negative shift, which is undefined behaviour in C. Fix by checking for max depth as is already done in rib6. Signed-off-by: Stephen Hemminger --- lib/rib/rte_rib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/rib/rte_rib.c b/lib/rib/rte_rib.c index cd9e823068d2..0603980cabd2 100644 --- a/lib/rib/rte_rib.c +++ b/lib/rib/rte_rib.c @@ -71,6 +71,8 @@ is_covered(uint32_t ip1, uint32_t ip2, uint8_t depth) static inline struct rte_rib_node * get_nxt_node(struct rte_rib_node *node, uint32_t ip) { + if (node->depth == RIB_MAXDEPTH) + return NULL; return (ip & (1 << (31 - node->depth))) ? node->right : node->left; } Acked-by: Vladimir Medvedkin -- Regards, Vladimir
Re: [PATCH] rib: fix traversal with /32 route
Fixes: 5a5793a5ffa2 ("rib: add RIB library") On 26/04/2022 15:28, Medvedkin, Vladimir wrote: +Cc:sta...@dpdk.org On 14/04/2022 21:01, Stephen Hemminger wrote: If a /32 route is entered in the RIB the code to traverse will not see that as the end of the tree. This is due to trying to do a negative shift, which is undefined behaviour in C. Fix by checking for max depth as is already done in rib6. Signed-off-by: Stephen Hemminger --- lib/rib/rte_rib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/rib/rte_rib.c b/lib/rib/rte_rib.c index cd9e823068d2..0603980cabd2 100644 --- a/lib/rib/rte_rib.c +++ b/lib/rib/rte_rib.c @@ -71,6 +71,8 @@ is_covered(uint32_t ip1, uint32_t ip2, uint8_t depth) static inline struct rte_rib_node * get_nxt_node(struct rte_rib_node *node, uint32_t ip) { + if (node->depth == RIB_MAXDEPTH) + return NULL; return (ip & (1 << (31 - node->depth))) ? node->right : node->left; } Acked-by: Vladimir Medvedkin -- Regards, Vladimir
Re: [EXT] Re: [PATCH v3 0/5] Add JSON vector set support to fips validation
Hi Gowrishankar, I apologize for the late response. I have not worked on the AES-CBC implementation, so you are free to go ahead. Please let me know if you run into any issues that I can help with. Thanks, Brandon On Thu, Apr 21, 2022 at 4:02 AM Gowrishankar Muthukrishnan wrote: > > Hi Brandon, > Following some cleanup patches I have posted against examples/fips, I would > like to take enabling AES_CBC in fips validation. > Please let me know if you/anyone have already have WIP for the same, before I > proceed. > > Thanks, > Gowrishankar > > > -Original Message- > > From: Brandon Lo > > Sent: Thursday, April 14, 2022 7:12 PM > > To: dev ; Zhang, Roy Fan ; > > Power, Ciara > > Subject: [EXT] Re: [PATCH v3 0/5] Add JSON vector set support to fips > > validation > > > > External Email > > > > -- > > Adding the dev mailing list back into this discussion. > > > > On Wed, Apr 13, 2022 at 9:13 AM Brandon Lo wrote: > > > > > > Hi guys, > > > > > > Lincoln and I would like to know if we can get this patch set looked > > > at and merged before submitting the rest of the algorithms. So far, > > > I've worked on implementing the HMAC and CMAC tests, but I keep > > > getting pulled away by some requests from the community. This patchset > > > does not seem to break backward compatibility, so merging it will only > > > lead to more coverage from the UNH lab. It may also be easier to > > > review since it isn't going to be one huge patchset that needs to be > > > looked at in the future. > > > > > > On Thu, Feb 17, 2022 at 7:47 AM Brandon Lo wrote: > > > > > > > > On Fri, Feb 11, 2022 at 9:16 AM Brandon Lo wrote: > > > > > I only have the AES-GCM algorithm implemented because the current > > > > > implementations of the other algorithms require some extra > > > > > information than what comes with the JSON format in the API. 
> > > > > For example, I couldn't find the JSON counterpart for things like > > > > > fips_validation_sha.c's "MD =" or "Seed =" as well as > > > > > fips_validation_ccm.c's extra test types like CCM-DVPT, CCM-VADT, etc. > > > > > just to name a few. > > > > > This could very well be due to my inexperience with the FIPS > > > > > validation, and I definitely plan to take another look at it again. > > > > > My assumption is that the JSON version of FIPS validation files > > > > > isn't used as much as the old CAVP format, so I am more aiming > > > > > towards getting something working in the lab first and then > > > > > expanding on it later. > > > > Hi all, > > > > Could I get someone to look at this patch set? > > > > The UNH lab is ready to deploy FIPS testing on patches that affect > > > > the crypto portion of DPDK. > > > > Thanks, > > > > Brandon > > > > -- > > > > Brandon Lo > > > > UNH InterOperability Laboratory
> > > > 21 Madbury Rd, Suite 100, Durham, NH 03824 b...@iol.unh.edu www.iol.unh.edu -- Brandon Lo UNH InterOperability Laboratory 21 Madbury Rd, Suite 100, Durham, NH 03824 b...@iol.unh.edu www.iol.unh.edu
Re: Reuse Of lcore after returning from its worker thread
On Wed, 20 Apr 2022 17:52:20 +0530 Ansar Kannankattil wrote: > Hi, > As per my understanding "rte_eal_wait_lcore" is a blocking call in case > of lcore state running. > 1. Is there any direct way to reuse the lcore which we returned from a > worker thread? > 2. Technically is there any issue in reusing the lcore by some means? Yes, just relaunch it with a new work function.
Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
On Tue, 26 Apr 2022 08:19:59 -0400 Don Wallwork wrote: > Add support for using hugepages for worker lcore stack memory. The > intent is to improve performance by reducing stack memory related TLB > misses and also by using memory local to the NUMA node of each lcore. > > Platforms desiring to make use of this capability must enable the > associated option flag and stack size settings in platform config > files. > --- > lib/eal/linux/eal.c | 39 +++ > 1 file changed, 39 insertions(+) > Good idea but having a fixed size stack makes writing complex application more difficult. Plus you lose the safety of guard pages.
Re: [PATCH] ci: do not dump error logs in GHA containers
David Marchand writes: > On Tue, Apr 26, 2022 at 9:09 AM David Marchand > wrote: >> >> On error, the build logs are displayed in GHA console and logs unless >> the GITHUB_WORKFLOW env variable is set. >> However, containers in GHA do not automatically inherit this variable. >> We could pass this variable in the container environment, but in the >> end, dumping those logs is only for Travis which we don't really care >> about anymore. >> >> Let's make the linux-build.sh more generic and dump logs from Travis >> yaml itself. >> >> Fixes: b35c4b0aa2bc ("ci: add Fedora 35 container in GHA") >> >> Signed-off-by: David Marchand > > TBH, I did not test Travis for lack of interest (plus I don't want to > be bothered with their UI / credit stuff). > We could consider dropping Travis in the near future. > > Opinions? I think it makes sense. We haven't had Travis reports in a while because their credit system made it impossible to use. We had kept it around for users of Travis, but at this point, I think most people have migrated to GHA.
Re: [PATCH 1/2] ci: switch to Ubuntu 20.04
David Marchand writes: > Ubuntu 18.04 is now rather old. > Besides, other entities in our CI are also testing this distribution. > > Switch to a newer Ubuntu release and benefit from more recent > tool(chain)s: for example, net/cnxk now builds fine and can be > re-enabled. > > Signed-off-by: David Marchand > --- LGTM Acked-by: Aaron Conole
Re: [PATCH 2/2] ci: add mingw cross compilation in GHA
David Marchand writes: > Add mingw cross compilation in our public CI so that users with their > own github repository have a first level of checks for Windows compilation > before submitting to the mailing list. > This does not replace our better checks in other entities of the CI. > > Only the helloworld example is compiled (same as what is tested in > test-meson-builds.sh). > > Note: the mingw cross compilation toolchain (version 5.0) in Ubuntu > 18.04 was broken (missing a ENOMSG definition). > > Signed-off-by: David Marchand > --- Acked-by: Aaron Conole
Re: [PATCH 2/3] mem: fix ASan shadow for remapped memory segments
On 26-Apr-22 3:15 PM, David Marchand wrote: On Tue, Apr 26, 2022 at 2:54 PM Burakov, Anatoly wrote: @@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem) rte_mcfg_mem_write_unlock(); free_unlock: - /* Poison memory range if belonging to some still mapped pages. */ - if (!unmapped_pages) + if (!unmapped_pages) { asan_set_freezone(asan_ptr, asan_data_len); + } else { + /* +* We may be in a situation where we unmapped pages like this: +* malloc header | free space | unmapped space | free space | malloc header +*/ + void *free1_start = asan_ptr; + void *free1_end = aligned_start; + void *free2_start = RTE_PTR_ADD(aligned_start, aligned_len); + void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len); + + if (free1_start < free1_end) + asan_set_freezone(free1_start, + RTE_PTR_DIFF(free1_end, free1_start)); + if (free2_start < free2_end) + asan_set_freezone(free2_start, + RTE_PTR_DIFF(free2_end, free2_start)); + } rte_spinlock_unlock(&(heap->lock)); return ret; Something like that, yes. I will have to think through this a bit more, especially in light of your func_reentrancy splat :) So, the reason splat in func_reentrancy test happens is as follows: the above patch is sorta correct (i have a different one but does the same thing), but incomplete. What happens then is when we add new memory, we are integrating it into our existing malloc heap, which triggers `malloc_elem_join_adjacent_free()` which will trigger a write into old header space being merged, which may be marked as "freed". So, again we are hit with our internal allocator messing with ASan. I ended up with the same conclusion. Thanks for confirming. To properly fix this is to answer the following question: what is the goal of having ASan support in DPDK? Is it there to catch bugs *in the allocator*, or can we just trust that our allocator code is correct, and only concern ourselves with user-allocated areas of the code? Because it The best would be to handle both. 
I don't think clang disables ASan for the instrumentations on malloc. I've actually prototyped these changes a bit. We use memset in a few places, and that one can't be disabled as far as i can tell (not without blacklisting memset for entire DPDK). seems like the best way to address this issue would be to just avoid triggering ASan checks for certain allocator-internal actions: this way, we don't need to care what allocator itself does, just what user code does. As in, IIRC there was a compiler attribute that disables ASan checks for a specific function: perhaps we could just wrap certain access in that and be done with it? What do you think? It is tempting because it is the easiest way to avoid the issue. Though, by waiving those checks in the allocator, does it leave the ASan shadow in a consistent state? The "consistent state" is kinda difficult to achieve because there is no "default" state for memory - sometimes it comes as available (0x00), sometimes it is marked as already freed (0xFF). So, coming into a malloc function, we don't know whether the memory we're about to mess with is 0x00 or 0xFF. What we could do is mark every malloc header with 0xFF regardless of its status, and leave the rest to "regular" zoning. This would be strange from ASan's point of view (because we're marking memory as "freed" when it wasn't ever allocated), but at least this would be consistent :D -- Thanks, Anatoly
Re: [PATCH] net/pcap: support MTU set
On 3/22/2022 1:02 PM, Ido Goshen wrote: This test https://doc.dpdk.org/dts/test_plans/jumboframes_test_plan.html#test-case-jumbo-frames-with-no-jumbo-frame-support fails for pcap pmd Jumbo packet is unexpectedly received and transmitted Hi Ido, Yes, pcap ignores MTU, but I don't see why it should use MTU (except from making above DTS test pass). For the cases packets written to .pcap file or read from a .pcap file, most probably user is interested in all packets, I don't think using MTU to filter the packets is a good idea, missing packets (because of MTU) can confuse users. Unless there is a good use case, I am for rejecting this feature. without patch: root@u18c_3nbp:/home/cgs/workspace/master/jumbo# ./dpdk-testpmd --no-huge -m1024 -l 0-2 --vdev='net_pcap0,rx_pcap=rx_pcap=jumbo_9000.pcap,tx_pcap=file_tx.pcap' -- --no-flush-rx --total-num-mbufs=2048 -i ... testpmd> start ... testpmd> show port stats 0 NIC statistics for port 0 RX-packets: 1 RX-missed: 0 RX-bytes: 8996 RX-errors: 0 RX-nombuf: 0 TX-packets: 1 TX-errors: 0 TX-bytes: 8996 Throughput (since last show) Rx-pps:0 Rx-bps:0 Tx-pps:0 Tx-bps:0 While with the patch it will fail unless --max-pkt-len is used to support jumbo root@u18c_3nbp:/home/cgs/workspace/master/jumbo# ./dpdk-testpmd-patch --no-huge -m1024 -l 0-2 --vdev='net_pcap0,rx_pcap=rx_pcap=jumbo_9000.pcap,tx_pcap=file_tx.pcap' -- --no-flush-rx --total-num-mbufs=2048 -i ... testpmd> start ... testpmd> show port stats 0 NIC statistics for port 0 RX-packets: 0 RX-missed: 0 RX-bytes: 0 RX-errors: 1 RX-nombuf: 0 TX-packets: 0 TX-errors: 0 TX-bytes: 0 Throughput (since last show) Rx-pps:0 Rx-bps:0 Tx-pps:0 Tx-bps:0 root@u18c_3nbp:/home/cgs/workspace/master/jumbo# ./dpdk-testpmd-patch --no-huge -m1024 -l 0-2 --vdev='net_pcap0,rx_pcap=rx_pcap=jumbo_9000.pcap,tx_pcap=file_tx.pcap' -- --no-flush-rx --total-num-mbufs=2048 -i --max-pkt-len 9400 ... testpmd> start ... 
testpmd> show port stats 0 NIC statistics for port 0 RX-packets: 1 RX-missed: 0 RX-bytes: 8996 RX-errors: 0 RX-nombuf: 0 TX-packets: 1 TX-errors: 0 TX-bytes: 8996 Throughput (since last show) Rx-pps:0 Rx-bps:0 Tx-pps:0 Tx-bps:0 -Original Message- From: Ido Goshen Sent: Thursday, 17 March 2022 21:12 To: Stephen Hemminger Cc: Ferruh Yigit ; dev@dpdk.org Subject: RE: [PATCH] net/pcap: support MTU set As far as I can see the initial device MTU is derived from port *RX* configuration in struct rte_eth_rxmode https://doc.dpdk.org/api- 21.11/structrte__eth__rxmode.html Couple of real NICs I've tested (ixgbe, i40e based) don't allow oversized, tests details can be seen in https://bugs.dpdk.org/show_bug.cgi?id=961 -Original Message- From: Stephen Hemminger Sent: Thursday, 17 March 2022 20:21 To: Ido Goshen Cc: Ferruh Yigit ; dev@dpdk.org Subject: Re: [PATCH] net/pcap: support MTU set On Thu, 17 Mar 2022 19:43:47 +0200 ido g wrote: + if (unlikely(header.caplen > dev->data->mtu)) { + pcap_q->rx_stat.err_pkts++; + rte_pktmbuf_free(mbuf); + break; + } MTU should only be enforced on transmit. Other real network devices allow oversized packets. Since the pcap file is something user provides, if you don't want that then use something to filter the file.
[PATCH 0/2] ACL fix 8B field
Fix problem with 8B fields and extend test-acl test coverage. Konstantin Ananyev (2): acl: fix rules with 8 bytes field size are broken app/acl: support different formats for IPv6 address app/test-acl/main.c | 355 ++-- lib/acl/acl_bld.c | 14 +- 2 files changed, 286 insertions(+), 83 deletions(-) -- 2.34.1
[PATCH 2/2] app/acl: support different formats for IPv6 address
Within ACL rule IPv6 address can be represented in different ways: either as 4x4B fields, or as 2x8B fields. Till now, only first format was supported. Extend test-acl to support both formats, mainly for testing and demonstrating purposes. To control desired behavior '--ipv6' command-line option is extend to accept an optional argument: To be more precise: '--ipv6'- use 4x4B fields format (default behavior) '--ipv6=4B' - use 4x4B fields format (default behavior) '--ipv6=8B' - use 2x8B fields format app/acl: use posix functions for network address parsing Also replaced home brewed IPv4/IPv6 address parsing with inet_pton() calls. Signed-off-by: Konstantin Ananyev --- app/test-acl/main.c | 355 ++-- 1 file changed, 276 insertions(+), 79 deletions(-) diff --git a/app/test-acl/main.c b/app/test-acl/main.c index 06e3847ab9..41ce83db08 100644 --- a/app/test-acl/main.c +++ b/app/test-acl/main.c @@ -57,6 +57,12 @@ enum { DUMP_MAX }; +enum { + IPV6_FRMT_NONE, + IPV6_FRMT_U32, + IPV6_FRMT_U64, +}; + struct acl_alg { const char *name; enum rte_acl_classify_alg alg; @@ -123,7 +129,7 @@ static struct { .name = "default", .alg = RTE_ACL_CLASSIFY_DEFAULT, }, - .ipv6 = 0 + .ipv6 = IPV6_FRMT_NONE, }; static struct rte_acl_param prm = { @@ -210,6 +216,7 @@ struct rte_acl_field_def ipv4_defs[NUM_FIELDS_IPV4] = { #defineIPV6_ADDR_LEN 16 #defineIPV6_ADDR_U16 (IPV6_ADDR_LEN / sizeof(uint16_t)) #defineIPV6_ADDR_U32 (IPV6_ADDR_LEN / sizeof(uint32_t)) +#defineIPV6_ADDR_U64 (IPV6_ADDR_LEN / sizeof(uint64_t)) struct ipv6_5tuple { uint8_t proto; @@ -219,6 +226,7 @@ struct ipv6_5tuple { uint16_t port_dst; }; +/* treat IPV6 address as uint32_t[4] (default mode) */ enum { PROTO_FIELD_IPV6, SRC1_FIELD_IPV6, @@ -234,6 +242,27 @@ enum { NUM_FIELDS_IPV6 }; +/* treat IPV6 address as uint64_t[2] (default mode) */ +enum { + PROTO_FIELD_IPV6_U64, + SRC1_FIELD_IPV6_U64, + SRC2_FIELD_IPV6_U64, + DST1_FIELD_IPV6_U64, + DST2_FIELD_IPV6_U64, + SRCP_FIELD_IPV6_U64, + DSTP_FIELD_IPV6_U64, + NUM_FIELDS_IPV6_U64 
+}; + +enum { + PROTO_INDEX_IPV6_U64 = PROTO_FIELD_IPV6_U64, + SRC1_INDEX_IPV6_U64 = SRC1_FIELD_IPV6_U64, + SRC2_INDEX_IPV6_U64 = SRC2_FIELD_IPV6_U64 + 1, + DST1_INDEX_IPV6_U64 = DST1_FIELD_IPV6_U64 + 2, + DST2_INDEX_IPV6_U64 = DST2_FIELD_IPV6_U64 + 3, + PRT_INDEX_IPV6_U64 = SRCP_FIELD_IPV6 + 4, +}; + struct rte_acl_field_def ipv6_defs[NUM_FIELDS_IPV6] = { { .type = RTE_ACL_FIELD_TYPE_BITMASK, @@ -314,6 +343,57 @@ struct rte_acl_field_def ipv6_defs[NUM_FIELDS_IPV6] = { }, }; +struct rte_acl_field_def ipv6_u64_defs[NUM_FIELDS_IPV6_U64] = { + { + .type = RTE_ACL_FIELD_TYPE_BITMASK, + .size = sizeof(uint8_t), + .field_index = PROTO_FIELD_IPV6_U64, + .input_index = PROTO_FIELD_IPV6_U64, + .offset = offsetof(struct ipv6_5tuple, proto), + }, + { + .type = RTE_ACL_FIELD_TYPE_MASK, + .size = sizeof(uint64_t), + .field_index = SRC1_FIELD_IPV6_U64, + .input_index = SRC1_INDEX_IPV6_U64, + .offset = offsetof(struct ipv6_5tuple, ip_src[0]), + }, + { + .type = RTE_ACL_FIELD_TYPE_MASK, + .size = sizeof(uint64_t), + .field_index = SRC2_FIELD_IPV6_U64, + .input_index = SRC2_INDEX_IPV6_U64, + .offset = offsetof(struct ipv6_5tuple, ip_src[2]), + }, + { + .type = RTE_ACL_FIELD_TYPE_MASK, + .size = sizeof(uint64_t), + .field_index = DST1_FIELD_IPV6_U64, + .input_index = DST1_INDEX_IPV6_U64, + .offset = offsetof(struct ipv6_5tuple, ip_dst[0]), + }, + { + .type = RTE_ACL_FIELD_TYPE_MASK, + .size = sizeof(uint64_t), + .field_index = DST2_FIELD_IPV6_U64, + .input_index = DST2_INDEX_IPV6_U64, + .offset = offsetof(struct ipv6_5tuple, ip_dst[2]), + }, + { + .type = RTE_ACL_FIELD_TYPE_RANGE, + .size = sizeof(uint16_t), + .field_index = SRCP_FIELD_IPV6_U64, + .input_index = PRT_INDEX_IPV6_U64, + .offset = offsetof(struct ipv6_5tuple, port_src), + }, + { + .type = RTE_ACL_FIELD_TYPE_RANGE, + .size = sizeof(uint16_t), + .field_index = DSTP_FIELD_IPV6_U64, + .input_index = PRT_INDEX_IPV6_U64, + .offset = offsetof(struct ipv6_5tuple, port_dst), + }, +}; enum { CB_FLD_SRC_ADDR, @@ -385,49 +465,11 @@ 
pa
[PATCH 1/2] acl: fix rules with 8 bytes field size are broken
In theory the ACL library allows fields 8B long. Though in practice they are usually not used and not tested, and, as was revealed by Ido, this functionality is not working properly. There are a few places inside the ACL build code-path that need to be addressed. Bugzilla ID: 673 Fixes: dc276b5780c2 ("acl: new library") Cc: sta...@dpdk.org Reported-by: Ido Goshen Signed-off-by: Konstantin Ananyev --- lib/acl/acl_bld.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c index 7ea30f4186..2816632803 100644 --- a/lib/acl/acl_bld.c +++ b/lib/acl/acl_bld.c @@ -12,6 +12,9 @@ /* number of pointers per alloc */ #define ACL_PTR_ALLOC 32 +/* account for situation when all fields are 8B long */ +#define ACL_MAX_INDEXES (2 * RTE_ACL_MAX_FIELDS) + /* macros for dividing rule sets heuristics */ #define NODE_MAX 0x4000 #define NODE_MIN 0x800 @@ -80,7 +83,7 @@ struct acl_build_context { struct tb_mem_pool pool; struct rte_acl_trie tries[RTE_ACL_MAX_TRIES]; struct rte_acl_bld_trie bld_tries[RTE_ACL_MAX_TRIES]; - uint32_t data_indexes[RTE_ACL_MAX_TRIES][RTE_ACL_MAX_FIELDS]; + uint32_t data_indexes[RTE_ACL_MAX_TRIES][ACL_MAX_INDEXES]; /* memory free lists for nodes and blocks used for node ptrs */ struct acl_mem_block blocks[MEM_BLOCK_NUM]; @@ -988,7 +991,7 @@ build_trie(struct acl_build_context *context, struct rte_acl_build_rule *head, */ uint64_t mask; mask = RTE_ACL_MASKLEN_TO_BITMASK( - fld->mask_range.u32, + fld->mask_range.u64, rule->config->defs[n].size); /* gen a mini-trie for this field */ @@ -1301,6 +1304,9 @@ acl_build_index(const struct rte_acl_config *config, uint32_t *data_index) if (last_header != config->defs[n].input_index) { last_header = config->defs[n].input_index; data_index[m++] = config->defs[n].offset; + if (config->defs[n].size > sizeof(uint32_t)) + data_index[m++] = config->defs[n].offset + + sizeof(uint32_t); } } @@ -1487,7 +1493,7 @@ acl_set_data_indexes(struct rte_acl_ctx *ctx) memcpy(ctx->data_indexes + ofs,
ctx->trie[i].data_index, n * sizeof(ctx->data_indexes[0])); ctx->trie[i].data_index = ctx->data_indexes + ofs; - ofs += RTE_ACL_MAX_FIELDS; + ofs += ACL_MAX_INDEXES; } } @@ -1643,7 +1649,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg) /* allocate and fill run-time structures. */ rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries, bcx.num_tries, bcx.cfg.num_categories, - RTE_ACL_MAX_FIELDS * RTE_DIM(bcx.tries) * + ACL_MAX_INDEXES * RTE_DIM(bcx.tries) * sizeof(ctx->data_indexes[0]), max_size); if (rc == 0) { /* set data indexes. */ -- 2.34.1
Fwd: Does ACL support field size of 8 bytes?
Hi Ido, I've lots of good experience with ACL but can't make it work with u64 values. I know it can be split into 2xu32 fields, but that makes it more complex to use and wastes double the number of fields (we hit the RTE_ACL_MAX_FIELDS 64 limit) Wow, that's a lot of fields... According to the documentation and rte_acl.h, field size can be 8 bytes (u64) e.g. 'The size parameter defines the length of the field in bytes. Allowable values are 1, 2, 4, or 8 bytes.' (from https://doc.dpdk.org/guides-21.11/prog_guide/packet_classif_access_ctrl.html#rule-definition) Though there's a hint that it's less recommended: 'Also, it is best to define fields of 8 or more bytes as 4 byte fields so that the build processes can eliminate fields that are all wild.' It's also not clear how it fits in a group (i.e. what's the input_index stride), which is only 4 bytes: 'All subsequent fields has to be grouped into sets of 4 consecutive bytes.' I couldn't find any example or test app that's using 8 bytes e.g. for IPv6 address 4xu32 fields are always used and not 2xu64. Should it work? Did anyone try it successfully and/or can share an example? You are right: though it is formally supported, we do not test it, and AFAIK no-one has used it till now. As we group fields by 4B long chunks anyway, an 8B field is sort of awkward and confusing. To be honest, I don't even remember what the rationale was behind introducing it in the first place. Anyway, I just submitted patches that should fix 8B field support (at least it works for me now): https://patches.dpdk.org/project/dpdk/list/?series=22676 Please give it a try. In the long term it would probably be good to hear from you and other users whether we should keep 8B support at all, or whether it would be easier just to abandon it. Thanks Konstantin
Re: [PATCH V2 1/4] net/bonding: fix non-active slaves aren't stopped
On 3/24/2022 3:00 AM, Min Hu (Connor) wrote: From: Huisong Li When stopping a bonded port, all slaves should be deactivated. But only s/deactivated/stopped/ ? active slaves are stopped. So fix it and do "deactivae_slave()" for active s/deactivae_slave()/deactivate_slave()/ slaves. Hi Connor, When a bonding port is closed, is it clear if all slave ports or active slave ports should be stopped? Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 port") Cc: sta...@dpdk.org Signed-off-by: Huisong Li Signed-off-by: Min Hu (Connor) --- drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index b305b6a35b..469dc71170 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev) internals->link_status_polling_enabled = 0; for (i = 0; i < internals->slave_count; i++) { uint16_t slave_id = internals->slaves[i].port_id; + + internals->slaves[i].last_link_status = 0; + ret = rte_eth_dev_stop(slave_id); + if (ret != 0) { + RTE_BOND_LOG(ERR, "Failed to stop device on port %u", +slave_id); + return ret; Should it return here or try to stop all ports? What about to record the return status, but keep continue to stop all ports. And return error if any of the stop failed? + } + + /* active slaves need to deactivate. */ " active slaves need to be deactivated. " ? if (find_slave_by_id(internals->active_slaves, internals->active_slave_count, slave_id) != - internals->active_slave_count) { - internals->slaves[i].last_link_status = 0; - ret = rte_eth_dev_stop(slave_id); - if (ret != 0) { - RTE_BOND_LOG(ERR, "Failed to stop device on port %u", -slave_id); - return ret; - } + internals->active_slave_count) I think original indentation for this line is better. deactivate_slave(eth_dev, slave_id); - } } return 0;
Re: [PATCH V2 2/4] net/bonding: fix non-terminable while loop
On 3/24/2022 3:00 AM, Min Hu (Connor) wrote: From: Huisong Li All slaves will be stopped and removed when closing a bonded port. But the while loop can not stop if both rte_eth_dev_stop and rte_eth_bond_slave_remove fail to run. Agree that this is a defect introduced in below commit. Thanks for the fix. Fixes: fb0379bc5db3 ("net/bonding: check stop call status") Cc: sta...@dpdk.org Signed-off-by: Huisong Li Signed-off-by: Min Hu (Connor) --- drivers/net/bonding/rte_eth_bond_pmd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 469dc71170..00d4deda44 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -2149,13 +2149,14 @@ bond_ethdev_close(struct rte_eth_dev *dev) return 0; RTE_BOND_LOG(INFO, "Closing bonded device %s", dev->device->name); - while (internals->slave_count != skipped) { + while (skipped < internals->slave_count) { When below fixed with adding 'continue', no need to change the check, right? Although new one is also correct. uint16_t port_id = internals->slaves[skipped].port_id; if (rte_eth_dev_stop(port_id) != 0) { RTE_BOND_LOG(ERR, "Failed to stop device on port %u", port_id); skipped++; + continue; Can't we remove the slave even if 'stop()' failed? If so I think better to just log the error and keep continue in that case, what do you think? } if (rte_eth_bond_slave_remove(bond_port_id, port_id) != 0) {
Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
On 4/26/2022 10:58 AM, Stephen Hemminger wrote: On Tue, 26 Apr 2022 08:19:59 -0400 Don Wallwork wrote: Add support for using hugepages for worker lcore stack memory. The intent is to improve performance by reducing stack memory related TLB misses and also by using memory local to the NUMA node of each lcore. Platforms desiring to make use of this capability must enable the associated option flag and stack size settings in platform config files. --- lib/eal/linux/eal.c | 39 +++ 1 file changed, 39 insertions(+) Good idea but having a fixed size stack makes writing complex application more difficult. Plus you lose the safety of guard pages. Thanks for the quick reply. The expectation is that use of this optional feature would be limited to cases where the performance gains justify the implications of these tradeoffs. For example, a specific data plane application may be okay with limited stack size and could be tested to ensure stack usage remains within limits. Also, since this applies only to worker threads, the main thread would not be impacted by this change.
[PATCH 1/6] app/eventdev: simplify signal handling and teardown
Remove rte_*_dev calls from signal handler callback. Split ethernet device teardown into Rx and Tx sections, wait for workers to finish processing after disabling Rx to allow workers to complete processing currently held packets. Signed-off-by: Pavan Nikhilesh --- app/test-eventdev/evt_main.c | 58 +--- app/test-eventdev/evt_test.h | 3 ++ app/test-eventdev/test_perf_atq.c| 1 + app/test-eventdev/test_perf_common.c | 20 +++- app/test-eventdev/test_perf_common.h | 4 +- app/test-eventdev/test_perf_queue.c | 1 + app/test-eventdev/test_pipeline_atq.c| 1 + app/test-eventdev/test_pipeline_common.c | 19 +++- app/test-eventdev/test_pipeline_common.h | 5 +- app/test-eventdev/test_pipeline_queue.c | 1 + 10 files changed, 72 insertions(+), 41 deletions(-) diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c index a7d6b0c1cf..c5d63061bf 100644 --- a/app/test-eventdev/evt_main.c +++ b/app/test-eventdev/evt_main.c @@ -19,11 +19,7 @@ struct evt_test *test; static void signal_handler(int signum) { - int i; - static uint8_t once; - - if ((signum == SIGINT || signum == SIGTERM) && !once) { - once = true; + if (signum == SIGINT || signum == SIGTERM) { printf("\nSignal %d received, preparing to exit...\n", signum); @@ -31,36 +27,7 @@ signal_handler(int signum) /* request all lcores to exit from the main loop */ *(int *)test->test_priv = true; rte_wmb(); - - if (test->ops.ethdev_destroy) - test->ops.ethdev_destroy(test, &opt); - - if (test->ops.cryptodev_destroy) - test->ops.cryptodev_destroy(test, &opt); - - rte_eal_mp_wait_lcore(); - - if (test->ops.test_result) - test->ops.test_result(test, &opt); - - if (opt.prod_type == EVT_PROD_TYPE_ETH_RX_ADPTR) { - RTE_ETH_FOREACH_DEV(i) - rte_eth_dev_close(i); - } - - if (test->ops.eventdev_destroy) - test->ops.eventdev_destroy(test, &opt); - - if (test->ops.mempool_destroy) - test->ops.mempool_destroy(test, &opt); - - if (test->ops.test_destroy) - test->ops.test_destroy(test, &opt); } - - /* exit with the expected status */ - 
signal(signum, SIG_DFL); - kill(getpid(), signum); } } @@ -189,10 +156,29 @@ main(int argc, char **argv) } } + if (test->ops.ethdev_rx_stop) + test->ops.ethdev_rx_stop(test, &opt); + + if (test->ops.cryptodev_destroy) + test->ops.cryptodev_destroy(test, &opt); + rte_eal_mp_wait_lcore(); - /* Print the test result */ - ret = test->ops.test_result(test, &opt); + if (test->ops.test_result) + test->ops.test_result(test, &opt); + + if (test->ops.ethdev_destroy) + test->ops.ethdev_destroy(test, &opt); + + if (test->ops.eventdev_destroy) + test->ops.eventdev_destroy(test, &opt); + + if (test->ops.mempool_destroy) + test->ops.mempool_destroy(test, &opt); + + if (test->ops.test_destroy) + test->ops.test_destroy(test, &opt); + nocap: if (ret == EVT_TEST_SUCCESS) { printf("Result: "CLGRN"%s"CLNRM"\n", "Success"); diff --git a/app/test-eventdev/evt_test.h b/app/test-eventdev/evt_test.h index 50fa474ec2..1049f99ddc 100644 --- a/app/test-eventdev/evt_test.h +++ b/app/test-eventdev/evt_test.h @@ -41,6 +41,8 @@ typedef void (*evt_test_eventdev_destroy_t) (struct evt_test *test, struct evt_options *opt); typedef void (*evt_test_ethdev_destroy_t) (struct evt_test *test, struct evt_options *opt); +typedef void (*evt_test_ethdev_rx_stop_t)(struct evt_test *test, + struct evt_options *opt); typedef void (*evt_test_cryptodev_destroy_t) (struct evt_test *test, struct evt_options *opt); typedef void (*evt_test_mempool_destroy_t) @@ -60,6 +62,7 @@ struct evt_test_ops { evt_test_launch_lcores_t launch_lcores; evt_test_result_t test_result; evt_test_eventdev_destroy_t eventdev_destroy; + evt_test_ethdev_rx_stop_t ethdev_rx_stop; evt_test_ethdev_destroy_t ethdev_destroy; evt_test_cryptodev_destroy_t cryptodev_destroy; evt_test_mempool_destroy_t mempool_destroy; diff --git a/app/test-eventdev/test_perf_atq.c b/app/test-eventdev/test_perf_atq.c index 67ff681666
[PATCH 2/6] app/eventdev: clean up worker state before exit
Event ports are configured to implicitly release the scheduler contexts currently held in the next call to rte_event_dequeue_burst(). A worker core might still hold a scheduling context during exit, as the next call to rte_event_dequeue_burst() is never made. This might lead to deadlock based on the worker exit timing and when there are very less number of flows. Add clean up function to release any scheduling contexts held by the worker by using RTE_EVENT_OP_RELEASE. Signed-off-by: Pavan Nikhilesh --- app/test-eventdev/test_perf_atq.c| 31 +++-- app/test-eventdev/test_perf_common.c | 17 +++ app/test-eventdev/test_perf_common.h | 3 + app/test-eventdev/test_perf_queue.c | 30 +++-- app/test-eventdev/test_pipeline_atq.c| 134 - app/test-eventdev/test_pipeline_common.c | 39 ++ app/test-eventdev/test_pipeline_common.h | 59 ++--- app/test-eventdev/test_pipeline_queue.c | 145 ++- 8 files changed, 304 insertions(+), 154 deletions(-) diff --git a/app/test-eventdev/test_perf_atq.c b/app/test-eventdev/test_perf_atq.c index bac3ea602f..5a0b190384 100644 --- a/app/test-eventdev/test_perf_atq.c +++ b/app/test-eventdev/test_perf_atq.c @@ -37,13 +37,14 @@ atq_fwd_event(struct rte_event *const ev, uint8_t *const sched_type_list, static int perf_atq_worker(void *arg, const int enable_fwd_latency) { - PERF_WORKER_INIT; + uint16_t enq = 0, deq = 0; struct rte_event ev; + PERF_WORKER_INIT; while (t->done == false) { - uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0); + deq = rte_event_dequeue_burst(dev, port, &ev, 1, 0); - if (!event) { + if (!deq) { rte_pause(); continue; } @@ -78,24 +79,29 @@ perf_atq_worker(void *arg, const int enable_fwd_latency) bufs, sz, cnt); } else { atq_fwd_event(&ev, sched_type_list, nb_stages); - while (rte_event_enqueue_burst(dev, port, &ev, 1) != 1) - rte_pause(); + do { + enq = rte_event_enqueue_burst(dev, port, &ev, + 1); + } while (!enq && !t->done); } } + + perf_worker_cleanup(pool, dev, port, &ev, enq, deq); + return 0; } static int 
perf_atq_worker_burst(void *arg, const int enable_fwd_latency) { - PERF_WORKER_INIT; - uint16_t i; /* +1 to avoid prefetch out of array check */ struct rte_event ev[BURST_SIZE + 1]; + uint16_t enq = 0, nb_rx = 0; + PERF_WORKER_INIT; + uint16_t i; while (t->done == false) { - uint16_t const nb_rx = rte_event_dequeue_burst(dev, port, ev, - BURST_SIZE, 0); + nb_rx = rte_event_dequeue_burst(dev, port, ev, BURST_SIZE, 0); if (!nb_rx) { rte_pause(); @@ -146,14 +152,15 @@ perf_atq_worker_burst(void *arg, const int enable_fwd_latency) } } - uint16_t enq; - enq = rte_event_enqueue_burst(dev, port, ev, nb_rx); - while (enq < nb_rx) { + while ((enq < nb_rx) && !t->done) { enq += rte_event_enqueue_burst(dev, port, ev + enq, nb_rx - enq); } } + + perf_worker_cleanup(pool, dev, port, ev, enq, nb_rx); + return 0; } diff --git a/app/test-eventdev/test_perf_common.c b/app/test-eventdev/test_perf_common.c index e93b0e7272..f673a9fddd 100644 --- a/app/test-eventdev/test_perf_common.c +++ b/app/test-eventdev/test_perf_common.c @@ -985,6 +985,23 @@ perf_opt_dump(struct evt_options *opt, uint8_t nb_queues) evt_dump("prod_enq_burst_sz", "%d", opt->prod_enq_burst_sz); } +void +perf_worker_cleanup(struct rte_mempool *const pool, uint8_t dev_id, + uint8_t port_id, struct rte_event events[], uint16_t nb_enq, + uint16_t nb_deq) +{ + int i; + + if (nb_deq) { + for (i = nb_enq; i < nb_deq; i++) + rte_mempool_put(pool, events[i].event_ptr); + + for (i = 0; i < nb_deq; i++) + events[i].op = RTE_EVENT_OP_RELEASE; + rte_event_enqueue_burst(dev_id, port_id, events, nb_deq); + } +} + void perf_eventdev_destroy(struct evt_test *test, struct evt_options *opt) { diff --git a/app/test-eventdev/test_perf_common.h b/app/test-eventdev/test_perf_common.h index e504bb1df9..f6bfc73be0 100644 --- a/app/test-eventdev/test_perf_common.h +++ b/app/test-eventdev/test_perf_common.h @@ -184,5 +184,8 @@ void perf_cryptodev_d
[PATCH 3/6] examples/eventdev: clean up worker state before exit
Event ports are configured to implicitly release the scheduler contexts currently held in the next call to rte_event_dequeue_burst(). A worker core might still hold a scheduling context during exit, as the next call to rte_event_dequeue_burst() is never made. This might lead to deadlock based on the worker exit timing and when there are very less number of flows. Add clean up function to release any scheduling contexts held by the worker by using RTE_EVENT_OP_RELEASE. Signed-off-by: Pavan Nikhilesh --- examples/eventdev_pipeline/pipeline_common.h | 22 ++ .../pipeline_worker_generic.c | 23 +++--- .../eventdev_pipeline/pipeline_worker_tx.c| 79 --- 3 files changed, 87 insertions(+), 37 deletions(-) diff --git a/examples/eventdev_pipeline/pipeline_common.h b/examples/eventdev_pipeline/pipeline_common.h index b12eb281e1..9899b257b0 100644 --- a/examples/eventdev_pipeline/pipeline_common.h +++ b/examples/eventdev_pipeline/pipeline_common.h @@ -140,5 +140,27 @@ schedule_devices(unsigned int lcore_id) } } +static inline void +worker_cleanup(uint8_t dev_id, uint8_t port_id, struct rte_event events[], + uint16_t nb_enq, uint16_t nb_deq) +{ + int i; + + if (!(nb_deq - nb_enq)) + return; + + if (nb_deq) { + for (i = nb_enq; i < nb_deq; i++) { + if (events[i].op == RTE_EVENT_OP_RELEASE) + continue; + rte_pktmbuf_free(events[i].mbuf); + } + + for (i = 0; i < nb_deq; i++) + events[i].op = RTE_EVENT_OP_RELEASE; + rte_event_enqueue_burst(dev_id, port_id, events, nb_deq); + } +} + void set_worker_generic_setup_data(struct setup_data *caps, bool burst); void set_worker_tx_enq_setup_data(struct setup_data *caps, bool burst); diff --git a/examples/eventdev_pipeline/pipeline_worker_generic.c b/examples/eventdev_pipeline/pipeline_worker_generic.c index ce1e92d59e..c564c808e2 100644 --- a/examples/eventdev_pipeline/pipeline_worker_generic.c +++ b/examples/eventdev_pipeline/pipeline_worker_generic.c @@ -16,6 +16,7 @@ worker_generic(void *arg) uint8_t port_id = data->port_id; size_t sent = 
0, received = 0; unsigned int lcore_id = rte_lcore_id(); + uint16_t nb_rx = 0, nb_tx = 0; while (!fdata->done) { @@ -27,8 +28,7 @@ worker_generic(void *arg) continue; } - const uint16_t nb_rx = rte_event_dequeue_burst(dev_id, port_id, - &ev, 1, 0); + nb_rx = rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0); if (nb_rx == 0) { rte_pause(); @@ -47,11 +47,14 @@ worker_generic(void *arg) work(); - while (rte_event_enqueue_burst(dev_id, port_id, &ev, 1) != 1) - rte_pause(); + do { + nb_tx = rte_event_enqueue_burst(dev_id, port_id, &ev, + 1); + } while (!nb_tx && !fdata->done); sent++; } + worker_cleanup(dev_id, port_id, &ev, nb_tx, nb_rx); if (!cdata.quiet) printf(" worker %u thread done. RX=%zu TX=%zu\n", rte_lcore_id(), received, sent); @@ -69,10 +72,9 @@ worker_generic_burst(void *arg) uint8_t port_id = data->port_id; size_t sent = 0, received = 0; unsigned int lcore_id = rte_lcore_id(); + uint16_t i, nb_rx = 0, nb_tx = 0; while (!fdata->done) { - uint16_t i; - if (fdata->cap.scheduler) fdata->cap.scheduler(lcore_id); @@ -81,8 +83,8 @@ worker_generic_burst(void *arg) continue; } - const uint16_t nb_rx = rte_event_dequeue_burst(dev_id, port_id, - events, RTE_DIM(events), 0); + nb_rx = rte_event_dequeue_burst(dev_id, port_id, events, + RTE_DIM(events), 0); if (nb_rx == 0) { rte_pause(); @@ -103,8 +105,7 @@ worker_generic_burst(void *arg) work(); } - uint16_t nb_tx = rte_event_enqueue_burst(dev_id, port_id, - events, nb_rx); + nb_tx = rte_event_enqueue_burst(dev_id, port_id, events, nb_rx); while (nb_tx < nb_rx && !fdata->done) nb_tx += rte_event_enqueue_burst(dev_id, port_id, events + nb_tx, @@ -112,6 +113,8 @@ worker_generic_burst(void *arg) sent += nb_tx; } + worker_cleanup(dev_id, port_id, events, nb_tx, nb_rx); + if (!cdata.quiet) p
[PATCH 4/6] examples/l3fwd: clean up worker state before exit
Event ports are configured to implicitly release the scheduler contexts currently held in the next call to rte_event_dequeue_burst(). A worker core might still hold a scheduling context during exit, as the next call to rte_event_dequeue_burst() is never made. This might lead to deadlock based on the worker exit timing and when there are very less number of flows. Add clean up function to release any scheduling contexts held by the worker by using RTE_EVENT_OP_RELEASE. Signed-off-by: Pavan Nikhilesh --- examples/l3fwd/l3fwd_em.c| 32 ++-- examples/l3fwd/l3fwd_event.c | 34 ++ examples/l3fwd/l3fwd_event.h | 5 + examples/l3fwd/l3fwd_fib.c | 10 -- examples/l3fwd/l3fwd_lpm.c | 32 ++-- 5 files changed, 91 insertions(+), 22 deletions(-) diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c index 24d0910fe0..6f8d94f120 100644 --- a/examples/l3fwd/l3fwd_em.c +++ b/examples/l3fwd/l3fwd_em.c @@ -653,6 +653,7 @@ em_event_loop_single(struct l3fwd_event_resources *evt_rsrc, const uint8_t tx_q_id = evt_rsrc->evq.event_q_id[ evt_rsrc->evq.nb_queues - 1]; const uint8_t event_d_id = evt_rsrc->event_d_id; + uint8_t deq = 0, enq = 0; struct lcore_conf *lconf; unsigned int lcore_id; struct rte_event ev; @@ -665,7 +666,9 @@ em_event_loop_single(struct l3fwd_event_resources *evt_rsrc, RTE_LOG(INFO, L3FWD, "entering %s on lcore %u\n", __func__, lcore_id); while (!force_quit) { - if (!rte_event_dequeue_burst(event_d_id, event_p_id, &ev, 1, 0)) + deq = rte_event_dequeue_burst(event_d_id, event_p_id, &ev, 1, + 0); + if (!deq) continue; struct rte_mbuf *mbuf = ev.mbuf; @@ -684,19 +687,22 @@ em_event_loop_single(struct l3fwd_event_resources *evt_rsrc, if (flags & L3FWD_EVENT_TX_ENQ) { ev.queue_id = tx_q_id; ev.op = RTE_EVENT_OP_FORWARD; - while (rte_event_enqueue_burst(event_d_id, event_p_id, - &ev, 1) && !force_quit) - ; + do { + enq = rte_event_enqueue_burst( + event_d_id, event_p_id, &ev, 1); + } while (!enq && !force_quit); } if (flags & L3FWD_EVENT_TX_DIRECT) { 
rte_event_eth_tx_adapter_txq_set(mbuf, 0); - while (!rte_event_eth_tx_adapter_enqueue(event_d_id, - event_p_id, &ev, 1, 0) && - !force_quit) - ; + do { + enq = rte_event_eth_tx_adapter_enqueue( + event_d_id, event_p_id, &ev, 1, 0); + } while (!enq && !force_quit); } } + + l3fwd_event_worker_cleanup(event_d_id, event_p_id, &ev, enq, deq, 0); } static __rte_always_inline void @@ -709,9 +715,9 @@ em_event_loop_burst(struct l3fwd_event_resources *evt_rsrc, const uint8_t event_d_id = evt_rsrc->event_d_id; const uint16_t deq_len = evt_rsrc->deq_depth; struct rte_event events[MAX_PKT_BURST]; + int i, nb_enq = 0, nb_deq = 0; struct lcore_conf *lconf; unsigned int lcore_id; - int i, nb_enq, nb_deq; if (event_p_id < 0) return; @@ -769,6 +775,9 @@ em_event_loop_burst(struct l3fwd_event_resources *evt_rsrc, nb_deq - nb_enq, 0); } } + + l3fwd_event_worker_cleanup(event_d_id, event_p_id, events, nb_enq, + nb_deq, 0); } static __rte_always_inline void @@ -832,9 +841,9 @@ em_event_loop_vector(struct l3fwd_event_resources *evt_rsrc, const uint8_t event_d_id = evt_rsrc->event_d_id; const uint16_t deq_len = evt_rsrc->deq_depth; struct rte_event events[MAX_PKT_BURST]; + int i, nb_enq = 0, nb_deq = 0; struct lcore_conf *lconf; unsigned int lcore_id; - int i, nb_enq, nb_deq; if (event_p_id < 0) return; @@ -887,6 +896,9 @@ em_event_loop_vector(struct l3fwd_event_resources *evt_rsrc, nb_deq - nb_enq, 0); } } + + l3fwd_event_worker_cleanup(event_d_id, event_p_id, events, nb_enq, + nb_deq, 1); } int __rte_noinline diff --git a/examples/l3fwd/l3fwd_event.c b/examples/l3fwd/l3fwd_event.c index 7a401290f8..a14a21b414 100644 --- a/examples/l3fwd/l3fwd_event.c +++ b/examples/l3fwd/l3fwd_event.c @@
[PATCH 5/6] examples/l2fwd-event: clean up worker state before exit
Event ports are configured to implicitly release the scheduler contexts currently held in the next call to rte_event_dequeue_burst(). A worker core might still hold a scheduling context during exit, as the next call to rte_event_dequeue_burst() is never made. This might lead to deadlock based on the worker exit timing and when there are very less number of flows. Add clean up function to release any scheduling contexts held by the worker by using RTE_EVENT_OP_RELEASE. Signed-off-by: Pavan Nikhilesh --- examples/l2fwd-event/l2fwd_common.c | 34 + examples/l2fwd-event/l2fwd_common.h | 3 +++ examples/l2fwd-event/l2fwd_event.c | 31 -- 3 files changed, 56 insertions(+), 12 deletions(-) diff --git a/examples/l2fwd-event/l2fwd_common.c b/examples/l2fwd-event/l2fwd_common.c index cf3d1b8aaf..15bfe790a0 100644 --- a/examples/l2fwd-event/l2fwd_common.c +++ b/examples/l2fwd-event/l2fwd_common.c @@ -114,3 +114,37 @@ l2fwd_event_init_ports(struct l2fwd_resources *rsrc) return nb_ports_available; } + +static void +l2fwd_event_vector_array_free(struct rte_event events[], uint16_t num) +{ + uint16_t i; + + for (i = 0; i < num; i++) { + rte_pktmbuf_free_bulk(events[i].vec->mbufs, + events[i].vec->nb_elem); + rte_mempool_put(rte_mempool_from_obj(events[i].vec), + events[i].vec); + } +} + +void +l2fwd_event_worker_cleanup(uint8_t event_d_id, uint8_t port_id, + struct rte_event events[], uint16_t nb_enq, + uint16_t nb_deq, uint8_t is_vector) +{ + int i; + + if (nb_deq) { + if (is_vector) + l2fwd_event_vector_array_free(events + nb_enq, + nb_deq - nb_enq); + else + for (i = nb_enq; i < nb_deq; i++) + rte_pktmbuf_free(events[i].mbuf); + + for (i = 0; i < nb_deq; i++) + events[i].op = RTE_EVENT_OP_RELEASE; + rte_event_enqueue_burst(event_d_id, port_id, events, nb_deq); + } +} diff --git a/examples/l2fwd-event/l2fwd_common.h b/examples/l2fwd-event/l2fwd_common.h index 396e238c6a..bff3b65abf 100644 --- a/examples/l2fwd-event/l2fwd_common.h +++ b/examples/l2fwd-event/l2fwd_common.h @@ -140,5 
+140,8 @@ l2fwd_get_rsrc(void) } int l2fwd_event_init_ports(struct l2fwd_resources *rsrc); +void l2fwd_event_worker_cleanup(uint8_t event_d_id, uint8_t port_id, + struct rte_event events[], uint16_t nb_enq, + uint16_t nb_deq, uint8_t is_vector); #endif /* __L2FWD_COMMON_H__ */ diff --git a/examples/l2fwd-event/l2fwd_event.c b/examples/l2fwd-event/l2fwd_event.c index 6df3cdfeab..63450537fe 100644 --- a/examples/l2fwd-event/l2fwd_event.c +++ b/examples/l2fwd-event/l2fwd_event.c @@ -193,6 +193,7 @@ l2fwd_event_loop_single(struct l2fwd_resources *rsrc, evt_rsrc->evq.nb_queues - 1]; const uint64_t timer_period = rsrc->timer_period; const uint8_t event_d_id = evt_rsrc->event_d_id; + uint8_t enq = 0, deq = 0; struct rte_event ev; if (port_id < 0) @@ -203,26 +204,28 @@ l2fwd_event_loop_single(struct l2fwd_resources *rsrc, while (!rsrc->force_quit) { /* Read packet from eventdev */ - if (!rte_event_dequeue_burst(event_d_id, port_id, &ev, 1, 0)) + deq = rte_event_dequeue_burst(event_d_id, port_id, &ev, 1, 0); + if (!deq) continue; l2fwd_event_fwd(rsrc, &ev, tx_q_id, timer_period, flags); if (flags & L2FWD_EVENT_TX_ENQ) { - while (rte_event_enqueue_burst(event_d_id, port_id, - &ev, 1) && - !rsrc->force_quit) - ; + do { + enq = rte_event_enqueue_burst(event_d_id, + port_id, &ev, 1); + } while (!enq && !rsrc->force_quit); } if (flags & L2FWD_EVENT_TX_DIRECT) { - while (!rte_event_eth_tx_adapter_enqueue(event_d_id, - port_id, - &ev, 1, 0) && - !rsrc->force_quit) - ; + do { + enq = rte_event_eth_tx_adapter_enqueue( + event_d_id, port_id, &ev, 1, 0); + } while (!enq && !rsrc->force_quit); } }
[PATCH 6/6] examples/ipsec-secgw: clean up worker state before exit
Event ports are configured to implicitly release the scheduler contexts currently held in the next call to rte_event_dequeue_burst(). A worker core might still hold a scheduling context during exit as the next call to rte_event_dequeue_burst() is never made. This might lead to deadlock based on the worker exit timing and when there are very less number of flows. Add a cleanup function to release any scheduling contexts held by the worker by using RTE_EVENT_OP_RELEASE. Signed-off-by: Pavan Nikhilesh --- examples/ipsec-secgw/ipsec_worker.c | 40 - 1 file changed, 28 insertions(+), 12 deletions(-) diff --git a/examples/ipsec-secgw/ipsec_worker.c b/examples/ipsec-secgw/ipsec_worker.c index 8639426c5c..3df5acf384 100644 --- a/examples/ipsec-secgw/ipsec_worker.c +++ b/examples/ipsec-secgw/ipsec_worker.c @@ -749,7 +749,7 @@ ipsec_wrkr_non_burst_int_port_drv_mode(struct eh_event_link_info *links, uint8_t nb_links) { struct port_drv_mode_data data[RTE_MAX_ETHPORTS]; - unsigned int nb_rx = 0; + unsigned int nb_rx = 0, nb_tx; struct rte_mbuf *pkt; struct rte_event ev; uint32_t lcore_id; @@ -847,11 +847,19 @@ ipsec_wrkr_non_burst_int_port_drv_mode(struct eh_event_link_info *links, * directly enqueued to the adapter and it would be * internally submitted to the eth device. 
*/ - rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id, - links[0].event_port_id, - &ev,/* events */ - 1, /* nb_events */ - 0 /* flags */); + nb_tx = rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id, +links[0].event_port_id, +&ev, /* events */ +1, /* nb_events */ +0 /* flags */); + if (!nb_tx) + rte_pktmbuf_free(ev.mbuf); + } + + if (ev.u64) { + ev.op = RTE_EVENT_OP_RELEASE; + rte_event_enqueue_burst(links[0].eventdev_id, + links[0].event_port_id, &ev, 1); } } @@ -864,7 +872,7 @@ ipsec_wrkr_non_burst_int_port_app_mode(struct eh_event_link_info *links, uint8_t nb_links) { struct lcore_conf_ev_tx_int_port_wrkr lconf; - unsigned int nb_rx = 0; + unsigned int nb_rx = 0, nb_tx; struct rte_event ev; uint32_t lcore_id; int32_t socket_id; @@ -952,11 +960,19 @@ ipsec_wrkr_non_burst_int_port_app_mode(struct eh_event_link_info *links, * directly enqueued to the adapter and it would be * internally submitted to the eth device. */ - rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id, - links[0].event_port_id, - &ev,/* events */ - 1, /* nb_events */ - 0 /* flags */); + nb_tx = rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id, +links[0].event_port_id, +&ev, /* events */ +1, /* nb_events */ +0 /* flags */); + if (!nb_tx) + rte_pktmbuf_free(ev.mbuf); + } + + if (ev.u64) { + ev.op = RTE_EVENT_OP_RELEASE; + rte_event_enqueue_burst(links[0].eventdev_id, + links[0].event_port_id, &ev, 1); } } -- 2.25.1
Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
On Tue, 26 Apr 2022 17:01:18 -0400 Don Wallwork wrote: > On 4/26/2022 10:58 AM, Stephen Hemminger wrote: > > On Tue, 26 Apr 2022 08:19:59 -0400 > > Don Wallwork wrote: > > > >> Add support for using hugepages for worker lcore stack memory. The > >> intent is to improve performance by reducing stack memory related TLB > >> misses and also by using memory local to the NUMA node of each lcore. > >> > >> Platforms desiring to make use of this capability must enable the > >> associated option flag and stack size settings in platform config > >> files. > >> --- > >> lib/eal/linux/eal.c | 39 +++ > >> 1 file changed, 39 insertions(+) > >> > > Good idea but having a fixed size stack makes writing complex application > > more difficult. Plus you lose the safety of guard pages. > > Thanks for the quick reply. > > The expectation is that use of this optional feature would be limited to > cases where > the performance gains justify the implications of these tradeoffs. For > example, a specific > data plane application may be okay with limited stack size and could be > tested to ensure > stack usage remains within limits. > > Also, since this applies only to worker threads, the main thread would > not be impacted > by this change. > > I would prefer it as a runtime, not compile time option. That way distributions could ship DPDK and application could opt in if it wanted.
Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
On 4/26/2022 5:21 PM, Stephen Hemminger wrote: On Tue, 26 Apr 2022 17:01:18 -0400 Don Wallwork wrote: On 4/26/2022 10:58 AM, Stephen Hemminger wrote: On Tue, 26 Apr 2022 08:19:59 -0400 Don Wallwork wrote: Add support for using hugepages for worker lcore stack memory. The intent is to improve performance by reducing stack memory related TLB misses and also by using memory local to the NUMA node of each lcore. Platforms desiring to make use of this capability must enable the associated option flag and stack size settings in platform config files. --- lib/eal/linux/eal.c | 39 +++ 1 file changed, 39 insertions(+) Good idea but having a fixed size stack makes writing complex application more difficult. Plus you lose the safety of guard pages. Thanks for the quick reply. The expectation is that use of this optional feature would be limited to cases where the performance gains justify the implications of these tradeoffs. For example, a specific data plane application may be okay with limited stack size and could be tested to ensure stack usage remains within limits. Also, since this applies only to worker threads, the main thread would not be impacted by this change. I would prefer it as a runtime, not compile time option. That way distributions could ship DPDK and application could opt in if it wanted. Good point.. I'll work on a v2 and will post that when it's ready.
[PATCH 1/2] event/cnxk: add additional checks in OP_RELEASE
Add additional checks while performing RTE_EVENT_OP_RELEASE to ensure that there are no pending SWTAGs and FLUSHEs in flight. Signed-off-by: Pavan Nikhilesh --- drivers/event/cnxk/cn10k_eventdev.c | 4 +--- drivers/event/cnxk/cn10k_worker.c | 8 ++-- drivers/event/cnxk/cn9k_eventdev.c | 4 +--- drivers/event/cnxk/cn9k_worker.c| 16 drivers/event/cnxk/cn9k_worker.h| 3 +-- drivers/event/cnxk/cnxk_worker.h| 17 ++--- 6 files changed, 35 insertions(+), 17 deletions(-) diff --git a/drivers/event/cnxk/cn10k_eventdev.c b/drivers/event/cnxk/cn10k_eventdev.c index 9b4d2895ec..2fa2cd31c2 100644 --- a/drivers/event/cnxk/cn10k_eventdev.c +++ b/drivers/event/cnxk/cn10k_eventdev.c @@ -137,9 +137,7 @@ cn10k_sso_hws_flush_events(void *hws, uint8_t queue_id, uintptr_t base, if (fn != NULL && ev.u64 != 0) fn(arg, ev); if (ev.sched_type != SSO_TT_EMPTY) - cnxk_sso_hws_swtag_flush( - ws->base + SSOW_LF_GWS_WQE0, - ws->base + SSOW_LF_GWS_OP_SWTAG_FLUSH); + cnxk_sso_hws_swtag_flush(ws->base); do { val = plt_read64(ws->base + SSOW_LF_GWS_PENDSTATE); } while (val & BIT_ULL(56)); diff --git a/drivers/event/cnxk/cn10k_worker.c b/drivers/event/cnxk/cn10k_worker.c index 975a22336a..0d99b4c5e5 100644 --- a/drivers/event/cnxk/cn10k_worker.c +++ b/drivers/event/cnxk/cn10k_worker.c @@ -18,8 +18,12 @@ cn10k_sso_hws_enq(void *port, const struct rte_event *ev) cn10k_sso_hws_forward_event(ws, ev); break; case RTE_EVENT_OP_RELEASE: - cnxk_sso_hws_swtag_flush(ws->base + SSOW_LF_GWS_WQE0, -ws->base + SSOW_LF_GWS_OP_SWTAG_FLUSH); + if (ws->swtag_req) { + cnxk_sso_hws_desched(ev->u64, ws->base); + ws->swtag_req = 0; + break; + } + cnxk_sso_hws_swtag_flush(ws->base); break; default: return 0; diff --git a/drivers/event/cnxk/cn9k_eventdev.c b/drivers/event/cnxk/cn9k_eventdev.c index 4bba477dd1..41bbe3cb22 100644 --- a/drivers/event/cnxk/cn9k_eventdev.c +++ b/drivers/event/cnxk/cn9k_eventdev.c @@ -156,9 +156,7 @@ cn9k_sso_hws_flush_events(void *hws, uint8_t queue_id, uintptr_t base, if (fn != NULL && ev.u64 != 0) 
fn(arg, ev); if (ev.sched_type != SSO_TT_EMPTY) - cnxk_sso_hws_swtag_flush( - ws_base + SSOW_LF_GWS_TAG, - ws_base + SSOW_LF_GWS_OP_SWTAG_FLUSH); + cnxk_sso_hws_swtag_flush(ws_base); do { val = plt_read64(ws_base + SSOW_LF_GWS_PENDSTATE); } while (val & BIT_ULL(56)); diff --git a/drivers/event/cnxk/cn9k_worker.c b/drivers/event/cnxk/cn9k_worker.c index a981bc986f..41dbe6cafb 100644 --- a/drivers/event/cnxk/cn9k_worker.c +++ b/drivers/event/cnxk/cn9k_worker.c @@ -19,8 +19,12 @@ cn9k_sso_hws_enq(void *port, const struct rte_event *ev) cn9k_sso_hws_forward_event(ws, ev); break; case RTE_EVENT_OP_RELEASE: - cnxk_sso_hws_swtag_flush(ws->base + SSOW_LF_GWS_TAG, -ws->base + SSOW_LF_GWS_OP_SWTAG_FLUSH); + if (ws->swtag_req) { + cnxk_sso_hws_desched(ev->u64, ws->base); + ws->swtag_req = 0; + break; + } + cnxk_sso_hws_swtag_flush(ws->base); break; default: return 0; @@ -78,8 +82,12 @@ cn9k_sso_hws_dual_enq(void *port, const struct rte_event *ev) cn9k_sso_hws_dual_forward_event(dws, base, ev); break; case RTE_EVENT_OP_RELEASE: - cnxk_sso_hws_swtag_flush(base + SSOW_LF_GWS_TAG, -base + SSOW_LF_GWS_OP_SWTAG_FLUSH); + if (dws->swtag_req) { + cnxk_sso_hws_desched(ev->u64, base); + dws->swtag_req = 0; + break; + } + cnxk_sso_hws_swtag_flush(base); break; default: return 0; diff --git a/drivers/event/cnxk/cn9k_worker.h b/drivers/event/cnxk/cn9k_worker.h index 917d1e0b40..88eb4e9cf9 100644 --- a/drivers/event/cnxk/cn9k_worker.h +++ b/drivers/event/cnxk/cn9k_worker.h @@ -841,8 +841,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, return 1; } - cnxk_sso_hws_swtag_flush(base + SSOW_LF_GWS_TAG, -base + SSOW_LF_GWS_OP_SWTAG_FLUSH); + cnxk_sso_hws_swtag_flush(b
[PATCH 2/2] event/cnxk: move post-processing to separate function
Move event post-processing to a separate function. Do complete event post-processing in tear-down functions to prevent incorrect memory free. Signed-off-by: Pavan Nikhilesh --- drivers/event/cnxk/cn10k_eventdev.c | 5 +- drivers/event/cnxk/cn10k_worker.h | 190 +--- drivers/event/cnxk/cn9k_eventdev.c | 9 +- drivers/event/cnxk/cn9k_worker.h| 114 ++--- 4 files changed, 138 insertions(+), 180 deletions(-) diff --git a/drivers/event/cnxk/cn10k_eventdev.c b/drivers/event/cnxk/cn10k_eventdev.c index 2fa2cd31c2..94829e789c 100644 --- a/drivers/event/cnxk/cn10k_eventdev.c +++ b/drivers/event/cnxk/cn10k_eventdev.c @@ -133,7 +133,10 @@ cn10k_sso_hws_flush_events(void *hws, uint8_t queue_id, uintptr_t base, while (aq_cnt || cq_ds_cnt || ds_cnt) { plt_write64(req, ws->base + SSOW_LF_GWS_OP_GET_WORK0); - cn10k_sso_hws_get_work_empty(ws, &ev); + cn10k_sso_hws_get_work_empty( + ws, &ev, + (NIX_RX_OFFLOAD_MAX - 1) | NIX_RX_REAS_F | + NIX_RX_MULTI_SEG_F | CPT_RX_WQE_F); if (fn != NULL && ev.u64 != 0) fn(arg, ev); if (ev.sched_type != SSO_TT_EMPTY) diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h index c96048f47d..03bae4bd53 100644 --- a/drivers/event/cnxk/cn10k_worker.h +++ b/drivers/event/cnxk/cn10k_worker.h @@ -196,15 +196,88 @@ cn10k_process_vwqe(uintptr_t vwqe, uint16_t port_id, const uint32_t flags, } } +static __rte_always_inline void +cn10k_sso_hws_post_process(struct cn10k_sso_hws *ws, uint64_t *u64, + const uint32_t flags) +{ + uint64_t tstamp_ptr; + + u64[0] = (u64[0] & (0x3ull << 32)) << 6 | +(u64[0] & (0x3FFull << 36)) << 4 | (u64[0] & 0x); + if ((flags & CPT_RX_WQE_F) && + (CNXK_EVENT_TYPE_FROM_TAG(u64[0]) == RTE_EVENT_TYPE_CRYPTODEV)) { + u64[1] = cn10k_cpt_crypto_adapter_dequeue(u64[1]); + } else if (CNXK_EVENT_TYPE_FROM_TAG(u64[0]) == RTE_EVENT_TYPE_ETHDEV) { + uint8_t port = CNXK_SUB_EVENT_FROM_TAG(u64[0]); + uint64_t mbuf; + + mbuf = u64[1] - sizeof(struct rte_mbuf); + rte_prefetch0((void *)mbuf); + if (flags & 
NIX_RX_OFFLOAD_SECURITY_F) { + const uint64_t mbuf_init = + 0x10001ULL | RTE_PKTMBUF_HEADROOM | + (flags & NIX_RX_OFFLOAD_TSTAMP_F ? 8 : 0); + struct rte_mbuf *m; + uintptr_t sa_base; + uint64_t iova = 0; + uint8_t loff = 0; + uint16_t d_off; + uint64_t cq_w1; + uint64_t cq_w5; + + m = (struct rte_mbuf *)mbuf; + d_off = (uintptr_t)(m->buf_addr) - (uintptr_t)m; + d_off += RTE_PKTMBUF_HEADROOM; + + cq_w1 = *(uint64_t *)(u64[1] + 8); + cq_w5 = *(uint64_t *)(u64[1] + 40); + + sa_base = cnxk_nix_sa_base_get(port, ws->lookup_mem); + sa_base &= ~(ROC_NIX_INL_SA_BASE_ALIGN - 1); + + mbuf = (uint64_t)nix_sec_meta_to_mbuf_sc( + cq_w1, cq_w5, sa_base, (uintptr_t)&iova, &loff, + (struct rte_mbuf *)mbuf, d_off, flags, + mbuf_init | ((uint64_t)port) << 48); + if (loff) + roc_npa_aura_op_free(m->pool->pool_id, 0, iova); + } + + u64[0] = CNXK_CLR_SUB_EVENT(u64[0]); + cn10k_wqe_to_mbuf(u64[1], mbuf, port, u64[0] & 0xF, flags, + ws->lookup_mem); + /* Extracting tstamp, if PTP enabled*/ + tstamp_ptr = *(uint64_t *)(((struct nix_wqe_hdr_s *)u64[1]) + + CNXK_SSO_WQE_SG_PTR); + cn10k_nix_mbuf_to_tstamp((struct rte_mbuf *)mbuf, ws->tstamp, +flags & NIX_RX_OFFLOAD_TSTAMP_F, +(uint64_t *)tstamp_ptr); + u64[1] = mbuf; + } else if (CNXK_EVENT_TYPE_FROM_TAG(u64[0]) == + RTE_EVENT_TYPE_ETHDEV_VECTOR) { + uint8_t port = CNXK_SUB_EVENT_FROM_TAG(u64[0]); + __uint128_t vwqe_hdr = *(__uint128_t *)u64[1]; + + vwqe_hdr = ((vwqe_hdr >> 64) & 0xFFF) | BIT_ULL(31) | + ((vwqe_hdr & 0x) << 48) | ((uint64_t)port << 32); + *(uint64_t *)u64[1] = (uint64_t)vwqe_hdr; + cn10k_process_vwqe(u64[1], port, flags, ws->lookup_mem, + ws->tstamp, ws->lmt_base
Re: [Patch v2] net/netvsc: report correct stats values
On 3/24/2022 5:45 PM, lon...@linuxonhyperv.com wrote: From: Long Li The netvsc should add to the values from the VF and report the sum. Per port stats already accumulated, like: 'stats->opackets += txq->stats.packets;' Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device") Cc: sta...@dpdk.org Signed-off-by: Long Li --- drivers/net/netvsc/hn_ethdev.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c index 0a357d3645..a6202d898b 100644 --- a/drivers/net/netvsc/hn_ethdev.c +++ b/drivers/net/netvsc/hn_ethdev.c @@ -804,8 +804,8 @@ static int hn_dev_stats_get(struct rte_eth_dev *dev, stats->oerrors += txq->stats.errors; if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) { - stats->q_opackets[i] = txq->stats.packets; - stats->q_obytes[i] = txq->stats.bytes; + stats->q_opackets[i] += txq->stats.packets; + stats->q_obytes[i] += txq->stats.bytes; This is per queue stats, 'stats->q_opackets[i]', in next iteration of the loop, 'i' will be increased and 'txq' will be updated, so as far as I can see the above change has no affect. } } @@ -821,12 +821,12 @@ static int hn_dev_stats_get(struct rte_eth_dev *dev, stats->imissed += rxq->stats.ring_full; if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) { - stats->q_ipackets[i] = rxq->stats.packets; - stats->q_ibytes[i] = rxq->stats.bytes; + stats->q_ipackets[i] += rxq->stats.packets; + stats->q_ibytes[i] += rxq->stats.bytes; } } - stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed; + stats->rx_nombuf += dev->data->rx_mbuf_alloc_failed; Why '+='? Is 'dev->data->rx_mbuf_alloc_failed' reset somewhere between two consecutive stats get call? Anyway, above line has no affect, since the 'stats->rx_nombuf' is overwritten by 'rte_eth_stats_get()'. So above line can be removed. return 0; }
Re: [Patch v2] net/netvsc: fix the calculation of checksums based on mbuf flag
On 3/24/2022 5:46 PM, lon...@linuxonhyperv.com wrote: From: Long Li The netvsc should use RTE_MBUF_F_TX_L4_MASK and check the masked value to decide the correct way to calculate checksums. Not checking for RTE_MBUF_F_TX_L4_MASK results in incorrect RNDIS packets sent to VSP and incorrect checksums calculated by the VSP. Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device") Cc: sta...@dpdk.org Signed-off-by: Long Li Reviewed-by: Ferruh Yigit Moving ack from previous version: Acked-by: Stephen Hemminger Applied to dpdk-next-net/main, thanks.
Re: [Patch v2] net/netvsc: fix the calculation of checksums based on mbuf flag
On 3/24/2022 5:46 PM, lon...@linuxonhyperv.com wrote: From: Long Li The netvsc should use RTE_MBUF_F_TX_L4_MASK and check the masked value to decide the correct way to calculate checksums. Not checking for RTE_MBUF_F_TX_L4_MASK results in incorrect RNDIS packets sent to VSP and incorrect checksums calculated by the VSP. Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device") Cc: sta...@dpdk.org Signed-off-by: Long Li --- drivers/net/netvsc/hn_rxtx.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c index 028f176c7e..34f40be5b8 100644 --- a/drivers/net/netvsc/hn_rxtx.c +++ b/drivers/net/netvsc/hn_rxtx.c @@ -1348,8 +1348,11 @@ static void hn_encap(struct rndis_packet_msg *pkt, *pi_data = NDIS_LSO2_INFO_MAKEIPV4(hlen, m->tso_segsz); } - } else if (m->ol_flags & - (RTE_MBUF_F_TX_TCP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM | RTE_MBUF_F_TX_IP_CKSUM)) { + } else if ((m->ol_flags & RTE_MBUF_F_TX_L4_MASK) == + RTE_MBUF_F_TX_TCP_CKSUM || + (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) == + RTE_MBUF_F_TX_UDP_CKSUM || + (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM)) { As far as I can see following drivers also has similar issue, can maintainers (cc'ed) of below drivers check: bnxt dpaa hnic ionic liquidio mlx4 mvneta mvpp2 qede
Re: [Patch v2] net/netvsc: fix the calculation of checksums based on mbuf flag
On Tue, Apr 26, 2022 at 2:57 PM Ferruh Yigit wrote: > > On 3/24/2022 5:46 PM, lon...@linuxonhyperv.com wrote: > > From: Long Li > > > > The netvsc should use RTE_MBUF_F_TX_L4_MASK and check the masked value to > > decide the correct way to calculate checksums. > > > > Not checking for RTE_MBUF_F_TX_L4_MASK results in incorrect RNDIS packets > > sent to VSP and incorrect checksums calculated by the VSP. > > > > Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device") > > Cc: sta...@dpdk.org > > Signed-off-by: Long Li > > --- > > drivers/net/netvsc/hn_rxtx.c | 13 + > > 1 file changed, 9 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c > > index 028f176c7e..34f40be5b8 100644 > > --- a/drivers/net/netvsc/hn_rxtx.c > > +++ b/drivers/net/netvsc/hn_rxtx.c > > @@ -1348,8 +1348,11 @@ static void hn_encap(struct rndis_packet_msg *pkt, > > *pi_data = NDIS_LSO2_INFO_MAKEIPV4(hlen, > > m->tso_segsz); > > } > > - } else if (m->ol_flags & > > -(RTE_MBUF_F_TX_TCP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM | > > RTE_MBUF_F_TX_IP_CKSUM)) { > > + } else if ((m->ol_flags & RTE_MBUF_F_TX_L4_MASK) == > > + RTE_MBUF_F_TX_TCP_CKSUM || > > +(m->ol_flags & RTE_MBUF_F_TX_L4_MASK) == > > + RTE_MBUF_F_TX_UDP_CKSUM || > > +(m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM)) { > > As far as I can see following drivers also has similar issue, can > maintainers (cc'ed) of below drivers check: > > bnxt ACK > dpaa > hnic > ionic > liquidio > mlx4 > mvneta > mvpp2 > qede

Re: [Patch v2] net/netvsc: report correct stats values
On Tue, 26 Apr 2022 22:56:14 +0100 Ferruh Yigit wrote: > > if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) { > > - stats->q_opackets[i] = txq->stats.packets; > > - stats->q_obytes[i] = txq->stats.bytes; > > + stats->q_opackets[i] += txq->stats.packets; > > + stats->q_obytes[i] += txq->stats.bytes; > > This is per queue stats, 'stats->q_opackets[i]', in next iteration of > the loop, 'i' will be increased and 'txq' will be updated, so as far as > I can see the above change has no affect. Agree, that is why it was just assignment originally.
RE: [PATCH v2] net/ice: optimize max queue number calculation
> -Original Message- > From: Wu, Wenjun1 > Sent: Tuesday, April 26, 2022 9:14 PM > To: Zhang, Qi Z ; Yang, Qiming > Cc: dev@dpdk.org > Subject: RE: [PATCH v2] net/ice: optimize max queue number calculation > > > > > -Original Message- > > From: Zhang, Qi Z > > Sent: Friday, April 8, 2022 7:24 PM > > To: Yang, Qiming ; Wu, Wenjun1 > > > > Cc: dev@dpdk.org; Zhang, Qi Z > > Subject: [PATCH v2] net/ice: optimize max queue number calculation > > > > Remove the limitation that max queue pair number must be 2^n. > > With this patch, even on a 8 ports device, the max queue pair number > > increased from 128 to 254. > > > > Signed-off-by: Qi Zhang > > --- > > > > v2: > > - fix check patch warning > > > > drivers/net/ice/ice_ethdev.c | 24 > > 1 file changed, 20 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/ice/ice_ethdev.c > > b/drivers/net/ice/ice_ethdev.c index > > 73e550f5fb..ff2b3e45d9 100644 > > --- a/drivers/net/ice/ice_ethdev.c > > +++ b/drivers/net/ice/ice_ethdev.c > > @@ -819,10 +819,26 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi > > *vsi, > > return -ENOTSUP; > > } > > > > - vsi->nb_qps = RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC); > > - fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps) - 1; > > - /* Adjust the queue number to actual queues that can be applied */ > > - vsi->nb_qps = (vsi->nb_qps == 0) ? 
0 : 0x1 << fls; > > + /* vector 0 is reserved and 1 vector for ctrl vsi */ > > + if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2) > > + vsi->nb_qps = 0; > > + else > > + vsi->nb_qps = RTE_MIN > > + ((uint16_t)vsi->adapter- > > >hw.func_caps.common_cap.num_msix_vectors - 2, > > + RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC)); > > + > > + /* nb_qps(hex) -> fls */ > > + /* -> 0 */ > > + /* 0001 -> 0 */ > > + /* 0002 -> 1 */ > > + /* 0003 ~ 0004 -> 2 */ > > + /* 0005 ~ 0008 -> 3 */ > > + /* 0009 ~ 0010 -> 4 */ > > + /* 0011 ~ 0020 -> 5 */ > > + /* 0021 ~ 0040 -> 6 */ > > + /* 0041 ~ 0080 -> 7 */ > > + /* 0081 ~ 0100 -> 8 */ > > + fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps - 1); > > > > qp_idx = 0; > > /* Set tc and queue mapping with VSI */ > > -- > > 2.26.2 > > Acked-by: Wenjun Wu < wenjun1...@intel.com> > > Thanks > Wenjun > Applied to dpdk-next-net-intel. Thanks Qi
RE: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
> > Add support for using hugepages for worker lcore stack memory. The intent is > to improve performance by reducing stack memory related TLB misses and also > by using memory local to the NUMA node of each lcore. This is a good idea. Have you measured any performance differences with this patch? What kind of benefits do you see? > > Platforms desiring to make use of this capability must enable the associated > option flag and stack size settings in platform config files. > --- > lib/eal/linux/eal.c | 39 +++ > 1 file changed, 39 insertions(+) > > diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index > 1ef263434a..4e1e5b6915 100644 > --- a/lib/eal/linux/eal.c > +++ b/lib/eal/linux/eal.c > @@ -1143,9 +1143,48 @@ rte_eal_init(int argc, char **argv) > > lcore_config[i].state = WAIT; > > +#ifdef RTE_EAL_NUMA_AWARE_LCORE_STACK > + /* Allocate NUMA aware stack memory and set pthread > attributes */ > + pthread_attr_t attr; > + void *stack_ptr = > + rte_zmalloc_socket("lcore_stack", > + > RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE, > + > RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE, > +rte_lcore_to_socket_id(i)); > + > + if (stack_ptr == NULL) { > + rte_eal_init_alert("Cannot allocate stack memory"); May be worth adding more details to the error message, like lcore id. > + rte_errno = ENOMEM; > + return -1; > + } > + > + if (pthread_attr_init(&attr) != 0) { > + rte_eal_init_alert("Cannot init pthread attributes"); > + rte_errno = EINVAL; EFAULT would be better. > + return -1; > + } > + if (pthread_attr_setstack(&attr, > + stack_ptr, > + > RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE) != 0) { > + rte_eal_init_alert("Cannot set pthread stack > attributes"); > + rte_errno = ENOTSUP; EFAULT would be better. 
> + return -1; > + } > + > + /* create a thread for each lcore */ > + ret = pthread_create(&lcore_config[i].thread_id, &attr, > + eal_thread_loop, (void *)(uintptr_t)i); > + > + if (pthread_attr_destroy(&attr) != 0) { > + rte_eal_init_alert("Cannot destroy pthread > attributes"); > + rte_errno = EFAULT; > + return -1; > + } > +#else > /* create a thread for each lcore */ > ret = pthread_create(&lcore_config[i].thread_id, NULL, >eal_thread_loop, (void *)(uintptr_t)i); > +#endif > if (ret != 0) > rte_panic("Cannot create thread\n"); > > -- > 2.17.1
RE: [PATCH] doc: update matching versions in ice guide
> -Original Message- > From: Yang, Qiming > Sent: Tuesday, April 26, 2022 1:36 PM > To: dev@dpdk.org > Cc: Zhang, Qi Z ; Yang, Qiming > ; sta...@dpdk.org > Subject: [PATCH] doc: update matching versions in ice guide > > Add recommended matching list for ice PMD in DPDK 22.03. > > Cc: sta...@dpdk.org > > Signed-off-by: Qiming Yang > --- > doc/guides/nics/ice.rst | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst index > a1780c46c3..6b903b9bbc 100644 > --- a/doc/guides/nics/ice.rst > +++ b/doc/guides/nics/ice.rst > @@ -62,6 +62,8 @@ The detailed information can refer to chapter Tested > Platforms/Tested NICs in re > > +---+---+-+---+--+---+ > |21.11 | 1.7.16| 1.3.27 | 1.3.31 |1.3.7 > |3.1| > > +---+---+-+---+--+---+ > + |22.03 | 1.8.3 | 1.3.28 | 1.3.35 |1.3.8 > |3.2| > + > + +---+---+-+---+--- > + ---+---+ > > Pre-Installation Configuration > -- > -- > 2.17.1 Acked-by: Qi Zhang Applied to dpdk-next-net-intel. Thanks Qi
RE: [DPDK v4] net/ixgbe: promote MDIO API
> -Original Message- > From: Ray Kinsella > Sent: Tuesday, April 26, 2022 6:12 PM > To: Zeng, ZhichaoX > Cc: dev@dpdk.org; Yang, Qiming ; Wang, Haiyue > ; David Marchand > Subject: Re: [DPDK v4] net/ixgbe: promote MDIO API > > > Zeng, ZhichaoX writes: > > > Hi, Ray, David: > > > > What is your opinion on this patch? > > > > Regards, > > Zhichao > > > > -Original Message- > > From: Zeng, ZhichaoX > > Sent: Tuesday, April 19, 2022 7:06 PM > > To: dev@dpdk.org > > Cc: Yang, Qiming ; Wang, Haiyue > ; m...@ashroe.eu; Zeng, ZhichaoX > > > Subject: [DPDK v4] net/ixgbe: promote MDIO API > > > > From: Zhichao Zeng > > > > Promote the MDIO APIs to be stable. > > > > Signed-off-by: Zhichao Zeng > > --- > > drivers/net/ixgbe/rte_pmd_ixgbe.h | 5 - > > drivers/net/ixgbe/version.map | 10 +- > > 2 files changed, 5 insertions(+), 10 deletions(-) > > > > Acked-by: Ray Kinsella Applied to dpdk-next-net-intel. Thanks Qi
RE: [PATCH v7 0/9] Enable ETS-based TX QoS on PF
Hi, > -Original Message- > From: Wu, Wenjun1 > Sent: 2022年4月22日 8:58 > To: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z > > Subject: [PATCH v7 0/9] Enable ETS-based TX QoS on PF > > This patch set enables ETS-based TX QoS on PF. It is supported to configure > bandwidth and priority in both queue and queue group level, and weight > only in queue level. > > v2: fix code style issue. > v3: fix uninitialization issue. > v4: fix logical issue. > v5: fix CI testing issue. Add explicit cast. > v6: add release note. > v7: merge the release note with the previous patch. > > Ting Xu (1): > net/ice: support queue bandwidth limit > > Wenjun Wu (8): > net/ice/base: fix dead lock issue when getting node from ID type > net/ice/base: support priority configuration of the exact node > net/ice/base: support queue BW allocation configuration > net/ice: support queue group bandwidth limit > net/ice: support queue priority configuration > net/ice: support queue weight configuration > net/ice: support queue group priority configuration > net/ice: add warning log for unsupported configuration > > doc/guides/rel_notes/release_22_07.rst | 4 + > drivers/net/ice/base/ice_sched.c | 89 ++- > drivers/net/ice/base/ice_sched.h | 6 + > drivers/net/ice/ice_ethdev.c | 19 + > drivers/net/ice/ice_ethdev.h | 55 ++ > drivers/net/ice/ice_tm.c | 844 + > drivers/net/ice/meson.build| 1 + > 7 files changed, 1016 insertions(+), 2 deletions(-) create mode 100644 > drivers/net/ice/ice_tm.c > > -- > 2.25.1 Acked-by: Qiming Yang
RE: [PATCH v6 07/16] examples/vdpa: add vDPA blk support in example
Hi Chenbo, Thanks for your reply. My reply is inline. > -Original Message- > From: Xia, Chenbo > Sent: Monday, April 25, 2022 9:39 PM > To: Pei, Andy ; dev@dpdk.org > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > Changpeng > Subject: RE: [PATCH v6 07/16] examples/vdpa: add vDPA blk support in > example > > Hi Andy, > > > -Original Message- > > From: Pei, Andy > > Sent: Thursday, April 21, 2022 4:34 PM > > To: dev@dpdk.org > > Cc: Xia, Chenbo ; maxime.coque...@redhat.com; > > Cao, Gang ; Liu, Changpeng > > > > Subject: [PATCH v6 07/16] examples/vdpa: add vDPA blk support in > > example > > > > Add virtio blk device support to vDPA example. > > > > Signed-off-by: Andy Pei > > --- > > examples/vdpa/main.c | 61 +- > > examples/vdpa/vdpa_blk_compact.h | 72 + > > examples/vdpa/vhost_user.h | 169 > > +++ > > 3 files changed, 301 insertions(+), 1 deletion(-) create mode 100644 > > examples/vdpa/vdpa_blk_compact.h create mode 100644 > > examples/vdpa/vhost_user.h > > > > diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c index > > 5ab0765..1c809ab 100644 > > --- a/examples/vdpa/main.c > > +++ b/examples/vdpa/main.c > > @@ -20,6 +20,7 @@ > > #include > > #include > > #include > > +#include "vdpa_blk_compact.h" > > > > #define MAX_PATH_LEN 128 > > #define MAX_VDPA_SAMPLE_PORTS 1024 > > @@ -41,6 +42,7 @@ struct vdpa_port { > > static int devcnt; > > static int interactive; > > static int client_mode; > > +static int isblk; > > > > /* display usage */ > > static void > > @@ -49,7 +51,8 @@ struct vdpa_port { > > printf("Usage: %s [EAL options] -- " > > " --interactive|-i: run in interactive > > mode.\n" > > " --iface : specify the path prefix > of > > the socket files, e.g. /tmp/vhost-user-.\n" > > -" --client: register a vhost-user socket > as > > client mode.\n", > > +" --client: register a vhost-user socket > as > > client mode.\n" > > +" --isblk: device is a block device, e.g. 
> > virtio_blk device.\n", > > prgname); > > } > > > > @@ -61,6 +64,7 @@ struct vdpa_port { > > {"iface", required_argument, NULL, 0}, > > {"interactive", no_argument, &interactive, 1}, > > {"client", no_argument, &client_mode, 1}, > > + {"isblk", no_argument, &isblk, 1}, > > I think a new API for get_device_type will be better than asking user to > specify the device type. > Good suggestion. I will send out a version of patch set and try to do this. > > {NULL, 0, 0, 0}, > > }; > > int opt, idx; > > @@ -159,6 +163,52 @@ struct vdpa_port { }; > > > > static int > > +vdpa_blk_device_set_features_and_protocol(const char *path) { > > + uint64_t protocol_features = 0; > > + int ret; > > + > > + ret = rte_vhost_driver_set_features(path, > VHOST_BLK_FEATURES_BASE); > > + if (ret != 0) { > > + RTE_LOG(ERR, VDPA, > > + "rte_vhost_driver_set_features for %s failed.\n", > > + path); > > + goto out; > > + } > > + > > + ret = rte_vhost_driver_disable_features(path, > > + VHOST_VDPA_BLK_DISABLED_FEATURES); > > + if (ret != 0) { > > + RTE_LOG(ERR, VDPA, > > + "rte_vhost_driver_disable_features for %s failed.\n", > > + path); > > + goto out; > > + } > > + > > + ret = rte_vhost_driver_get_protocol_features(path, > > &protocol_features); > > + if (ret != 0) { > > + RTE_LOG(ERR, VDPA, > > + "rte_vhost_driver_get_protocol_features for %s > > failed.\n", > > + path); > > + goto out; > > + } > > + > > + protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_CONFIG); > > + protocol_features |= (1ULL << > VHOST_USER_PROTOCOL_F_LOG_SHMFD); > > + > > + ret = rte_vhost_driver_set_protocol_features(path, > > protocol_features); > > + if (ret != 0) { > > + RTE_LOG(ERR, VDPA, > > + "rte_vhost_driver_set_protocol_features for %s > > failed.\n", > > + path); > > + goto out; > > + } > > + > > +out: > > + return ret; > > +} > > + > > +static int > > start_vdpa(struct vdpa_port *vport) > > { > > int ret; > > @@ -192,6 +242,15 @@ struct vdpa_port { > > "attach vdpa device failed: %s\n", > > 
socket_path); > > > > + if (isblk) { > > + RTE_LOG(NOTICE, VDPA, "is a blk device\n"); > > + ret = > vdpa_blk_device_set_features_and_protocol(socket_path); > > + if (ret != 0) > > + rte_exit(EXIT_FAILURE, > > + "set vhost blk driver features and protocol > > features failed: %s\n", > > +
RE: [PATCH v6 03/16] vhost: add vhost msg support
Hi Chenbo, Thanks for your reply. My reply is inline. > -Original Message- > From: Xia, Chenbo > Sent: Tuesday, April 26, 2022 5:17 PM > To: Pei, Andy ; dev@dpdk.org > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > Changpeng > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support > > > -Original Message- > > From: Pei, Andy > > Sent: Tuesday, April 26, 2022 4:56 PM > > To: Xia, Chenbo ; dev@dpdk.org > > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > > Changpeng > > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support > > > > HI Chenbo, > > > > Thanks for your reply. > > My reply is inline. > > > > > -Original Message- > > > From: Xia, Chenbo > > > Sent: Monday, April 25, 2022 8:42 PM > > > To: Pei, Andy ; dev@dpdk.org > > > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu, > > > Changpeng > > > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support > > > > > > Hi Andy, > > > > > > > -Original Message- > > > > From: Pei, Andy > > > > Sent: Thursday, April 21, 2022 4:34 PM > > > > To: dev@dpdk.org > > > > Cc: Xia, Chenbo ; > > > > maxime.coque...@redhat.com; Cao, Gang ; Liu, > > > > Changpeng > > > > Subject: [PATCH v6 03/16] vhost: add vhost msg support > > > > > > > > Add support for VHOST_USER_GET_CONFIG and > > > VHOST_USER_SET_CONFIG. > > > > VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG message is > only > > > > supported by virtio blk VDPA device. 
> > > > > > > > Signed-off-by: Andy Pei > > > > --- > > > > lib/vhost/vhost_user.c | 69 > > > > ++ > > > > lib/vhost/vhost_user.h | 13 ++ > > > > 2 files changed, 82 insertions(+) > > > > > > > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index > > > > 1d39067..3780804 100644 > > > > --- a/lib/vhost/vhost_user.c > > > > +++ b/lib/vhost/vhost_user.c > > > > @@ -80,6 +80,8 @@ > > > > [VHOST_USER_NET_SET_MTU] = "VHOST_USER_NET_SET_MTU", > > > > [VHOST_USER_SET_SLAVE_REQ_FD] = > > > "VHOST_USER_SET_SLAVE_REQ_FD", > > > > [VHOST_USER_IOTLB_MSG] = "VHOST_USER_IOTLB_MSG", > > > > +[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG", > > > > +[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG", > > > > [VHOST_USER_CRYPTO_CREATE_SESS] = > > > "VHOST_USER_CRYPTO_CREATE_SESS", > > > > [VHOST_USER_CRYPTO_CLOSE_SESS] = > > > "VHOST_USER_CRYPTO_CLOSE_SESS", > > > > [VHOST_USER_POSTCOPY_ADVISE] = > > > "VHOST_USER_POSTCOPY_ADVISE", @@ > > > > -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net > > > > *dev, } > > > > > > > > static int > > > > +vhost_user_get_config(struct virtio_net **pdev, struct > > > > +vhu_msg_context *ctx, int main_fd __rte_unused) { struct > > > > +virtio_net *dev = *pdev; struct rte_vdpa_device *vdpa_dev = > > > > +dev->vdpa_dev; int ret = 0; > > > > + > > > > +if (vdpa_dev->ops->get_config) { > > > > +ret = vdpa_dev->ops->get_config(dev->vid, > > > > + ctx->msg.payload.cfg.region, > > > > + ctx->msg.payload.cfg.size); > > > > +if (ret != 0) { > > > > +ctx->msg.size = 0; > > > > +VHOST_LOG_CONFIG(ERR, > > > > + "(%s) get_config() return error!\n", > > > > + dev->ifname); > > > > +} > > > > +} else { > > > > +VHOST_LOG_CONFIG(ERR, "(%s) get_config() not > > > supportted!\n", > > > > > > Supported > > > > > I will send out a new version to fix this. 
> > > > + dev->ifname); > > > > +} > > > > + > > > > +return RTE_VHOST_MSG_RESULT_REPLY; } > > > > + > > > > +static int > > > > +vhost_user_set_config(struct virtio_net **pdev, struct > > > > +vhu_msg_context *ctx, int main_fd __rte_unused) { struct > > > > +virtio_net *dev = *pdev; struct rte_vdpa_device *vdpa_dev = > > > > +dev->vdpa_dev; int ret = 0; > > > > + > > > > +if (ctx->msg.size != sizeof(struct vhost_user_config)) { > > > > > > I think you should do sanity check on payload.cfg.size and make sure > > it's > > > smaller than VHOST_USER_MAX_CONFIG_SIZE > > > > > > and same check for offset > > > > > I think payload.cfg.size can be smaller than or equal to > > VHOST_USER_MAX_CONFIG_SIZE. > > payload.cfg.ofset can be smaller than or equal to > > VHOST_USER_MAX_CONFIG_SIZE as well > > After double check: offset is the config space offset, so this should be > checked in vdpa driver. Size check on vhost lib layer should be just <= > MAX_you_defined > OK. > Thanks, > Chenbo > > > > > > > +VHOST_LOG_CONFIG(ERR, > > > > +"(%s) invalid set config msg size: %"PRId32" != %d\n", > > > > +dev->ifname, ctx->msg.size, > > > > > > Based on you will change the log too, payload.cfg.size is uint32_t, > > > so > > PRId32 -> > > > PRIu32 > > > > > > > +(int)sizeof(struct vhost_user_config)); > > > > > > So this can be %u > > > > > Sure. > > > > +goto OUT; > > > > +} > > > > + > > > > +if (vdpa_dev->ops->set_config) { > > > > +ret = vdpa_dev->ops->set_config(dev->vid, > > > > +ctx->msg.payload.cfg.region, > > > > +ctx->msg.payload.cfg.offset, > > > > +ctx->msg.payload.cfg.size, > > > > +ctx->msg.payload.cfg.flags); > > > > +if (ret) > > > > +VHOST
[Bug 1001] [meson test] Debug-tests/dump_* all meson test time out because commands are not registered to command list
https://bugs.dpdk.org/show_bug.cgi?id=1001 Bug ID: 1001 Summary: [meson test] Debug-tests/dump_* all meson test time out because commands are not registered to command list Product: DPDK Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: meson Assignee: dev@dpdk.org Reporter: weiyuanx...@intel.com Target Milestone: --- [Environment] DPDK version: Use make show version or for a non-released version: git remote -v && git show-ref --heads 22.07.0-rc0 55ae8965bf8eecd5ebec36663bb0f36018abf64b OS: Red Hat Enterprise Linux 8.4 (Ootpa)/4.18.0-305.19.1.el8_4.x86_64 Compiler: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4) Hardware platform: Intel(R) Xeon(R) Gold 6252N CPU @ 2.30GHz [Test Setup] Steps to reproduce List the steps to reproduce the issue. 1. Use the following command to build DPDK: CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-gcc/ ninja -C x86_64-native-linuxapp-gcc/ 2. Execute the following command in the dpdk directory. meson test -C x86_64-native-linuxapp-gcc dump_struct_sizes [show the output] root@localhost dpdk]# meson test -C x86_64-native-linuxapp-gcc dump_struct_sizes ninja: Entering directory `/root/dpdk/x86_64-native-linuxapp-gcc' ninja: no work to do. 1/1 DPDK:debug-tests / dump_struct_sizesTIMEOUT600.02s killed by signal 15 SIGTERM >>> MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes >>> /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test Ok: 0 Expected Fail: 0 Fail: 0 Unexpected Pass:0 Skipped:0 Timeout:1 Full log written to /root/dpdk/x86_64-native-linuxapp-gcc/meson-logs/testlog.txt show log from the testlog.txt. 
1/1 DPDK:debug-tests / dump_struct_sizes TIMEOUT600.02s killed by signal 15 SIGTERM 05:28:14 MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test --- output --- stdout: RTE>> stderr: EAL: Detected CPU lcores: 96 EAL: Detected NUMA nodes: 2 EAL: Detected static linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size APP: HPET is not enabled, using TSC as default timer APP: Invalid DPDK_TEST value 'dump_struct_sizes' -- [Expected Result] Test ok. [Affected Test cases] DPDK:debug-tests / dump_struct_sizes DPDK:debug-tests / dump_mempool DPDK:debug-tests / dump_malloc_stats DPDK:debug-tests / dump_devargs DPDK:debug-tests / dump_log_types DPDK:debug-tests / dump_ring DPDK:debug-tests / dump_physmem DPDK:debug-tests / dump_memzone -- You are receiving this mail because: You are the assignee for the bug.
[Bug 1002] [meson test] Debug-tests/dump_* all meson test time out because commands are not registered to command list
https://bugs.dpdk.org/show_bug.cgi?id=1002 Bug ID: 1002 Summary: [meson test] Debug-tests/dump_* all meson test time out because commands are not registered to command list Product: DPDK Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: meson Assignee: dev@dpdk.org Reporter: weiyuanx...@intel.com Target Milestone: --- [Environment] DPDK version: Use make show version or for a non-released version: git remote -v && git show-ref --heads 22.07.0-rc0 55ae8965bf8eecd5ebec36663bb0f36018abf64b OS: Red Hat Enterprise Linux 8.4 (Ootpa)/4.18.0-305.19.1.el8_4.x86_64 Compiler: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4) Hardware platform: Intel(R) Xeon(R) Gold 6252N CPU @ 2.30GHz [Test Setup] Steps to reproduce List the steps to reproduce the issue. 1. Use the following command to build DPDK: CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-gcc/ ninja -C x86_64-native-linuxapp-gcc/ 2. Execute the following command in the dpdk directory. meson test -C x86_64-native-linuxapp-gcc dump_struct_sizes [show the output] root@localhost dpdk]# meson test -C x86_64-native-linuxapp-gcc dump_struct_sizes ninja: Entering directory `/root/dpdk/x86_64-native-linuxapp-gcc' ninja: no work to do. 1/1 DPDK:debug-tests / dump_struct_sizesTIMEOUT600.02s killed by signal 15 SIGTERM >>> MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes >>> /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test Ok: 0 Expected Fail: 0 Fail: 0 Unexpected Pass:0 Skipped:0 Timeout:1 Full log written to /root/dpdk/x86_64-native-linuxapp-gcc/meson-logs/testlog.txt show log from the testlog.txt. 
1/1 DPDK:debug-tests / dump_struct_sizes TIMEOUT600.02s killed by signal 15 SIGTERM 05:28:14 MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test --- output --- stdout: RTE>> stderr: EAL: Detected CPU lcores: 96 EAL: Detected NUMA nodes: 2 EAL: Detected static linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size APP: HPET is not enabled, using TSC as default timer APP: Invalid DPDK_TEST value 'dump_struct_sizes' -- [Expected Result] Test ok. [Affected Test cases] DPDK:debug-tests / dump_struct_sizes DPDK:debug-tests / dump_mempool DPDK:debug-tests / dump_malloc_stats DPDK:debug-tests / dump_devargs DPDK:debug-tests / dump_log_types DPDK:debug-tests / dump_ring DPDK:debug-tests / dump_physmem DPDK:debug-tests / dump_memzone -- You are receiving this mail because: You are the assignee for the bug.