[PATCH] ci: do not dump error logs in GHA containers

2022-04-26 Thread David Marchand
On error, the build logs are displayed in GHA console and logs unless
the GITHUB_WORKFLOW env variable is set.
However, containers in GHA do not automatically inherit this variable.
We could pass this variable in the container environment, but in the
end, dumping those logs is only for Travis which we don't really care
about anymore.

Let's make the linux-build.sh more generic and dump logs from Travis
yaml itself.
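
The replacement relies on the `[ ! -e file ] || cat file` guard, so a missing log never fails the after_failure step. A minimal sketch of the pattern (with a hypothetical build tree for illustration):

```shell
#!/bin/sh
set -e

# Guard-style dump: `cat` runs only when the file exists, and the
# compound command exits 0 either way -- safe even under `set -e`.
dump_if_present() {
    [ ! -e "$1" ] || cat "$1"
}

# Hypothetical log location for illustration only.
logdir=$(mktemp -d)/meson-logs
mkdir -p "$logdir"
printf 'ninja: build stopped\n' > "$logdir/meson-log.txt"

dump_if_present "$logdir/meson-log.txt"   # prints the log
dump_if_present "$logdir/testlog.txt"     # absent: prints nothing, exits 0
```

A bare `cat` on a missing file would instead return non-zero and mark the Travis step as errored, which is what the guard avoids.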

Fixes: b35c4b0aa2bc ("ci: add Fedora 35 container in GHA")

Signed-off-by: David Marchand 
---
 .ci/linux-build.sh | 19 -------------------
 .travis.yml        |  4 ++++
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 774a1441bf..6a937611fa 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -3,25 +3,6 @@
 # Builds are run as root in containers, no need for sudo
 [ "$(id -u)" != '0' ] || alias sudo=
 
-on_error() {
-if [ $? = 0 ]; then
-exit
-fi
-FILES_TO_PRINT="build/meson-logs/testlog.txt"
-FILES_TO_PRINT="$FILES_TO_PRINT build/.ninja_log"
-FILES_TO_PRINT="$FILES_TO_PRINT build/meson-logs/meson-log.txt"
-FILES_TO_PRINT="$FILES_TO_PRINT build/gdb.log"
-
-for pr_file in $FILES_TO_PRINT; do
-if [ -e "$pr_file" ]; then
-cat "$pr_file"
-fi
-done
-}
-# We capture the error logs as artifacts in Github Actions, no need to dump
-# them via a EXIT handler.
-[ -n "$GITHUB_WORKFLOW" ] || trap on_error EXIT
-
 install_libabigail() {
 version=$1
 instdir=$2
diff --git a/.travis.yml b/.travis.yml
index 5f46dccb54..e4e70fa560 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -38,6 +38,10 @@ _doc_packages: &doc_packages
 
 before_install: ./.ci/${TRAVIS_OS_NAME}-setup.sh
 script: ./.ci/${TRAVIS_OS_NAME}-build.sh
+after_failure:
+- [ ! -e build/meson-logs/testlog.txt ] || cat build/meson-logs/testlog.txt
+- [ ! -e build/.ninja_log ] || cat build/.ninja_log
+- [ ! -e build/meson-logs/meson-log.txt ] || cat build/meson-logs/meson-log.txt
 
 env:
   global:
-- 
2.23.0



Re: [PATCH] ci: do not dump error logs in GHA containers

2022-04-26 Thread David Marchand
On Tue, Apr 26, 2022 at 9:09 AM David Marchand
 wrote:
>
> On error, the build logs are displayed in GHA console and logs unless
> the GITHUB_WORKFLOW env variable is set.
> However, containers in GHA do not automatically inherit this variable.
> We could pass this variable in the container environment, but in the
> end, dumping those logs is only for Travis which we don't really care
> about anymore.
>
> Let's make the linux-build.sh more generic and dump logs from Travis
> yaml itself.
>
> Fixes: b35c4b0aa2bc ("ci: add Fedora 35 container in GHA")
>
> Signed-off-by: David Marchand 

TBH, I did not test Travis for lack of interest (plus I don't want to
be bothered with their UI / credit stuff).
We could consider dropping Travis in the near future.

Opinions?


-- 
David Marchand



[PATCH 2/2] ci: add mingw cross compilation in GHA

2022-04-26 Thread David Marchand
Add mingw cross compilation in our public CI so that users with their
own github repository have a first level of checks for Windows compilation
before submitting to the mailing list.
This does not replace our better checks in other entities of the CI.

Only the helloworld example is compiled (same as what is tested in
test-meson-builds.sh).

Note: the mingw cross compilation toolchain (version 5.0) in Ubuntu
18.04 was broken (missing an ENOMSG definition).
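
For the MinGW job, the option assembly in the script below reduces to roughly this (a sketch mirroring the patch, echoing the final command instead of running meson):

```shell
#!/bin/sh
# Sketch of how linux-build.sh assembles the MinGW cross build.
MINGW=true
OPTS=""
cross_file=""

# Pick the cross file the same way the patched script does.
[ "$MINGW" != "true" ] || cross_file=config/x86/cross-mingw
[ -z "$cross_file" ] || OPTS="$OPTS --cross-file $cross_file"

# Under MinGW only the helloworld example is built.
if [ "$MINGW" = "true" ]; then
    OPTS="$OPTS -Dexamples=helloworld"
fi

echo "meson build --werror$OPTS"
```

Running this prints `meson build --werror --cross-file config/x86/cross-mingw -Dexamples=helloworld`, i.e. the cross build with only the helloworld example enabled.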

Signed-off-by: David Marchand 
---
 .ci/linux-build.sh          | 22 +++++++++++++++++-----
 .github/workflows/build.yml |  8 ++++++++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 30119b61ba..06dd20772d 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -37,16 +37,26 @@ catch_coredump() {
 return 1
 }
 
+cross_file=
+
 if [ "$AARCH64" = "true" ]; then
 if [ "${CC%%clang}" != "$CC" ]; then
-OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_clang_ubuntu2004"
+cross_file=config/arm/arm64_armv8_linux_clang_ubuntu2004
 else
-OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_gcc"
+cross_file=config/arm/arm64_armv8_linux_gcc
 fi
 fi
 
+if [ "$MINGW" = "true" ]; then
+cross_file=config/x86/cross-mingw
+fi
+
 if [ "$PPC64LE" = "true" ]; then
-OPTS="$OPTS --cross-file config/ppc/ppc64le-power8-linux-gcc-ubuntu2004"
+cross_file=config/ppc/ppc64le-power8-linux-gcc-ubuntu2004
+fi
+
+if [ -n "$cross_file" ]; then
+OPTS="$OPTS --cross-file $cross_file"
 fi
 
 if [ "$BUILD_DOCS" = "true" ]; then
@@ -59,7 +69,9 @@ if [ "$BUILD_32BIT" = "true" ]; then
 export PKG_CONFIG_LIBDIR="/usr/lib32/pkgconfig"
 fi
 
-if [ "$DEF_LIB" = "static" ]; then
+if [ "$MINGW" = "true" ]; then
+OPTS="$OPTS -Dexamples=helloworld"
+elif [ "$DEF_LIB" = "static" ]; then
 OPTS="$OPTS -Dexamples=l2fwd,l3fwd"
 else
 OPTS="$OPTS -Dexamples=all"
@@ -76,7 +88,7 @@ fi
 meson build --werror $OPTS
 ninja -C build
 
-if [ "$AARCH64" != "true" ] && [ "$PPC64LE" != "true" ]; then
+if [ -z "$cross_file" ]; then
 failed=
 configure_coredump
 devtools/test-null.sh || failed="true"
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 812aa7055d..e2f94d786b 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -21,6 +21,7 @@ jobs:
   CC: ccache ${{ matrix.config.compiler }}
   DEF_LIB: ${{ matrix.config.library }}
   LIBABIGAIL_VERSION: libabigail-1.8
+  MINGW: ${{ matrix.config.cross == 'mingw' }}
   MINI: ${{ matrix.config.mini != '' }}
   PPC64LE: ${{ matrix.config.cross == 'ppc64le' }}
   REF_GIT_TAG: v22.03
@@ -52,6 +53,10 @@ jobs:
 compiler: gcc
 library: static
 cross: i386
+  - os: ubuntu-20.04
+compiler: gcc
+library: static
+cross: mingw
   - os: ubuntu-20.04
 compiler: gcc
 library: static
@@ -119,6 +124,9 @@ jobs:
   if: env.AARCH64 == 'true'
  run: sudo apt install -y gcc-aarch64-linux-gnu libc6-dev-arm64-cross pkg-config-aarch64-linux-gnu
+- name: Install mingw cross compiling packages
+  if: env.MINGW == 'true'
+  run: sudo apt install -y mingw-w64 mingw-w64-tools
 - name: Install ppc64le cross compiling packages
   if: env.PPC64LE == 'true'
  run: sudo apt install -y gcc-powerpc64le-linux-gnu libc6-dev-ppc64el-cross
-- 
2.23.0



[PATCH 1/2] ci: switch to Ubuntu 20.04

2022-04-26 Thread David Marchand
Ubuntu 18.04 is now rather old.
Besides, other entities in our CI are also testing this distribution.

Switch to a newer Ubuntu release and benefit from more recent
tool(chain)s: for example, net/cnxk now builds fine and can be
re-enabled.

Signed-off-by: David Marchand 
---
 .ci/linux-build.sh                            |  7 ++-----
 .github/workflows/build.yml                   | 22 ++++++++++------------
 config/arm/arm64_armv8_linux_clang_ubuntu2004 |  1 +
 .../ppc/ppc64le-power8-linux-gcc-ubuntu2004   |  1 +
 4 files changed, 14 insertions(+), 17 deletions(-)
 create mode 120000 config/arm/arm64_armv8_linux_clang_ubuntu2004
 create mode 120000 config/ppc/ppc64le-power8-linux-gcc-ubuntu2004

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 6a937611fa..30119b61ba 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -38,18 +38,15 @@ catch_coredump() {
 }
 
 if [ "$AARCH64" = "true" ]; then
-# Note: common/cnxk is disabled for Ubuntu 18.04
-# https://bugs.dpdk.org/show_bug.cgi?id=697
-OPTS="$OPTS -Ddisable_drivers=common/cnxk"
 if [ "${CC%%clang}" != "$CC" ]; then
-OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_clang_ubuntu1804"
+OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_clang_ubuntu2004"
 else
 OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_gcc"
 fi
 fi
 
 if [ "$PPC64LE" = "true" ]; then
-OPTS="$OPTS --cross-file config/ppc/ppc64le-power8-linux-gcc-ubuntu1804"
+OPTS="$OPTS --cross-file config/ppc/ppc64le-power8-linux-gcc-ubuntu2004"
 fi
 
 if [ "$BUILD_DOCS" = "true" ]; then
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 22daaabb91..812aa7055d 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -30,43 +30,41 @@ jobs:
   fail-fast: false
   matrix:
 config:
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: static
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: shared
 mini: mini
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: shared
 checks: abi+doc+tests
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: clang
 library: static
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: clang
 library: shared
 checks: doc+tests
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: static
 cross: i386
-  # Note: common/cnxk is disabled for Ubuntu 18.04
-  # https://bugs.dpdk.org/show_bug.cgi?id=697
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: static
 cross: aarch64
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: shared
 cross: aarch64
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: static
 cross: ppc64le
-  - os: ubuntu-18.04
+  - os: ubuntu-20.04
 compiler: gcc
 library: shared
 cross: ppc64le
diff --git a/config/arm/arm64_armv8_linux_clang_ubuntu2004 b/config/arm/arm64_armv8_linux_clang_ubuntu2004
new file mode 120000
index 00..01f5b7643e
--- /dev/null
+++ b/config/arm/arm64_armv8_linux_clang_ubuntu2004
@@ -0,0 +1 @@
+arm64_armv8_linux_clang_ubuntu1804
\ No newline at end of file
diff --git a/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004 b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004
new file mode 120000
index 00..9d6139a19b
--- /dev/null
+++ b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004
@@ -0,0 +1 @@
+ppc64le-power8-linux-gcc-ubuntu1804
\ No newline at end of file
-- 
2.23.0



Re: [PATCH v2] test/bpf: skip test if libpcap is unavailable

2022-04-26 Thread David Marchand
On Tue, Mar 22, 2022 at 8:12 AM Tyler Retzlaff
 wrote:
>
> test_bpf_convert is being conditionally registered depending on the
> presence of RTE_HAS_LIBPCAP except the UT unconditionally lists it as a
> test to run.
>
> when the UT runs test_bpf_convert test-dpdk can't find the registration
> and assumes the DPDK_TEST environment variable hasn't been defined
> resulting in test-dpdk dropping to interactive mode and subsequently
> waiting for the remainder of the UT fast-test timeout period before
> reporting the test as having timed out.
>
> * unconditionally register test_bpf_convert
> * if ! RTE_HAS_LIBPCAP provide a stub test_bpf_convert that reports the
>   test is skipped similar to that done with the test_bpf test.
>
> Fixes: 2eccf6afbea9 ("bpf: add function to convert classic BPF to DPDK BPF")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Tyler Retzlaff 
Acked-by: Stephen Hemminger 
Acked-by: Konstantin Ananyev 

Applied, thanks.


-- 
David Marchand



Re: [PATCH v2] test/bpf: skip test if libpcap is unavailable

2022-04-26 Thread Tyler Retzlaff
On Tue, Apr 26, 2022 at 09:41:08AM +0200, David Marchand wrote:
> On Tue, Mar 22, 2022 at 8:12 AM Tyler Retzlaff
>  wrote:
> >
> > test_bpf_convert is being conditionally registered depending on the
> > presence of RTE_HAS_LIBPCAP except the UT unconditionally lists it as a
> > test to run.
> >
> > when the UT runs test_bpf_convert test-dpdk can't find the registration
> > and assumes the DPDK_TEST environment variable hasn't been defined
> > resulting in test-dpdk dropping to interactive mode and subsequently
> > waiting for the remainder of the UT fast-test timeout period before
> > reporting the test as having timed out.
> >
> > * unconditionally register test_bpf_convert
> > * if ! RTE_HAS_LIBPCAP provide a stub test_bpf_convert that reports the
> >   test is skipped similar to that done with the test_bpf test.
> >
> > Fixes: 2eccf6afbea9 ("bpf: add function to convert classic BPF to DPDK BPF")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Tyler Retzlaff 
> Acked-by: Stephen Hemminger 
> Acked-by: Konstantin Ananyev 
> 
> Applied, thanks.

thanks mate!

> 
> 
> -- 
> David Marchand


[PATCH v4 0/3] add eal functions for thread affinity and self

2022-04-26 Thread Tyler Retzlaff
this series provides basic dependencies for additional eal thread api
additions. series includes

* basic platform error number conversion.
* function to get current thread identifier.
* functions to get and set affinity with platform agnostic thread
  identifier.
* minimal unit test of get and set affinity demonstrating usage.

note: previous series introducing these functions is now superseded by
this series.
http://patches.dpdk.org/project/dpdk/list/?series=20472&state=*

v4:
* combine patch eal/windows: translate Windows errors to errno-style
  errors into eal: implement functions for get/set thread affinity
  patch. the former introduced static functions that were not used
  without eal: implement functions for get/set thread affinity which
  would cause a build break when applied standalone.
* remove struct tag from rte_thread_t struct typedef.
* remove rte_ prefix from rte_convert_cpuset_to_affinity static
  function.

v3:
* fix memory leak on eal_create_cpu_map error paths.

v2:
* add missing boilerplate comments warning of experimental api
  for rte_thread_{set,get}_affinity_by_id().
* don't break literal format string to log_early to improve
  searchability.
* fix multi-line comment style to match file.
* return ENOTSUP instead of EINVAL from rte_convert_cpuset_to_affinity()
  if cpus in set are not part of the same processor group and note
  limitation in commit message.
* expand series to include rte_thread_self().
* modify unit test to remove use of implementation detail and
  get thread identifier use added rte_thread_self().
* move literal value to rhs when using memcmp in RTE_TEST_ASSERT

Tyler Retzlaff (3):
  eal: add basic thread ID and current thread identifier API
  eal: implement functions for get/set thread affinity
  test/threads: add unit test for thread API

 app/test/meson.build |   2 +
 app/test/test_threads.c  |  89 ++
 lib/eal/include/rte_thread.h |  64 +
 lib/eal/unix/rte_thread.c|  27 ++
 lib/eal/version.map  |   5 ++
 lib/eal/windows/eal_lcore.c  | 181 +++--
 lib/eal/windows/eal_windows.h|  10 +++
 lib/eal/windows/include/rte_os.h |   2 +
 lib/eal/windows/rte_thread.c | 190 ++-
 9 files changed, 522 insertions(+), 48 deletions(-)
 create mode 100644 app/test/test_threads.c

-- 
1.8.3.1



[PATCH v4 1/3] eal: add basic thread ID and current thread identifier API

2022-04-26 Thread Tyler Retzlaff
Provide a portable type-safe thread identifier.
Provide rte_thread_self for obtaining current thread identifier.

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
---
 lib/eal/include/rte_thread.h | 22 ++
 lib/eal/unix/rte_thread.c| 11 +++
 lib/eal/version.map  |  3 +++
 lib/eal/windows/rte_thread.c | 10 ++
 4 files changed, 46 insertions(+)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8be8ed8..14478ba 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -1,7 +1,10 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2021 Mellanox Technologies, Ltd
+ * Copyright (C) 2022 Microsoft Corporation
  */
 
+#include 
+
 #include 
 #include 
 
@@ -21,10 +24,29 @@
 #endif
 
 /**
+ * Thread id descriptor.
+ */
+typedef struct {
+   uintptr_t opaque_id; /**< thread identifier */
+} rte_thread_t;
+
+/**
  * TLS key type, an opaque pointer.
  */
 typedef struct eal_tls_key *rte_thread_key;
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the id of the calling thread.
+ *
+ * @return
+ *   Return the thread id of the calling thread.
+ */
+__rte_experimental
+rte_thread_t rte_thread_self(void);
+
 #ifdef RTE_HAS_CPUSET
 
 /**
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index c34ede9..82e008f 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2021 Mellanox Technologies, Ltd
+ * Copyright (C) 2022 Microsoft Corporation
  */
 
 #include 
@@ -15,6 +16,16 @@ struct eal_tls_key {
pthread_key_t thread_index;
 };
 
+rte_thread_t
+rte_thread_self(void)
+{
+   rte_thread_t thread_id;
+
+   thread_id.opaque_id = (uintptr_t)pthread_self();
+
+   return thread_id;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb3..05ce8f9 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
rte_intr_instance_free;
rte_intr_type_get;
rte_intr_type_set;
+
+   # added in 22.07
+   rte_thread_self;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index 667287c..59fed3c 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -11,6 +11,16 @@ struct eal_tls_key {
DWORD thread_index;
 };
 
+rte_thread_t
+rte_thread_self(void)
+{
+   rte_thread_t thread_id;
+
+   thread_id.opaque_id = GetCurrentThreadId();
+
+   return thread_id;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key,
__rte_unused void (*destructor)(void *))
-- 
1.8.3.1



[PATCH v4 2/3] eal: implement functions for get/set thread affinity

2022-04-26 Thread Tyler Retzlaff
Implement functions for getting/setting thread affinity.
Threads can be pinned to specific cores by setting their
affinity attribute.

Windows error codes are translated to errno-style error codes.
The possible return values are chosen so that we have as
much semantic compatibility between platforms as possible.

note: convert_cpuset_to_affinity has a limitation that all cpus of
the set belong to the same processor group.

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
---
 lib/eal/include/rte_thread.h |  42 +
 lib/eal/unix/rte_thread.c|  16 
 lib/eal/version.map  |   2 +
 lib/eal/windows/eal_lcore.c  | 181 +--
 lib/eal/windows/eal_windows.h|  10 +++
 lib/eal/windows/include/rte_os.h |   2 +
 lib/eal/windows/rte_thread.c | 180 +-
 7 files changed, 385 insertions(+), 48 deletions(-)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 14478ba..7888f7a 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -50,6 +50,48 @@
 #ifdef RTE_HAS_CPUSET
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set the affinity of thread 'thread_id' to the cpu set
+ * specified by 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to set the affinity.
+ *
+ * @param cpuset
+ *   Pointer to CPU affinity to set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the affinity of thread 'thread_id' and store it
+ * in 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to get the affinity.
+ *
+ * @param cpuset
+ *   Pointer for storing the affinity value.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset);
+
+/**
  * Set core affinity of the current thread.
  * Support both EAL and non-EAL thread and update TLS.
  *
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 82e008f..c64198f 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -100,3 +100,19 @@ struct eal_tls_key {
}
return pthread_getspecific(key->thread_index);
 }
+
+int
+rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset)
+{
+   return pthread_setaffinity_np((pthread_t)thread_id.opaque_id,
+   sizeof(*cpuset), cpuset);
+}
+
+int
+rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset)
+{
+   return pthread_getaffinity_np((pthread_t)thread_id.opaque_id,
+   sizeof(*cpuset), cpuset);
+}
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 05ce8f9..d49e30b 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -422,7 +422,9 @@ EXPERIMENTAL {
rte_intr_type_set;
 
# added in 22.07
+   rte_thread_get_affinity_by_id;
rte_thread_self;
+   rte_thread_set_affinity_by_id;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/eal_lcore.c b/lib/eal/windows/eal_lcore.c
index 476c2d2..aa2fad9 100644
--- a/lib/eal/windows/eal_lcore.c
+++ b/lib/eal/windows/eal_lcore.c
@@ -1,8 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Intel Corporation
+ * Copyright (C) 2022 Microsoft Corporation
  */
 
-#include 
 #include 
 #include 
 
@@ -27,13 +27,15 @@ struct socket_map {
 };
 
 struct cpu_map {
-   unsigned int socket_count;
unsigned int lcore_count;
+   unsigned int socket_count;
+   unsigned int cpu_count;
struct lcore_map lcores[RTE_MAX_LCORE];
struct socket_map sockets[RTE_MAX_NUMA_NODES];
+   GROUP_AFFINITY cpus[CPU_SETSIZE];
 };
 
-static struct cpu_map cpu_map = { 0 };
+static struct cpu_map cpu_map;
 
 /* eal_create_cpu_map() is called before logging is initialized */
 static void
@@ -47,13 +49,115 @@ struct cpu_map {
va_end(va);
 }
 
+static int
+eal_query_group_affinity(void)
+{
+   SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos = NULL;
+   unsigned int *cpu_count = &cpu_map.cpu_count;
+   DWORD infos_size = 0;
+   int ret = 0;
+   USHORT group_count;
+   KAFFINITY affinity;
+   USHORT group_no;
+   unsigned int i;
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, NULL,
+   &infos_size)) {
+   DWORD error = GetLastError();
+   if (error != ERROR_INSUFFICIENT_BUFFER) {
+   log_early("Cannot get group information size, error %lu\n", error);
+   rte_errno = EINVAL;
+ 

[PATCH v4 3/3] test/threads: add unit test for thread API

2022-04-26 Thread Tyler Retzlaff
Establish unit test for testing thread api. Initial unit tests
for rte_thread_{get,set}_affinity_by_id().

Signed-off-by: Narcisa Vasile 
Signed-off-by: Tyler Retzlaff 
---
 app/test/meson.build|  2 ++
 app/test/test_threads.c | 89 +
 2 files changed, 91 insertions(+)
 create mode 100644 app/test/test_threads.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1..5a9d69b 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -133,6 +133,7 @@ test_sources = files(
 'test_tailq.c',
 'test_thash.c',
 'test_thash_perf.c',
+'test_threads.c',
 'test_timer.c',
 'test_timer_perf.c',
 'test_timer_racecond.c',
@@ -238,6 +239,7 @@ fast_tests = [
 ['reorder_autotest', true],
 ['service_autotest', true],
 ['thash_autotest', true],
+['threads_autotest', true],
 ['trace_autotest', true],
 ]
 
diff --git a/app/test/test_threads.c b/app/test/test_threads.c
new file mode 100644
index 000..0ca6745
--- /dev/null
+++ b/app/test/test_threads.c
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C) 2022 Microsoft Corporation
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include "test.h"
+
+RTE_LOG_REGISTER(threads_logtype_test, test.threads, INFO);
+
+static uint32_t thread_id_ready;
+
+static void *
+thread_main(void *arg)
+{
+   *(rte_thread_t *)arg = rte_thread_self();
+   __atomic_store_n(&thread_id_ready, 1, __ATOMIC_RELEASE);
+
+   return NULL;
+}
+
+static int
+test_thread_affinity(void)
+{
+   pthread_t id;
+   rte_thread_t thread_id;
+
+   RTE_TEST_ASSERT(pthread_create(&id, NULL, thread_main, &thread_id) == 0,
+   "Failed to create thread");
+
+   while (__atomic_load_n(&thread_id_ready, __ATOMIC_ACQUIRE) == 0)
+   ;
+
+   rte_cpuset_t cpuset0;
+   RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset0) == 0,
+   "Failed to get thread affinity");
+
+   rte_cpuset_t cpuset1;
+   RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset1) == 0,
+   "Failed to get thread affinity");
+   RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0,
+   "Affinity should be stable");
+
+   RTE_TEST_ASSERT(rte_thread_set_affinity_by_id(thread_id, &cpuset1) == 0,
+   "Failed to set thread affinity");
+   RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset0) == 0,
+   "Failed to get thread affinity");
+   RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0,
+   "Affinity should be stable");
+
+   size_t i;
+   for (i = 1; i < CPU_SETSIZE; i++)
+   if (CPU_ISSET(i, &cpuset0)) {
+   CPU_ZERO(&cpuset0);
+   CPU_SET(i, &cpuset0);
+
+   break;
+   }
+   RTE_TEST_ASSERT(rte_thread_set_affinity_by_id(thread_id, &cpuset0) == 0,
+   "Failed to set thread affinity");
+   RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset1) == 0,
+   "Failed to get thread affinity");
+   RTE_TEST_ASSERT(memcmp(&cpuset0, &cpuset1, sizeof(rte_cpuset_t)) == 0,
+   "Affinity should be stable");
+
+   return 0;
+}
+
+static struct unit_test_suite threads_test_suite = {
+   .suite_name = "threads autotest",
+   .setup = NULL,
+   .teardown = NULL,
+   .unit_test_cases = {
+   TEST_CASE(test_thread_affinity),
+   TEST_CASES_END()
+   }
+};
+
+static int
+test_threads(void)
+{
+   return unit_test_suite_runner(&threads_test_suite);
+}
+
+REGISTER_TEST_COMMAND(threads_autotest, test_threads);
-- 
1.8.3.1



RE: [PATCH v6 03/16] vhost: add vhost msg support

2022-04-26 Thread Pei, Andy
HI David,

Thanks for your reply.
I will send out a version to address that.

> -Original Message-
> From: David Marchand 
> Sent: Monday, April 25, 2022 9:05 PM
> To: Pei, Andy 
> Cc: dev ; Xia, Chenbo ; Maxime
> Coquelin ; Cao, Gang
> ; Liu, Changpeng 
> Subject: Re: [PATCH v6 03/16] vhost: add vhost msg support
> 
> On Thu, Apr 21, 2022 at 11:20 AM Andy Pei  wrote:
> > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index
> > 1d39067..3780804 100644
> > --- a/lib/vhost/vhost_user.c
> > +++ b/lib/vhost/vhost_user.c
> > @@ -80,6 +80,8 @@
> > [VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> > [VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
> > [VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> > +   [VHOST_USER_GET_CONFIG]  = "VHOST_USER_GET_CONFIG",
> > +   [VHOST_USER_SET_CONFIG]  = "VHOST_USER_SET_CONFIG",
> > [VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
> > [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
> > [VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
> > @@ -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net
> > *dev,  }
> >
> >  static int
> > +vhost_user_get_config(struct virtio_net **pdev,
> > +   struct vhu_msg_context *ctx,
> > +   int main_fd __rte_unused) {
> > +   struct virtio_net *dev = *pdev;
> > +   struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > +   int ret = 0;
> 
> You must check if there is any fd attached to this message.
> 
> 
> > +
> > +   if (vdpa_dev->ops->get_config) {
> > +   ret = vdpa_dev->ops->get_config(dev->vid,
> > +  ctx->msg.payload.cfg.region,
> > +  ctx->msg.payload.cfg.size);
> > +   if (ret != 0) {
> > +   ctx->msg.size = 0;
> > +   VHOST_LOG_CONFIG(ERR,
> > +"(%s) get_config() return 
> > error!\n",
> > +dev->ifname);
> > +   }
> > +   } else {
> > +   VHOST_LOG_CONFIG(ERR, "(%s) get_config() not supportted!\n",
> > +dev->ifname);
> > +   }
> > +
> > +   return RTE_VHOST_MSG_RESULT_REPLY; }
> > +
> > +static int
> > +vhost_user_set_config(struct virtio_net **pdev,
> > +   struct vhu_msg_context *ctx,
> > +   int main_fd __rte_unused) {
> > +   struct virtio_net *dev = *pdev;
> > +   struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > +   int ret = 0;
> 
> Idem.
> 
> 
> > +
> > +   if (ctx->msg.size != sizeof(struct vhost_user_config)) {
> > +   VHOST_LOG_CONFIG(ERR,
> > +   "(%s) invalid set config msg size: %"PRId32" != 
> > %d\n",
> > +   dev->ifname, ctx->msg.size,
> > +   (int)sizeof(struct vhost_user_config));
> > +   goto OUT;
> > +   }
> 
> 
> For info, I posted a series to make this kind of check more systematic.
> See:
> https://patchwork.dpdk.org/project/dpdk/patch/20220425125431.26464-2-
> david.march...@redhat.com/
> 
> 
> 
> --
> David Marchand



Re: [PATCH v4 01/14] bus/vmbus: move independent code from Linux

2022-04-26 Thread Srikanth K
Sure Stephen. I will change it to unix.

On Tue, 19 Apr 2022, 8:19 pm Stephen Hemminger, 
wrote:

> On Mon, 18 Apr 2022 09:59:02 +0530
> Srikanth Kaka  wrote:
>
> > Move the OS independent code from Linux dir in-order to be used
> > by FreeBSD
> >
> > Signed-off-by: Srikanth Kaka 
> > Signed-off-by: Vag Singh 
> > Signed-off-by: Anand Thulasiram 
> > ---
> >  drivers/bus/vmbus/linux/vmbus_bus.c   | 13 +------------
> >  drivers/bus/vmbus/meson.build         |  5 +++++
> >  drivers/bus/vmbus/osi/vmbus_osi.h     | 11 +++++++++++
> >  drivers/bus/vmbus/osi/vmbus_osi_bus.c | 20 ++++++++++++++++++++
> >  4 files changed, 37 insertions(+), 12 deletions(-)
> >  create mode 100644 drivers/bus/vmbus/osi/vmbus_osi.h
> >  create mode 100644 drivers/bus/vmbus/osi/vmbus_osi_bus.c
> >
> > diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c
> b/drivers/bus/vmbus/linux/vmbus_bus.c
> > index f502783f7a..c9a07041a7 100644
> > --- a/drivers/bus/vmbus/linux/vmbus_bus.c
> > +++ b/drivers/bus/vmbus/linux/vmbus_bus.c
> > @@ -21,22 +21,11 @@
> >
> >  #include "eal_filesystem.h"
> >  #include "private.h"
> > +#include "vmbus_osi.h"
> >
> >  /** Pathname of VMBUS devices directory. */
> >  #define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
> >
> > -/*
> > - * GUID associated with network devices
> > - * {f8615163-df3e-46c5-913f-f2d2f965ed0e}
> > - */
> > -static const rte_uuid_t vmbus_nic_uuid = {
> > - 0xf8, 0x61, 0x51, 0x63,
> > - 0xdf, 0x3e,
> > - 0x46, 0xc5,
> > - 0x91, 0x3f,
> > - 0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe
> > -};
> > -
> >  extern struct rte_vmbus_bus rte_vmbus_bus;
> >
> >  /* Read sysfs file to get UUID */
> > diff --git a/drivers/bus/vmbus/meson.build
> b/drivers/bus/vmbus/meson.build
> > index 3892cbf67f..cbcba44e16 100644
> > --- a/drivers/bus/vmbus/meson.build
> > +++ b/drivers/bus/vmbus/meson.build
> > @@ -16,6 +16,11 @@ sources = files(
> >  'vmbus_common_uio.c',
> >  )
> >
> > +includes += include_directories('osi')
> > +sources += files(
> > + 'osi/vmbus_osi_bus.c'
> > +)
> > +
> >  if is_linux
> >  sources += files('linux/vmbus_bus.c',
> >  'linux/vmbus_uio.c')
> > diff --git a/drivers/bus/vmbus/osi/vmbus_osi.h
> b/drivers/bus/vmbus/osi/vmbus_osi.h
> > new file mode 100644
> > index 00..2db9399181
> > --- /dev/null
> > +++ b/drivers/bus/vmbus/osi/vmbus_osi.h
>
> Having common code is good, we are already doing it now in DPDK EAL.
> But the name osi seems odd to me.
> Could you use unix instead (same as EAL)
>
>drivers/bus/vmbus/unix/vmbus.h
>
> Or drivers/bus/vmbus/common
>


Reuse Of lcore after returning from its worker thread

2022-04-26 Thread Ansar Kannankattil
Hi,
As per my understanding "rte_eal_wait_lcore" is a blocking call in case
of lcore state running.
1. Is there any direct way to reuse the lcore which we returned from a
worker thread?
2. Technically is there any issue in reusing the lcore by some means?


Re: [PATCH] net/nfp: remove unneeded header inclusion

2022-04-26 Thread Niklas Soderlund
Hi David,

Thanks for your work.

On 2022-04-08 11:41:16 +0200, David Marchand wrote:
> Looking at this driver history, there was never a need for including
> execinfo.h.
> 
> Signed-off-by: David Marchand 

Reviewed-by: Niklas Söderlund 

> ---
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c 
> b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
> index bad80a5a1c..08bc4e8ef2 100644
> --- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
> +++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
> @@ -16,9 +16,6 @@
>  
>  #include 
>  #include 
> -#if defined(RTE_BACKTRACE)
> -#include 
> -#endif
>  #include 
>  #include 
>  #include 
> -- 
> 2.23.0
> 

-- 
Kind Regards,
Niklas Söderlund


[PATCH v5 00/14] add FreeBSD support to VMBUS & NetVSC PMDs

2022-04-26 Thread Srikanth Kaka
This patchset requires FreeBSD VMBus kernel changes and
HV_UIO driver. Both are currently under review at
https://reviews.freebsd.org/D32184

Changelog:
v5: - renamed dir osi to unix
- marked a newly added API as experimental
- removed camel case variables
v4: - moved OS independent code out of Linux
v3: - split the patches into further logical parts
- updated docs
v2: - replaced strncpy with memcpy
- replaced malloc.h with stdlib.h
- added comment in linux/vmbus_uio.c
v1: Initial release

Srikanth Kaka (14):
  bus/vmbus: move independent code from Linux
  bus/vmbus: move independent bus functions
  bus/vmbus: move OS independent UIO functions
  bus/vmbus: scan and get the network device on FreeBSD
  bus/vmbus: handle mapping of device resources
  bus/vmbus: get device resource values using sysctl
  net/netvsc: make event monitor OS dependent
  bus/vmbus: add sub-channel mapping support
  bus/vmbus: open subchannels
  net/netvsc: make IOCTL call to open subchannels
  bus/vmbus: get subchannel info
  net/netvsc: moving hotplug retry to OS dir
  bus/vmbus: add meson support for FreeBSD
  bus/vmbus: update MAINTAINERS and docs

 MAINTAINERS |   2 +
 doc/guides/nics/netvsc.rst  |  11 ++
 drivers/bus/vmbus/freebsd/vmbus_bus.c   | 286 
 drivers/bus/vmbus/freebsd/vmbus_uio.c   | 256 +
 drivers/bus/vmbus/linux/vmbus_bus.c |  28 +--
 drivers/bus/vmbus/linux/vmbus_uio.c | 320 
 drivers/bus/vmbus/meson.build   |  12 +-
 drivers/bus/vmbus/private.h |   1 +
 drivers/bus/vmbus/rte_bus_vmbus.h   |  11 ++
 drivers/bus/vmbus/unix/vmbus_unix.h |  27 +++
 drivers/bus/vmbus/unix/vmbus_unix_bus.c |  37 
 drivers/bus/vmbus/unix/vmbus_unix_uio.c | 310 +++
 drivers/bus/vmbus/version.map   |   6 +
 drivers/bus/vmbus/vmbus_channel.c   |   5 +
 drivers/net/netvsc/freebsd/hn_os.c  |  21 +++
 drivers/net/netvsc/freebsd/meson.build  |   6 +
 drivers/net/netvsc/hn_ethdev.c  |  95 +-
 drivers/net/netvsc/hn_os.h  |   8 +
 drivers/net/netvsc/linux/hn_os.c| 111 +++
 drivers/net/netvsc/linux/meson.build|   6 +
 drivers/net/netvsc/meson.build  |   3 +
 21 files changed, 1164 insertions(+), 398 deletions(-)
 create mode 100644 drivers/bus/vmbus/freebsd/vmbus_bus.c
 create mode 100644 drivers/bus/vmbus/freebsd/vmbus_uio.c
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix.h
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_bus.c
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_uio.c
 create mode 100644 drivers/net/netvsc/freebsd/hn_os.c
 create mode 100644 drivers/net/netvsc/freebsd/meson.build
 create mode 100644 drivers/net/netvsc/hn_os.h
 create mode 100644 drivers/net/netvsc/linux/hn_os.c
 create mode 100644 drivers/net/netvsc/linux/meson.build

-- 
1.8.3.1



[PATCH v5 01/14] bus/vmbus: move independent code from Linux

2022-04-26 Thread Srikanth Kaka
Move the OS-independent code out of the Linux directory in order for it
to be used by FreeBSD

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/linux/vmbus_bus.c | 13 +
 drivers/bus/vmbus/meson.build   |  5 +
 drivers/bus/vmbus/unix/vmbus_unix.h | 11 +++
 drivers/bus/vmbus/unix/vmbus_unix_bus.c | 20 
 4 files changed, 37 insertions(+), 12 deletions(-)
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix.h
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_bus.c

diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c 
b/drivers/bus/vmbus/linux/vmbus_bus.c
index f502783..e649537 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -21,22 +21,11 @@
 
 #include "eal_filesystem.h"
 #include "private.h"
+#include "vmbus_unix.h"
 
 /** Pathname of VMBUS devices directory. */
 #define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
 
-/*
- * GUID associated with network devices
- * {f8615163-df3e-46c5-913f-f2d2f965ed0e}
- */
-static const rte_uuid_t vmbus_nic_uuid = {
-   0xf8, 0x61, 0x51, 0x63,
-   0xdf, 0x3e,
-   0x46, 0xc5,
-   0x91, 0x3f,
-   0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe
-};
-
 extern struct rte_vmbus_bus rte_vmbus_bus;
 
 /* Read sysfs file to get UUID */
diff --git a/drivers/bus/vmbus/meson.build b/drivers/bus/vmbus/meson.build
index 3892cbf..01ef01f 100644
--- a/drivers/bus/vmbus/meson.build
+++ b/drivers/bus/vmbus/meson.build
@@ -16,6 +16,11 @@ sources = files(
 'vmbus_common_uio.c',
 )
 
+includes += include_directories('unix')
+sources += files(
+   'unix/vmbus_unix_bus.c'
+)
+
 if is_linux
 sources += files('linux/vmbus_bus.c',
 'linux/vmbus_uio.c')
diff --git a/drivers/bus/vmbus/unix/vmbus_unix.h 
b/drivers/bus/vmbus/unix/vmbus_unix.h
new file mode 100644
index 000..2db9399
--- /dev/null
+++ b/drivers/bus/vmbus/unix/vmbus_unix.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef _VMBUS_BUS_UNIX_H_
+#define _VMBUS_BUS_UNIX_H_
+
+extern const rte_uuid_t vmbus_nic_uuid;
+
+#endif /* _VMBUS_BUS_UNIX_H_ */
diff --git a/drivers/bus/vmbus/unix/vmbus_unix_bus.c 
b/drivers/bus/vmbus/unix/vmbus_unix_bus.c
new file mode 100644
index 000..f76a361
--- /dev/null
+++ b/drivers/bus/vmbus/unix/vmbus_unix_bus.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include 
+
+#include "vmbus_unix.h"
+
+/*
+ * GUID associated with network devices
+ * {f8615163-df3e-46c5-913f-f2d2f965ed0e}
+ */
+const rte_uuid_t vmbus_nic_uuid = {
+   0xf8, 0x61, 0x51, 0x63,
+   0xdf, 0x3e,
+   0x46, 0xc5,
+   0x91, 0x3f,
+   0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe
+};
-- 
1.8.3.1



[PATCH v5 02/14] bus/vmbus: move independent bus functions

2022-04-26 Thread Srikanth Kaka
Move OS-independent Linux bus functions to an OS-independent file

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/linux/vmbus_bus.c | 15 ---
 drivers/bus/vmbus/unix/vmbus_unix_bus.c | 17 +
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c 
b/drivers/bus/vmbus/linux/vmbus_bus.c
index e649537..18233a5 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -358,18 +358,3 @@
closedir(dir);
return -1;
 }
-
-void rte_vmbus_irq_mask(struct rte_vmbus_device *device)
-{
-   vmbus_uio_irq_control(device, 1);
-}
-
-void rte_vmbus_irq_unmask(struct rte_vmbus_device *device)
-{
-   vmbus_uio_irq_control(device, 0);
-}
-
-int rte_vmbus_irq_read(struct rte_vmbus_device *device)
-{
-   return vmbus_uio_irq_read(device);
-}
diff --git a/drivers/bus/vmbus/unix/vmbus_unix_bus.c 
b/drivers/bus/vmbus/unix/vmbus_unix_bus.c
index f76a361..96cb968 100644
--- a/drivers/bus/vmbus/unix/vmbus_unix_bus.c
+++ b/drivers/bus/vmbus/unix/vmbus_unix_bus.c
@@ -3,8 +3,10 @@
  * All Rights Reserved.
  */
 
+#include 
 #include 
 
+#include "private.h"
 #include "vmbus_unix.h"
 
 /*
@@ -18,3 +20,18 @@
0x91, 0x3f,
0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe
 };
+
+void rte_vmbus_irq_mask(struct rte_vmbus_device *device)
+{
+   vmbus_uio_irq_control(device, 1);
+}
+
+void rte_vmbus_irq_unmask(struct rte_vmbus_device *device)
+{
+   vmbus_uio_irq_control(device, 0);
+}
+
+int rte_vmbus_irq_read(struct rte_vmbus_device *device)
+{
+   return vmbus_uio_irq_read(device);
+}
-- 
1.8.3.1



[PATCH v5 03/14] bus/vmbus: move OS independent UIO functions

2022-04-26 Thread Srikanth Kaka
Move all OS-independent UIO functions to the unix dir.
Split vmbus_uio_map_subchan() by keeping the OS-dependent
code in the vmbus_uio_map_subchan_os() function

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/linux/vmbus_uio.c | 292 ++
 drivers/bus/vmbus/meson.build   |   3 +-
 drivers/bus/vmbus/unix/vmbus_unix.h |  12 ++
 drivers/bus/vmbus/unix/vmbus_unix_uio.c | 306 
 4 files changed, 330 insertions(+), 283 deletions(-)
 create mode 100644 drivers/bus/vmbus/unix/vmbus_unix_uio.c

diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c 
b/drivers/bus/vmbus/linux/vmbus_uio.c
index 5db70f8..b5d15c9 100644
--- a/drivers/bus/vmbus/linux/vmbus_uio.c
+++ b/drivers/bus/vmbus/linux/vmbus_uio.c
@@ -21,233 +21,18 @@
 #include 
 
 #include "private.h"
+#include "vmbus_unix.h"
 
 /** Pathname of VMBUS devices directory. */
 #define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
 
-static void *vmbus_map_addr;
-
-/* Control interrupts */
-void vmbus_uio_irq_control(struct rte_vmbus_device *dev, int32_t onoff)
-{
-   if ((rte_intr_fd_get(dev->intr_handle) < 0) ||
-   write(rte_intr_fd_get(dev->intr_handle), &onoff,
- sizeof(onoff)) < 0) {
-   VMBUS_LOG(ERR, "cannot write to %d:%s",
- rte_intr_fd_get(dev->intr_handle),
- strerror(errno));
-   }
-}
-
-int vmbus_uio_irq_read(struct rte_vmbus_device *dev)
-{
-   int32_t count;
-   int cc;
-
-   if (rte_intr_fd_get(dev->intr_handle) < 0)
-   return -1;
-
-   cc = read(rte_intr_fd_get(dev->intr_handle), &count,
- sizeof(count));
-   if (cc < (int)sizeof(count)) {
-   if (cc < 0) {
-   VMBUS_LOG(ERR, "IRQ read failed %s",
- strerror(errno));
-   return -errno;
-   }
-   VMBUS_LOG(ERR, "can't read IRQ count");
-   return -EINVAL;
-   }
-
-   return count;
-}
-
-void
-vmbus_uio_free_resource(struct rte_vmbus_device *dev,
-   struct mapped_vmbus_resource *uio_res)
-{
-   rte_free(uio_res);
-
-   if (rte_intr_dev_fd_get(dev->intr_handle) >= 0) {
-   close(rte_intr_dev_fd_get(dev->intr_handle));
-   rte_intr_dev_fd_set(dev->intr_handle, -1);
-   }
-
-   if (rte_intr_fd_get(dev->intr_handle) >= 0) {
-   close(rte_intr_fd_get(dev->intr_handle));
-   rte_intr_fd_set(dev->intr_handle, -1);
-   rte_intr_type_set(dev->intr_handle, RTE_INTR_HANDLE_UNKNOWN);
-   }
-}
-
-int
-vmbus_uio_alloc_resource(struct rte_vmbus_device *dev,
-struct mapped_vmbus_resource **uio_res)
-{
-   char devname[PATH_MAX]; /* contains the /dev/uioX */
-   int fd;
-
-   /* save fd if in primary process */
-   snprintf(devname, sizeof(devname), "/dev/uio%u", dev->uio_num);
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   VMBUS_LOG(ERR, "Cannot open %s: %s",
-   devname, strerror(errno));
-   goto error;
-   }
-
-   if (rte_intr_fd_set(dev->intr_handle, fd))
-   goto error;
-
-   if (rte_intr_type_set(dev->intr_handle, RTE_INTR_HANDLE_UIO_INTX))
-   goto error;
-
-   /* allocate the mapping details for secondary processes*/
-   *uio_res = rte_zmalloc("UIO_RES", sizeof(**uio_res), 0);
-   if (*uio_res == NULL) {
-   VMBUS_LOG(ERR, "cannot store uio mmap details");
-   goto error;
-   }
-
-   strlcpy((*uio_res)->path, devname, PATH_MAX);
-   rte_uuid_copy((*uio_res)->id, dev->device_id);
-
-   return 0;
-
-error:
-   vmbus_uio_free_resource(dev, *uio_res);
-   return -1;
-}
-
-static int
-find_max_end_va(const struct rte_memseg_list *msl, void *arg)
-{
-   size_t sz = msl->memseg_arr.len * msl->page_sz;
-   void *end_va = RTE_PTR_ADD(msl->base_va, sz);
-   void **max_va = arg;
-
-   if (*max_va < end_va)
-   *max_va = end_va;
-   return 0;
-}
-
-/*
- * TODO: this should be part of memseg api.
- *   code is duplicated from PCI.
- */
-static void *
-vmbus_find_max_end_va(void)
-{
-   void *va = NULL;
-
-   rte_memseg_list_walk(find_max_end_va, &va);
-   return va;
-}
-
-int
-vmbus_uio_map_resource_by_index(struct rte_vmbus_device *dev, int idx,
-   struct mapped_vmbus_resource *uio_res,
-   int flags)
-{
-   size_t size = dev->resource[idx].len;
-   struct vmbus_map *maps = uio_res->maps;
-   void *mapaddr;
-   off_t offset;
-   int fd;
-
-   /* devname for mmap  */
-   fd = open(uio_res->path, O_RDWR);
-   if (fd < 0) {
-   VMBUS_LOG(ERR, "Cannot open %s: %s",
- uio_res->path, stre

[PATCH v5 04/14] bus/vmbus: scan and get the network device on FreeBSD

2022-04-26 Thread Srikanth Kaka
Using sysctl, the PMD identifies all the devices on the VMBUS.
On finding a network device's device id, the device is added to the
VMBUS device list.

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/freebsd/vmbus_bus.c | 268 ++
 1 file changed, 268 insertions(+)
 create mode 100644 drivers/bus/vmbus/freebsd/vmbus_bus.c

diff --git a/drivers/bus/vmbus/freebsd/vmbus_bus.c 
b/drivers/bus/vmbus/freebsd/vmbus_bus.c
new file mode 100644
index 000..c1a3a5f
--- /dev/null
+++ b/drivers/bus/vmbus/freebsd/vmbus_bus.c
@@ -0,0 +1,268 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include 
+#include 
+
+#include "private.h"
+#include "vmbus_unix.h"
+
+#include 
+#include 
+#include 
+
+/* Parse UUID. Caller must pass NULL terminated string */
+static int
+parse_sysfs_uuid(const char *filename, rte_uuid_t uu)
+{
+   char in[BUFSIZ];
+
+   memcpy(in, filename, BUFSIZ);
+   if (rte_uuid_parse(in, uu) < 0) {
+   VMBUS_LOG(ERR, "%s not a valid UUID", in);
+   return -1;
+   }
+
+   return 0;
+}
+
+/* Scan one vmbus entry, and fill the devices list from it. */
+static int
+vmbus_scan_one(const char *name, unsigned int unit_num)
+{
+   struct rte_vmbus_device *dev, *dev2;
+   char sysctl_buf[PATH_MAX], sysctl_var[PATH_MAX];
+   size_t guid_len = 36, len = PATH_MAX;
+   char classid[guid_len + 1], deviceid[guid_len + 1];
+
+   dev = calloc(1, sizeof(*dev));
+   if (dev == NULL)
+   return -1;
+
+   /* get class id and device id */
+   snprintf(sysctl_var, len, "dev.%s.%u.%%pnpinfo", name, unit_num);
+   if (sysctlbyname(sysctl_var, &sysctl_buf, &len, NULL, 0) < 0)
+   goto error;
+
+   /* pnpinfo: classid=f912ad6d-2b17-48ea-bd65-f927a61c7684
+* deviceid=d34b2567-b9b6-42b9-8778-0a4ec0b955bf
+*/
+   if (sysctl_buf[0] == 'c' && sysctl_buf[1] == 'l' &&
+   sysctl_buf[7] == '=') {
+   memcpy(classid, &sysctl_buf[8], guid_len);
+   classid[guid_len] = '\0';
+   }
+   if (parse_sysfs_uuid(classid, dev->class_id) < 0)
+   goto error;
+
+   /* skip non-network devices */
+   if (rte_uuid_compare(dev->class_id, vmbus_nic_uuid) != 0) {
+   free(dev);
+   return 0;
+   }
+
+   if (sysctl_buf[45] == 'd' && sysctl_buf[46] == 'e' &&
+   sysctl_buf[47] == 'v' && sysctl_buf[53] == '=') {
+   memcpy(deviceid, &sysctl_buf[54], guid_len);
+   deviceid[guid_len] = '\0';
+   }
+   if (parse_sysfs_uuid(deviceid, dev->device_id) < 0)
+   goto error;
+
+   if (!strcmp(name, "hv_uio"))
+   dev->uio_num = unit_num;
+   else
+   dev->uio_num = -1;
+   dev->device.bus = &rte_vmbus_bus.bus;
+   dev->device.numa_node = 0;
+   dev->device.name = strdup(deviceid);
+   if (!dev->device.name)
+   goto error;
+
+   dev->device.devargs = vmbus_devargs_lookup(dev);
+
+   dev->intr_handle = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_PRIVATE);
+   if (dev->intr_handle == NULL)
+   goto error;
+
+   /* device is valid, add in list (sorted) */
+   VMBUS_LOG(DEBUG, "Adding vmbus device %s", name);
+
+   TAILQ_FOREACH(dev2, &rte_vmbus_bus.device_list, next) {
+   int ret;
+
+   ret = rte_uuid_compare(dev->device_id, dev2->device_id);
+   if (ret > 0)
+   continue;
+
+   if (ret < 0) {
+   vmbus_insert_device(dev2, dev);
+   } else { /* already registered */
+   VMBUS_LOG(NOTICE,
+   "%s already registered", name);
+   free(dev);
+   }
+   return 0;
+   }
+
+   vmbus_add_device(dev);
+   return 0;
+error:
+   VMBUS_LOG(DEBUG, "failed");
+
+   free(dev);
+   return -1;
+}
+
+static int
+vmbus_unpack(char *walker, char *ep, char **str)
+{
+   int ret = 0;
+
+   *str = strdup(walker);
+   if (*str == NULL) {
+   ret = -ENOMEM;
+   goto exit;
+   }
+
+   if (walker + strnlen(walker, ep - walker) >= ep) {
+   ret = -EINVAL;
+   goto exit;
+   }
+exit:
+   return ret;
+}
+
+/*
+ * Scan the content of the vmbus, and the devices in the devices list
+ */
+int
+rte_vmbus_scan(void)
+{
+   struct u_device udev;
+   struct u_businfo ubus;
+   int dev_idx, dev_ptr, name2oid[2], oid[CTL_MAXNAME + 12], error;
+   size_t oidlen, rlen, ub_size;
+   uintptr_t vmbus_handle = 0;
+   char *walker, *ep;
+   char name[16] = "hw.bus.devices";
+   char *dd_name, *dd_desc, *dd_drivername, *dd_pnpinfo, *dd_location;
+
+   /*
+* devinfo F

[PATCH v5 05/14] bus/vmbus: handle mapping of device resources

2022-04-26 Thread Srikanth Kaka
All resource values are published by the HV_UIO driver as sysctl
key/value pairs and are read at a later point in the code flow

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/freebsd/vmbus_bus.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/bus/vmbus/freebsd/vmbus_bus.c 
b/drivers/bus/vmbus/freebsd/vmbus_bus.c
index c1a3a5f..28f5ff4 100644
--- a/drivers/bus/vmbus/freebsd/vmbus_bus.c
+++ b/drivers/bus/vmbus/freebsd/vmbus_bus.c
@@ -28,6 +28,24 @@
return 0;
 }
 
+/* map the resources of a vmbus device in virtual memory */
+int
+rte_vmbus_map_device(struct rte_vmbus_device *dev)
+{
+   if (dev->uio_num < 0) {
+   VMBUS_LOG(DEBUG, "Not managed by UIO driver, skipped");
+   return 1;
+   }
+
+   return vmbus_uio_map_resource(dev);
+}
+
+void
+rte_vmbus_unmap_device(struct rte_vmbus_device *dev)
+{
+   vmbus_uio_unmap_resource(dev);
+}
+
 /* Scan one vmbus entry, and fill the devices list from it. */
 static int
 vmbus_scan_one(const char *name, unsigned int unit_num)
-- 
1.8.3.1



[PATCH v5 06/14] bus/vmbus: get device resource values using sysctl

2022-04-26 Thread Srikanth Kaka
The UIO device's attributes (relid, monitor id, etc.) are
retrieved using the following sysctl variables:
$ sysctl dev.hv_uio.0
dev.hv_uio.0.send_buf.gpadl: 925241
dev.hv_uio.0.send_buf.size: 16777216
dev.hv_uio.0.recv_buf.gpadl: 925240
dev.hv_uio.0.recv_buf.size: 32505856
dev.hv_uio.0.monitor_page.size: 4096
dev.hv_uio.0.int_page.size: 4096

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/freebsd/vmbus_uio.c   | 105 
 drivers/bus/vmbus/linux/vmbus_uio.c |  16 +
 drivers/bus/vmbus/unix/vmbus_unix.h |   4 ++
 drivers/bus/vmbus/unix/vmbus_unix_uio.c |   6 +-
 4 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 drivers/bus/vmbus/freebsd/vmbus_uio.c

diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c 
b/drivers/bus/vmbus/freebsd/vmbus_uio.c
new file mode 100644
index 000..0544371
--- /dev/null
+++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "private.h"
+#include "vmbus_unix.h"
+
+const char *driver_name = "hv_uio";
+
+/* Check map names with kernel names */
+static const char *map_names[VMBUS_MAX_RESOURCE] = {
+   [HV_TXRX_RING_MAP] = "txrx_rings",
+   [HV_INT_PAGE_MAP]  = "int_page",
+   [HV_MON_PAGE_MAP]  = "monitor_page",
+   [HV_RECV_BUF_MAP]  = "recv_buf",
+   [HV_SEND_BUF_MAP]  = "send_buf",
+};
+
+static int
+sysctl_get_vmbus_device_info(struct rte_vmbus_device *dev)
+{
+   char sysctl_buf[PATH_MAX];
+   char sysctl_var[PATH_MAX];
+   size_t len = PATH_MAX, sysctl_len;
+   unsigned long tmp;
+   int i;
+
+   snprintf(sysctl_buf, len, "dev.%s.%d", driver_name, dev->uio_num);
+
+   sysctl_len = sizeof(unsigned long);
+   /* get relid */
+   snprintf(sysctl_var, len, "%s.channel.ch_id", sysctl_buf);
+   if (sysctlbyname(sysctl_var, &tmp, &sysctl_len, NULL, 0) < 0) {
+   VMBUS_LOG(ERR, "could not read %s", sysctl_var);
+   goto error;
+   }
+   dev->relid = tmp;
+
+   /* get monitor id */
+   snprintf(sysctl_var, len, "%s.channel.%u.monitor_id", sysctl_buf,
+dev->relid);
+   if (sysctlbyname(sysctl_var, &tmp, &sysctl_len, NULL, 0) < 0) {
+   VMBUS_LOG(ERR, "could not read %s", sysctl_var);
+   goto error;
+   }
+   dev->monitor_id = tmp;
+
+   /* Extract resource value */
+   for (i = 0; i < VMBUS_MAX_RESOURCE; i++) {
+   struct rte_mem_resource *res = &dev->resource[i];
+   unsigned long size, gpad = 0;
+   size_t sizelen = sizeof(len);
+
+   snprintf(sysctl_var, sizeof(sysctl_var), "%s.%s.size",
+sysctl_buf, map_names[i]);
+   if (sysctlbyname(sysctl_var, &size, &sizelen, NULL, 0) < 0) {
+   VMBUS_LOG(ERR,
+   "could not read %s", sysctl_var);
+   goto error;
+   }
+   res->len = size;
+
+   if (i == HV_RECV_BUF_MAP || i == HV_SEND_BUF_MAP) {
+   snprintf(sysctl_var, sizeof(sysctl_var), "%s.%s.gpadl",
+sysctl_buf, map_names[i]);
+   if (sysctlbyname(sysctl_var, &gpad, &sizelen, NULL, 0) 
< 0) {
+   VMBUS_LOG(ERR,
+   "could not read %s", sysctl_var);
+   goto error;
+   }
+   /* put the GPAD value in physical address */
+   res->phys_addr = gpad;
+   }
+   }
+   return 0;
+error:
+   return -1;
+}
+
+/*
+ * On FreeBSD, the device is opened first to ensure kernel UIO driver
+ * is properly initialized before reading device attributes
+ */
+int vmbus_get_device_info_os(struct rte_vmbus_device *dev)
+{
+   return sysctl_get_vmbus_device_info(dev);
+}
+
+const char *get_devname_os(void)
+{
+   return "/dev/hv_uio";
+}
diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c 
b/drivers/bus/vmbus/linux/vmbus_uio.c
index b5d15c9..69f0b26 100644
--- a/drivers/bus/vmbus/linux/vmbus_uio.c
+++ b/drivers/bus/vmbus/linux/vmbus_uio.c
@@ -199,3 +199,19 @@ int vmbus_uio_get_subchan(struct vmbus_channel *primary,
closedir(chan_dir);
return err;
 }
+
+/*
+ * In Linux the device info is fetched from SYSFS and doesn't need
+ * opening of the device before reading its attributes
+ * This is a stub function and it should always succeed.
+ */
+int vmbus_get_device_info_os(struct rte_vmbus_device *dev)
+{
+   RTE_SET_USED(dev);
+   return 0;
+}
+
+const char *get_devname_os(void)
+{
+   return "/dev/uio";
+}
diff --git a/drivers/bus/v

[PATCH v5 07/14] net/netvsc: make event monitor OS dependent

2022-04-26 Thread Srikanth Kaka
- Event monitoring is not yet supported on FreeBSD, hence move it
to the OS-specific files
- Add meson support for the OS environment

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/net/netvsc/freebsd/hn_os.c | 16 
 drivers/net/netvsc/freebsd/meson.build |  6 ++
 drivers/net/netvsc/hn_ethdev.c |  7 +++
 drivers/net/netvsc/hn_os.h |  6 ++
 drivers/net/netvsc/linux/hn_os.c   | 21 +
 drivers/net/netvsc/linux/meson.build   |  6 ++
 drivers/net/netvsc/meson.build |  3 +++
 7 files changed, 61 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/netvsc/freebsd/hn_os.c
 create mode 100644 drivers/net/netvsc/freebsd/meson.build
 create mode 100644 drivers/net/netvsc/hn_os.h
 create mode 100644 drivers/net/netvsc/linux/hn_os.c
 create mode 100644 drivers/net/netvsc/linux/meson.build

diff --git a/drivers/net/netvsc/freebsd/hn_os.c 
b/drivers/net/netvsc/freebsd/hn_os.c
new file mode 100644
index 000..4c6a798
--- /dev/null
+++ b/drivers/net/netvsc/freebsd/hn_os.c
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2021 Microsoft Corporation
+ */
+
+#include 
+
+#include 
+
+#include "hn_logs.h"
+#include "hn_os.h"
+
+int eth_hn_os_dev_event(void)
+{
+   PMD_DRV_LOG(DEBUG, "rte_dev_event_monitor_start not supported on 
FreeBSD");
+   return 0;
+}
diff --git a/drivers/net/netvsc/freebsd/meson.build 
b/drivers/net/netvsc/freebsd/meson.build
new file mode 100644
index 000..78f824f
--- /dev/null
+++ b/drivers/net/netvsc/freebsd/meson.build
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Microsoft Corporation
+
+sources += files(
+   'hn_os.c',
+)
diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 8a95040..8b1e07b 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -39,6 +39,7 @@
 #include "hn_rndis.h"
 #include "hn_nvs.h"
 #include "ndis.h"
+#include "hn_os.h"
 
 #define HN_TX_OFFLOAD_CAPS (RTE_ETH_TX_OFFLOAD_IPV4_CKSUM | \
RTE_ETH_TX_OFFLOAD_TCP_CKSUM  | \
@@ -1240,11 +1241,9 @@ static int eth_hn_probe(struct rte_vmbus_driver *drv 
__rte_unused,
 
PMD_INIT_FUNC_TRACE();
 
-   ret = rte_dev_event_monitor_start();
-   if (ret) {
-   PMD_DRV_LOG(ERR, "Failed to start device event monitoring");
+   ret = eth_hn_os_dev_event();
+   if (ret)
return ret;
-   }
 
eth_dev = eth_dev_vmbus_allocate(dev, sizeof(struct hn_data));
if (!eth_dev)
diff --git a/drivers/net/netvsc/hn_os.h b/drivers/net/netvsc/hn_os.h
new file mode 100644
index 000..618c53c
--- /dev/null
+++ b/drivers/net/netvsc/hn_os.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2009-2021 Microsoft Corp.
+ * All rights reserved.
+ */
+
+int eth_hn_os_dev_event(void);
diff --git a/drivers/net/netvsc/linux/hn_os.c b/drivers/net/netvsc/linux/hn_os.c
new file mode 100644
index 000..1ea12ce
--- /dev/null
+++ b/drivers/net/netvsc/linux/hn_os.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2021 Microsoft Corporation
+ */
+
+#include 
+
+#include 
+
+#include "hn_logs.h"
+#include "hn_os.h"
+
+int eth_hn_os_dev_event(void)
+{
+   int ret;
+
+   ret = rte_dev_event_monitor_start();
+   if (ret)
+   PMD_DRV_LOG(ERR, "Failed to start device event monitoring");
+
+   return ret;
+}
diff --git a/drivers/net/netvsc/linux/meson.build 
b/drivers/net/netvsc/linux/meson.build
new file mode 100644
index 000..78f824f
--- /dev/null
+++ b/drivers/net/netvsc/linux/meson.build
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Microsoft Corporation
+
+sources += files(
+   'hn_os.c',
+)
diff --git a/drivers/net/netvsc/meson.build b/drivers/net/netvsc/meson.build
index bb6225d..c50414d 100644
--- a/drivers/net/netvsc/meson.build
+++ b/drivers/net/netvsc/meson.build
@@ -8,6 +8,7 @@ if is_windows
 endif
 
 deps += 'bus_vmbus'
+includes += include_directories(exec_env)
 sources = files(
 'hn_ethdev.c',
 'hn_nvs.c',
@@ -15,3 +16,5 @@ sources = files(
 'hn_rxtx.c',
 'hn_vf.c',
 )
+
+subdir(exec_env)
-- 
1.8.3.1



[PATCH v5 08/14] bus/vmbus: add sub-channel mapping support

2022-04-26 Thread Srikanth Kaka
To map the subchannels, an mmap request is made directly after
determining the subchannel memory offset

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/freebsd/vmbus_uio.c | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c 
b/drivers/bus/vmbus/freebsd/vmbus_uio.c
index 0544371..55b8f18 100644
--- a/drivers/bus/vmbus/freebsd/vmbus_uio.c
+++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c
@@ -18,6 +18,13 @@
 #include "private.h"
 #include "vmbus_unix.h"
 
+/*
+ * Macros to distinguish mmap request
+ * [7-0] - Device memory region
+ * [15-8]- Sub-channel id
+ */
+#define UH_SUBCHAN_MASK_SHIFT  8
+
 const char *driver_name = "hv_uio";
 
 /* Check map names with kernel names */
@@ -99,6 +106,47 @@ int vmbus_get_device_info_os(struct rte_vmbus_device *dev)
return sysctl_get_vmbus_device_info(dev);
 }
 
+int vmbus_uio_map_subchan_os(const struct rte_vmbus_device *dev,
+const struct vmbus_channel *chan,
+void **mapaddr, size_t *size)
+{
+   char ring_path[PATH_MAX];
+   off_t offset;
+   int fd;
+
+   snprintf(ring_path, sizeof(ring_path),
+"/dev/hv_uio%d", dev->uio_num);
+
+   fd = open(ring_path, O_RDWR);
+   if (fd < 0) {
+   VMBUS_LOG(ERR, "Cannot open %s: %s",
+ ring_path, strerror(errno));
+   return -errno;
+   }
+
+   /* subchannel rings are of the same size as primary */
+   *size = dev->resource[HV_TXRX_RING_MAP].len;
+   offset = (chan->relid << UH_SUBCHAN_MASK_SHIFT) * PAGE_SIZE;
+
+   *mapaddr = vmbus_map_resource(vmbus_map_addr, fd,
+ offset, *size, 0);
+   close(fd);
+
+   if (*mapaddr == MAP_FAILED)
+   return -EIO;
+
+   return 0;
+}
+
+/* This function should always succeed */
+bool vmbus_uio_subchannels_supported(const struct rte_vmbus_device *dev,
+const struct vmbus_channel *chan)
+{
+   RTE_SET_USED(dev);
+   RTE_SET_USED(chan);
+   return true;
+}
+
 const char *get_devname_os(void)
 {
return "/dev/hv_uio";
-- 
1.8.3.1



[PATCH v5 09/14] bus/vmbus: open subchannels

2022-04-26 Thread Srikanth Kaka
On FreeBSD, unlike Linux, there is no sub-channel open callback that
could be called by the HV_UIO driver when sub-channels are granted by
the hypervisor. Thus the PMD makes an IOCTL call to HV_UIO to open
the sub-channels.

On Linux, vmbus_uio_subchan_open() always returns success, as the
Linux HV_UIO driver opens the sub-channels implicitly.

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/freebsd/vmbus_uio.c | 30 ++
 drivers/bus/vmbus/linux/vmbus_uio.c   | 12 
 drivers/bus/vmbus/private.h   |  1 +
 drivers/bus/vmbus/rte_bus_vmbus.h | 11 +++
 drivers/bus/vmbus/version.map |  6 ++
 drivers/bus/vmbus/vmbus_channel.c |  5 +
 6 files changed, 65 insertions(+)

diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c 
b/drivers/bus/vmbus/freebsd/vmbus_uio.c
index 55b8f18..438db41 100644
--- a/drivers/bus/vmbus/freebsd/vmbus_uio.c
+++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c
@@ -25,6 +25,9 @@
  */
 #define UH_SUBCHAN_MASK_SHIFT  8
 
+/* ioctl */
+#define HVIOOPENSUBCHAN _IOW('h', 14, uint32_t)
+
 const char *driver_name = "hv_uio";
 
 /* Check map names with kernel names */
@@ -151,3 +154,30 @@ const char *get_devname_os(void)
 {
return "/dev/hv_uio";
 }
+
+int vmbus_uio_subchan_open(struct rte_vmbus_device *dev, uint32_t subchan)
+{
+   struct mapped_vmbus_resource *uio_res;
+   int fd, err = 0;
+
+   uio_res = vmbus_uio_find_resource(dev);
+   if (!uio_res) {
+   VMBUS_LOG(ERR, "cannot find uio resource");
+   return -EINVAL;
+   }
+
+   fd = open(uio_res->path, O_RDWR);
+   if (fd < 0) {
+   VMBUS_LOG(ERR, "Cannot open %s: %s",
+   uio_res->path, strerror(errno));
+   return -1;
+   }
+
+   if (ioctl(fd, HVIOOPENSUBCHAN, &subchan)) {
+   VMBUS_LOG(ERR, "open subchan ioctl failed %s: %s",
+   uio_res->path, strerror(errno));
+   err = -1;
+   }
+   close(fd);
+   return err;
+}
diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c 
b/drivers/bus/vmbus/linux/vmbus_uio.c
index 69f0b26..b9616bd 100644
--- a/drivers/bus/vmbus/linux/vmbus_uio.c
+++ b/drivers/bus/vmbus/linux/vmbus_uio.c
@@ -215,3 +215,15 @@ const char *get_devname_os(void)
 {
return "/dev/uio";
 }
+
+/*
+ * This is a stub function and it should always succeed.
+ * The Linux UIO kernel driver opens the subchannels implicitly.
+ */
+int vmbus_uio_subchan_open(struct rte_vmbus_device *dev,
+  uint32_t subchan)
+{
+   RTE_SET_USED(dev);
+   RTE_SET_USED(subchan);
+   return 0;
+}
diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h
index 1bca147..ea0276a 100644
--- a/drivers/bus/vmbus/private.h
+++ b/drivers/bus/vmbus/private.h
@@ -116,6 +116,7 @@ bool vmbus_uio_subchannels_supported(const struct 
rte_vmbus_device *dev,
 int vmbus_uio_get_subchan(struct vmbus_channel *primary,
  struct vmbus_channel **subchan);
 int vmbus_uio_map_rings(struct vmbus_channel *chan);
+int vmbus_uio_subchan_open(struct rte_vmbus_device *device, uint32_t subchan);
 
 void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen);
 
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h 
b/drivers/bus/vmbus/rte_bus_vmbus.h
index a24bad8..06b2ffc 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -404,6 +404,17 @@ void rte_vmbus_set_latency(const struct rte_vmbus_device 
*dev,
  */
 void rte_vmbus_unregister(struct rte_vmbus_driver *driver);
 
+/**
+ * Perform IOCTL to VMBUS device
+ *
+ * @param device
+ * A pointer to a rte_vmbus_device structure
+ * @param subchan
+ * Count of subchannels to open
+ */
+__rte_experimental
+int rte_vmbus_ioctl(struct rte_vmbus_device *device, uint32_t subchan);
+
 /** Helper for VMBUS device registration from driver instance */
 #define RTE_PMD_REGISTER_VMBUS(nm, vmbus_drv)  \
RTE_INIT(vmbusinitfn_ ##nm) \
diff --git a/drivers/bus/vmbus/version.map b/drivers/bus/vmbus/version.map
index 3cadec7..e5b7218 100644
--- a/drivers/bus/vmbus/version.map
+++ b/drivers/bus/vmbus/version.map
@@ -26,3 +26,9 @@ DPDK_22 {
 
local: *;
 };
+
+EXPERIMENTAL {
+   global:
+
+   rte_vmbus_ioctl;
+};
diff --git a/drivers/bus/vmbus/vmbus_channel.c 
b/drivers/bus/vmbus/vmbus_channel.c
index 119b9b3..9a8f6e3 100644
--- a/drivers/bus/vmbus/vmbus_channel.c
+++ b/drivers/bus/vmbus/vmbus_channel.c
@@ -365,6 +365,11 @@ int rte_vmbus_max_channels(const struct rte_vmbus_device 
*device)
return 1;
 }
 
+int rte_vmbus_ioctl(struct rte_vmbus_device *device, uint32_t subchan)
+{
+   return vmbus_uio_subchan_open(device, subchan);
+}
+
 /* Setup secondary channel */
 int rte_vmbus_subchan_open(struct vmbus_channel *primary,
   struct vmbus_channel **new_chan)
-- 
1.8.3.1

[PATCH v5 10/14] net/netvsc: make IOCTL call to open subchannels

2022-04-26 Thread Srikanth Kaka
Make an IOCTL call to open subchannels.

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/net/netvsc/hn_ethdev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 8b1e07b..104c7ae 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -516,6 +516,10 @@ static int hn_subchan_configure(struct hn_data *hv,
if (err)
return  err;
 
+   err = rte_vmbus_ioctl(hv->vmbus, subchan);
+   if (err)
+   return  err;
+
while (subchan > 0) {
struct vmbus_channel *new_sc;
uint16_t chn_index;
-- 
1.8.3.1



[PATCH v5 11/14] bus/vmbus: get subchannel info

2022-04-26 Thread Srikanth Kaka
Using sysctl, all of the subchannel attributes are fetched.

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/freebsd/vmbus_uio.c | 73 +++
 1 file changed, 73 insertions(+)

diff --git a/drivers/bus/vmbus/freebsd/vmbus_uio.c 
b/drivers/bus/vmbus/freebsd/vmbus_uio.c
index 438db41..6a9a196 100644
--- a/drivers/bus/vmbus/freebsd/vmbus_uio.c
+++ b/drivers/bus/vmbus/freebsd/vmbus_uio.c
@@ -155,6 +155,79 @@ const char *get_devname_os(void)
return "/dev/hv_uio";
 }
 
+int vmbus_uio_get_subchan(struct vmbus_channel *primary,
+ struct vmbus_channel **subchan)
+{
+   const struct rte_vmbus_device *dev = primary->device;
+   char sysctl_buf[PATH_MAX], sysctl_var[PATH_MAX];
+   size_t len = PATH_MAX, sysctl_len;
+   /* nr_schan, relid, subid & monid datatype must match kernel's for 
sysctl */
+   uint32_t relid, subid, nr_schan, i;
+   uint8_t monid;
+   int err;
+
+   /* get no. of sub-channels opened by hv_uio
+* dev.hv_uio.0.subchan_cnt
+*/
+   snprintf(sysctl_var, len, "dev.%s.%d.subchan_cnt", driver_name,
+dev->uio_num);
+   sysctl_len = sizeof(nr_schan);
+   if (sysctlbyname(sysctl_var, &nr_schan, &sysctl_len, NULL, 0) < 0) {
+   VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var,
+   strerror(errno));
+   return -1;
+   }
+
+   /* dev.hv_uio.0.channel.14.sub */
+   snprintf(sysctl_buf, len, "dev.%s.%d.channel.%u.sub", driver_name,
+dev->uio_num, primary->relid);
+   for (i = 1; i <= nr_schan; i++) {
+   /* get relid */
+   snprintf(sysctl_var, len, "%s.%u.chanid", sysctl_buf, i);
+   sysctl_len = sizeof(relid);
+   if (sysctlbyname(sysctl_var, &relid, &sysctl_len, NULL, 0) < 0) 
{
+   VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var,
+   strerror(errno));
+   goto error;
+   }
+
+   if (!vmbus_isnew_subchannel(primary, (uint16_t)relid)) {
+   VMBUS_LOG(DEBUG, "skip already found channel: %u",
+   relid);
+   continue;
+   }
+
+   /* get sub-channel id */
+   snprintf(sysctl_var, len, "%s.%u.ch_subidx", sysctl_buf, i);
+   sysctl_len = sizeof(subid);
+   if (sysctlbyname(sysctl_var, &subid, &sysctl_len, NULL, 0) < 0) 
{
+   VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var,
+   strerror(errno));
+   goto error;
+   }
+
+   /* get monitor id */
+   snprintf(sysctl_var, len, "%s.%u.monitor_id", sysctl_buf, i);
+   sysctl_len = sizeof(monid);
+   if (sysctlbyname(sysctl_var, &monid, &sysctl_len, NULL, 0) < 0) 
{
+   VMBUS_LOG(ERR, "could not read %s : %s", sysctl_var,
+   strerror(errno));
+   goto error;
+   }
+
+   err = vmbus_chan_create(dev, (uint16_t)relid, (uint16_t)subid,
+   monid, subchan);
+   if (err) {
+   VMBUS_LOG(ERR, "subchannel setup failed");
+   return err;
+   }
+   break;
+   }
+   return 0;
+error:
+   return -1;
+}
+
 int vmbus_uio_subchan_open(struct rte_vmbus_device *dev, uint32_t subchan)
 {
struct mapped_vmbus_resource *uio_res;
-- 
1.8.3.1



[PATCH v5 12/14] net/netvsc: moving hotplug retry to OS dir

2022-04-26 Thread Srikanth Kaka
Moved netvsc_hotplug_retry to the respective OS directory as it contains
OS-dependent code. For Linux, it is copied as-is; for FreeBSD, it is not
supported yet.

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/net/netvsc/freebsd/hn_os.c |  5 +++
 drivers/net/netvsc/hn_ethdev.c | 84 ---
 drivers/net/netvsc/hn_os.h |  2 +
 drivers/net/netvsc/linux/hn_os.c   | 90 ++
 4 files changed, 97 insertions(+), 84 deletions(-)

diff --git a/drivers/net/netvsc/freebsd/hn_os.c 
b/drivers/net/netvsc/freebsd/hn_os.c
index 4c6a798..fece1be 100644
--- a/drivers/net/netvsc/freebsd/hn_os.c
+++ b/drivers/net/netvsc/freebsd/hn_os.c
@@ -14,3 +14,8 @@ int eth_hn_os_dev_event(void)
PMD_DRV_LOG(DEBUG, "rte_dev_event_monitor_start not supported on 
FreeBSD");
return 0;
 }
+
+void netvsc_hotplug_retry(void *args)
+{
+   RTE_SET_USED(args);
+}
diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 104c7ae..dd4b872 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -57,9 +57,6 @@
 #define NETVSC_ARG_TXBREAK "tx_copybreak"
 #define NETVSC_ARG_RX_EXTMBUF_ENABLE "rx_extmbuf_enable"
 
-/* The max number of retry when hot adding a VF device */
-#define NETVSC_MAX_HOTADD_RETRY 10
-
 struct hn_xstats_name_off {
char name[RTE_ETH_XSTATS_NAME_SIZE];
unsigned int offset;
@@ -556,87 +553,6 @@ static int hn_subchan_configure(struct hn_data *hv,
return err;
 }
 
-static void netvsc_hotplug_retry(void *args)
-{
-   int ret;
-   struct hn_data *hv = args;
-   struct rte_eth_dev *dev = &rte_eth_devices[hv->port_id];
-   struct rte_devargs *d = &hv->devargs;
-   char buf[256];
-
-   DIR *di;
-   struct dirent *dir;
-   struct ifreq req;
-   struct rte_ether_addr eth_addr;
-   int s;
-
-   PMD_DRV_LOG(DEBUG, "%s: retry count %d",
-   __func__, hv->eal_hot_plug_retry);
-
-   if (hv->eal_hot_plug_retry++ > NETVSC_MAX_HOTADD_RETRY)
-   return;
-
-   snprintf(buf, sizeof(buf), "/sys/bus/pci/devices/%s/net", d->name);
-   di = opendir(buf);
-   if (!di) {
-   PMD_DRV_LOG(DEBUG, "%s: can't open directory %s, "
-   "retrying in 1 second", __func__, buf);
-   goto retry;
-   }
-
-   while ((dir = readdir(di))) {
-   /* Skip . and .. directories */
-   if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
-   continue;
-
-   /* trying to get mac address if this is a network device*/
-   s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
-   if (s == -1) {
-   PMD_DRV_LOG(ERR, "Failed to create socket errno %d",
-   errno);
-   break;
-   }
-   strlcpy(req.ifr_name, dir->d_name, sizeof(req.ifr_name));
-   ret = ioctl(s, SIOCGIFHWADDR, &req);
-   close(s);
-   if (ret == -1) {
-   PMD_DRV_LOG(ERR,
-   "Failed to send SIOCGIFHWADDR for device 
%s",
-   dir->d_name);
-   break;
-   }
-   if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) {
-   closedir(di);
-   return;
-   }
-   memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
-  RTE_DIM(eth_addr.addr_bytes));
-
-   if (rte_is_same_ether_addr(ð_addr, dev->data->mac_addrs)) {
-   PMD_DRV_LOG(NOTICE,
-   "Found matching MAC address, adding device 
%s network name %s",
-   d->name, dir->d_name);
-   ret = rte_eal_hotplug_add(d->bus->name, d->name,
- d->args);
-   if (ret) {
-   PMD_DRV_LOG(ERR,
-   "Failed to add PCI device %s",
-   d->name);
-   break;
-   }
-   }
-   /* When the code reaches here, we either have already added
-* the device, or its MAC address did not match.
-*/
-   closedir(di);
-   return;
-   }
-   closedir(di);
-retry:
-   /* The device is still being initialized, retry after 1 second */
-   rte_eal_alarm_set(100, netvsc_hotplug_retry, hv);
-}
-
 static void
 netvsc_hotadd_callback(const char *device_name, enum rte_dev_event_type type,
   void *arg)
diff --git a/drivers/net/netvsc/hn_os.h b/drivers/net/netvsc/hn_os.h
index 618c53c..1fb7292 100644
--- a/drivers/net/

[PATCH v5 13/14] bus/vmbus: add meson support for FreeBSD

2022-04-26 Thread Srikanth Kaka
Add meson build support for the FreeBSD OS.

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 drivers/bus/vmbus/meson.build | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/vmbus/meson.build b/drivers/bus/vmbus/meson.build
index 60913d0..77f18ce 100644
--- a/drivers/bus/vmbus/meson.build
+++ b/drivers/bus/vmbus/meson.build
@@ -26,7 +26,11 @@ if is_linux
 sources += files('linux/vmbus_bus.c',
 'linux/vmbus_uio.c')
 includes += include_directories('linux')
+elif is_freebsd
+sources += files('freebsd/vmbus_bus.c',
+ 'freebsd/vmbus_uio.c')
+includes += include_directories('freebsd')
 else
 build = false
-reason = 'only supported on Linux'
+reason = 'only supported on Linux & FreeBSD'
 endif
-- 
1.8.3.1



[PATCH v5 14/14] bus/vmbus: update MAINTAINERS and docs

2022-04-26 Thread Srikanth Kaka
Updated the MAINTAINERS and doc files for FreeBSD support.

Signed-off-by: Srikanth Kaka 
Signed-off-by: Vag Singh 
Signed-off-by: Anand Thulasiram 
---
 MAINTAINERS|  2 ++
 doc/guides/nics/netvsc.rst | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c4f541..01a494e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -567,6 +567,7 @@ F: app/test/test_vdev.c
 VMBUS bus driver
 M: Stephen Hemminger 
 M: Long Li 
+M: Srikanth Kaka 
 F: drivers/bus/vmbus/
 
 
@@ -823,6 +824,7 @@ F: doc/guides/nics/vdev_netvsc.rst
 Microsoft Hyper-V netvsc
 M: Stephen Hemminger 
 M: Long Li 
+M: Srikanth Kaka 
 F: drivers/net/netvsc/
 F: doc/guides/nics/netvsc.rst
 F: doc/guides/nics/features/netvsc.ini
diff --git a/doc/guides/nics/netvsc.rst b/doc/guides/nics/netvsc.rst
index 77efe1d..12d1702 100644
--- a/doc/guides/nics/netvsc.rst
+++ b/doc/guides/nics/netvsc.rst
@@ -91,6 +91,12 @@ operations:
 
The dpdk-devbind.py script can not be used since it only handles PCI 
devices.
 
+On FreeBSD, with hv_uio kernel driver loaded, do the following:
+
+.. code-block:: console
+
+devctl set driver -f hn1 hv_uio
+
 
 Prerequisites
 -
@@ -101,6 +107,11 @@ The following prerequisites apply:
 Full support of multiple queues requires the 4.17 kernel. It is possible
 to use the netvsc PMD with 4.16 kernel but it is limited to a single queue.
 
+*   FreeBSD support for UIO on vmbus is done with hv_uio driver and it is still
+in `review`_
+
+.. _`review`: https://reviews.freebsd.org/D32184
+
 
 Netvsc PMD arguments
 
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v3 6/7] app/proc-info: provide way to request info on owned ports

2022-04-26 Thread Subendu Santra
Hi Stephen,

We were going through the patch set:
https://inbox.dpdk.org/dev/20200715212228.28010-7-step...@networkplumber.org/
and are hoping to get clarification on the behaviour when no port mask is
specified in the input to the `dpdk-proc-info` tool.

Specifically, in PATCH v3 6/7, we see this:
+   /* If no port mask was specified, one will be provided */
+   if (enabled_port_mask == 0) {
+   RTE_ETH_FOREACH_DEV(i) {
+   enabled_port_mask |= 1u << i;

However, in PATCH v4 8/8, we see this:
+   /* If no port mask was specified, then show non-owned ports */
+   if (enabled_port_mask == 0) {
+   RTE_ETH_FOREACH_DEV(i)
+   enabled_port_mask = 1ul << i;
+   }

Was there any specific reason to show just the last non-owned port in case the 
port mask was not specified?
Should we show all non-owned ports in case the user doesn’t specify any port 
mask?

Regards,
Subendu.





librte_bpf: roadmap or any specific plans for this library

2022-04-26 Thread Björn Svensson A
Hi all,
I hope this is the correct mailing list for this topic.

DPDK provides the nice library `librte_bpf` to load and execute eBPF bytecode
and we would like to broaden our usage of this library.

Today there are hints that this library might have been purpose-built to enable
inspection or modification of packets;
for example, the eBPF program is expected to use only a single input argument,
pointing to data of some sort.
We believe it would be beneficial to be able to use this library to run generic
eBPF programs as well,
as an alternative to running them as RX-/TX-port/queue callbacks (i.e. generic
programs which use only supported features).

I have seen some discussions about moving towards a library shared with the
kernel implementation of BPF,
but I couldn't figure out the outcome.
My question is: are there any plans to evolve this library, and would
improvements possibly be accepted?

Here are some improvements we are interested to look into:

* Add additional API for loading eBPF code.
  Today it's possible to load eBPF code from an ELF file, but an API to load
code from an ELF image in memory
  would open up other ways to manage eBPF code.

  Example of the new API:
struct rte_bpf *
rte_bpf_elf_image_load(const struct rte_bpf_prm *prm, char *image,
   size_t size, const char *sname);

* Add support for more than a single input argument.
  There are cases when additional information is needed. Being able to use more 
than a single input argument
  would help when running generic eBPF programs.

  Example of change:
   struct rte_bpf_prm {
   ...
-struct rte_bpf_arg prog_arg; /**< eBPF program input arg description */
+uint32_t nb_args;
+struct rte_bpf_arg prog_args[EBPF_FUNC_MAX_ARGS]; /**< eBPF program 
input args */
   };

Any feedback regarding this is welcome.
Best regards,
Bjorn



RE: [PATCH v6 03/16] vhost: add vhost msg support

2022-04-26 Thread Pei, Andy
HI Chenbo, 

Thanks for your reply.
My reply is inline.

> -Original Message-
> From: Xia, Chenbo 
> Sent: Monday, April 25, 2022 8:42 PM
> To: Pei, Andy ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> Changpeng 
> Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support
> 
> Hi Andy,
> 
> > -Original Message-
> > From: Pei, Andy 
> > Sent: Thursday, April 21, 2022 4:34 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com;
> > Cao, Gang ; Liu, Changpeng
> > 
> > Subject: [PATCH v6 03/16] vhost: add vhost msg support
> >
> > Add support for VHOST_USER_GET_CONFIG and
> VHOST_USER_SET_CONFIG.
> > VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG message is only
> > supported by virtio blk VDPA device.
> >
> > Signed-off-by: Andy Pei 
> > ---
> >  lib/vhost/vhost_user.c | 69
> > ++
> >  lib/vhost/vhost_user.h | 13 ++
> >  2 files changed, 82 insertions(+)
> >
> > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index
> > 1d39067..3780804 100644
> > --- a/lib/vhost/vhost_user.c
> > +++ b/lib/vhost/vhost_user.c
> > @@ -80,6 +80,8 @@
> > [VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> > [VHOST_USER_SET_SLAVE_REQ_FD]  =
> "VHOST_USER_SET_SLAVE_REQ_FD",
> > [VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> > +   [VHOST_USER_GET_CONFIG]  = "VHOST_USER_GET_CONFIG",
> > +   [VHOST_USER_SET_CONFIG]  = "VHOST_USER_SET_CONFIG",
> > [VHOST_USER_CRYPTO_CREATE_SESS] =
> "VHOST_USER_CRYPTO_CREATE_SESS",
> > [VHOST_USER_CRYPTO_CLOSE_SESS] =
> "VHOST_USER_CRYPTO_CLOSE_SESS",
> > [VHOST_USER_POSTCOPY_ADVISE]  =
> "VHOST_USER_POSTCOPY_ADVISE", @@
> > -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net *dev,
> > }
> >
> >  static int
> > +vhost_user_get_config(struct virtio_net **pdev,
> > +   struct vhu_msg_context *ctx,
> > +   int main_fd __rte_unused)
> > +{
> > +   struct virtio_net *dev = *pdev;
> > +   struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > +   int ret = 0;
> > +
> > +   if (vdpa_dev->ops->get_config) {
> > +   ret = vdpa_dev->ops->get_config(dev->vid,
> > +  ctx->msg.payload.cfg.region,
> > +  ctx->msg.payload.cfg.size);
> > +   if (ret != 0) {
> > +   ctx->msg.size = 0;
> > +   VHOST_LOG_CONFIG(ERR,
> > +"(%s) get_config() return error!\n",
> > +dev->ifname);
> > +   }
> > +   } else {
> > +   VHOST_LOG_CONFIG(ERR, "(%s) get_config() not
> supportted!\n",
> 
> Supported
> 
I will send out a new version to fix this.
> > +dev->ifname);
> > +   }
> > +
> > +   return RTE_VHOST_MSG_RESULT_REPLY;
> > +}
> > +
> > +static int
> > +vhost_user_set_config(struct virtio_net **pdev,
> > +   struct vhu_msg_context *ctx,
> > +   int main_fd __rte_unused)
> > +{
> > +   struct virtio_net *dev = *pdev;
> > +   struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > +   int ret = 0;
> > +
> > +   if (ctx->msg.size != sizeof(struct vhost_user_config)) {
> 
> I think you should do sanity check on payload.cfg.size and make sure it's
> smaller than VHOST_USER_MAX_CONFIG_SIZE
> 
> and same check for offset
> 
I think payload.cfg.size can be smaller than or equal to
VHOST_USER_MAX_CONFIG_SIZE.
payload.cfg.offset can be smaller than or equal to VHOST_USER_MAX_CONFIG_SIZE
as well.

> > +   VHOST_LOG_CONFIG(ERR,
> > +   "(%s) invalid set config msg size: %"PRId32" != %d\n",
> > +   dev->ifname, ctx->msg.size,
> 
> Based on you will change the log too, payload.cfg.size is uint32_t, so PRId32 
> ->
> PRIu32
> 
> > +   (int)sizeof(struct vhost_user_config));
> 
> So this can be %u
> 
Sure.
> > +   goto OUT;
> > +   }
> > +
> > +   if (vdpa_dev->ops->set_config) {
> > +   ret = vdpa_dev->ops->set_config(dev->vid,
> > +   ctx->msg.payload.cfg.region,
> > +   ctx->msg.payload.cfg.offset,
> > +   ctx->msg.payload.cfg.size,
> > +   ctx->msg.payload.cfg.flags);
> > +   if (ret)
> > +   VHOST_LOG_CONFIG(ERR,
> > +"(%s) set_config() return error!\n",
> > +dev->ifname);
> > +   } else {
> > +   VHOST_LOG_CONFIG(ERR, "(%s) set_config() not
> supportted!\n",
> 
> Supported
> 
I will send out a new version to fix this.
> > +dev->ifname);
> > +   }
> > +
> > +   return RTE_VHOST_MSG_RESULT_OK;
> > +
> > +OUT:
> 
> Lower case looks better
> 
OK. I will send out a new version to fix this.
> > +   return RTE_VHOST_MSG_RESULT_ERR;
> > +}
> 
> Almost all handlers need check on expected fd num (this case is 0), so the
> above new 

RE: [PATCH] doc: fix support table for ETH and VLAN flow items

2022-04-26 Thread Asaf Penso
>-Original Message-
>From: Ferruh Yigit 
>Sent: Wednesday, April 20, 2022 8:52 PM
>To: Ilya Maximets ; dev@dpdk.org; Asaf Penso
>
>Cc: Ajit Khaparde ; Rahul Lakkireddy
>; Hemant Agrawal
>; Haiyue Wang ; John
>Daley ; Guoyang Zhou ;
>Min Hu (Connor) ; Beilei Xing
>; Jingjing Wu ; Qi Zhang
>; Rosen Xu ; Matan Azrad
>; Slava Ovsiienko ; Liron Himi
>; Jiawen Wu ; Ori Kam
>; Dekel Peled ; NBU-Contact-
>Thomas Monjalon (EXTERNAL) ; sta...@dpdk.org;
>NBU-Contact-Thomas Monjalon (EXTERNAL) 
>Subject: Re: [PATCH] doc: fix support table for ETH and VLAN flow items
>
>On 3/16/2022 12:01 PM, Ilya Maximets wrote:
>> 'has_vlan' attribute is only supported by sfc, mlx5 and cnxk.
>> Other drivers doesn't support it.  Most of them (like i40e) just
>> ignore it silently.  Some drivers (like mlx4) never had a full support
>> of the eth item even before introduction of 'has_vlan'
>> (mlx4 allows to match on the destination MAC only).
>>
>> Same for the 'has_more_vlan' flag of the vlan item.
>>
>> Changing the support level to 'partial' for all such drivers.
>> This doesn't solve the issue, but at least marks the problematic
>> drivers.
>>
>
>Hi Asaf,
>
>This was the kind of maintenance issue I was referring to have this kind of
>capability documentation for flow API.
>
Are you referring to the fact that fields like has_vlan are not part of the
table?
If so, you are right, but IMHO having the high-level items still allows the
users to understand quickly what is supported.
We can have another level of tables, one per relevant item, to address this
specific issue.
In this case, we'll have a table for ETH that elaborates the different fields' 
support, like has_vlan.
If you are referring to a different issue, please elaborate.


>All below drivers are using 'RTE_FLOW_ITEM_TYPE_VLAN', the script verifies
>this, but are they actually supporting VLAN filter and in which case?
>
>We need comment from driver maintainers about the support level.
 
@Ori Kam, please comment for mlx driver.

>
>> Some details are available in:
>>https://bugs.dpdk.org/show_bug.cgi?id=958
>>
>> Fixes: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN
>> items")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> I added the stable in CC, but the patch should be extended while
>> backporting.  For 21.11 the cnxk driver should be also updated, for
>> 20.11, sfc driver should also be included.
>>
>>   doc/guides/nics/features/bnxt.ini   | 4 ++--
>>   doc/guides/nics/features/cxgbe.ini  | 4 ++--
>>   doc/guides/nics/features/dpaa2.ini  | 4 ++--
>>   doc/guides/nics/features/e1000.ini  | 2 +-
>>   doc/guides/nics/features/enic.ini   | 4 ++--
>>   doc/guides/nics/features/hinic.ini  | 2 +-
>>   doc/guides/nics/features/hns3.ini   | 4 ++--
>>   doc/guides/nics/features/i40e.ini   | 4 ++--
>>   doc/guides/nics/features/iavf.ini   | 4 ++--
>>   doc/guides/nics/features/ice.ini| 4 ++--
>>   doc/guides/nics/features/igc.ini| 2 +-
>>   doc/guides/nics/features/ipn3ke.ini | 4 ++--
>>   doc/guides/nics/features/ixgbe.ini  | 4 ++--
>>   doc/guides/nics/features/mlx4.ini   | 4 ++--
>>   doc/guides/nics/features/mvpp2.ini  | 4 ++--
>>   doc/guides/nics/features/tap.ini| 4 ++--
>>   doc/guides/nics/features/txgbe.ini  | 4 ++--
>>   17 files changed, 31 insertions(+), 31 deletions(-)
>>
>> diff --git a/doc/guides/nics/features/bnxt.ini
>> b/doc/guides/nics/features/bnxt.ini
>> index afb5414b49..ac682c5779 100644
>> --- a/doc/guides/nics/features/bnxt.ini
>> +++ b/doc/guides/nics/features/bnxt.ini
>> @@ -57,7 +57,7 @@ Perf doc = Y
>>
>>   [rte_flow items]
>>   any  = Y
>> -eth  = Y
>> +eth  = P
>>   ipv4 = Y
>>   ipv6 = Y
>>   gre  = Y
>> @@ -71,7 +71,7 @@ represented_port = Y
>>   tcp  = Y
>>   udp  = Y
>>   vf   = Y
>> -vlan = Y
>> +vlan = P
>>   vxlan= Y
>>
>>   [rte_flow actions]
>> diff --git a/doc/guides/nics/features/cxgbe.ini
>> b/doc/guides/nics/features/cxgbe.ini
>> index f674803ec4..f9912390fb 100644
>> --- a/doc/guides/nics/features/cxgbe.ini
>> +++ b/doc/guides/nics/features/cxgbe.ini
>> @@ -36,7 +36,7 @@ x86-64   = Y
>>   Usage doc= Y
>>
>>   [rte_flow items]
>> -eth  = Y
>> +eth  = P
>>   ipv4 = Y
>>   ipv6 = Y
>>   pf   = Y
>> @@ -44,7 +44,7 @@ phy_port = Y
>>   tcp  = Y
>>   udp  = Y
>>   vf   = Y
>> -vlan = Y
>> +vlan = P
>>
>>   [rte_flow actions]
>>   count= Y
>> diff --git a/doc/guides/nics/features/dpaa2.ini
>> b/doc/guides/nics/features/dpaa2.ini
>> index 4c06841a87..09ce66c788 100644
>> --- a/doc/guides/nics/features/dpaa2.ini
>> +++ b/doc/guides/nics/features/dpaa2.ini
>> @@ -31,7 

RE: [PATCH 1/2] ci: switch to Ubuntu 20.04

2022-04-26 Thread Ruifeng Wang
> -Original Message-
> From: David Marchand 
> Sent: Tuesday, April 26, 2022 3:18 PM
> To: dev@dpdk.org
> Cc: Aaron Conole ; Michael Santana
> ; Ruifeng Wang ;
> Jan Viktorin ; Bruce Richardson
> ; David Christensen 
> Subject: [PATCH 1/2] ci: switch to Ubuntu 20.04
> 
> Ubuntu 18.04 is now rather old.
> Besides, other entities in our CI are also testing this distribution.
> 
> Switch to a newer Ubuntu release and benefit from more recent
> tool(chain)s: for example, net/cnxk now builds fine and can be re-enabled.
> 
> Signed-off-by: David Marchand 
> ---



> diff --git a/config/arm/arm64_armv8_linux_clang_ubuntu2004
> b/config/arm/arm64_armv8_linux_clang_ubuntu2004
> new file mode 12
> index 00..01f5b7643e
> --- /dev/null
> +++ b/config/arm/arm64_armv8_linux_clang_ubuntu2004

How about naming it without '2004'?
It is a link to the ubuntu1804 cross file because the distribution-dependent
paths in the file don't change. And I believe that consistency will be kept
across distribution releases.
So we can use a file name without a distribution release number for the
latest/default Ubuntu environment.
This removes the need for a new cross file for each Ubuntu LTS release.

Thanks.
> @@ -0,0 +1 @@
> +arm64_armv8_linux_clang_ubuntu1804
> \ No newline at end of file
> diff --git a/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004
> b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004
> new file mode 12
> index 00..9d6139a19b
> --- /dev/null
> +++ b/config/ppc/ppc64le-power8-linux-gcc-ubuntu2004
> @@ -0,0 +1 @@
> +ppc64le-power8-linux-gcc-ubuntu1804
> \ No newline at end of file
> --
> 2.23.0



RE: [PATCH v6 03/16] vhost: add vhost msg support

2022-04-26 Thread Xia, Chenbo
> -Original Message-
> From: Pei, Andy 
> Sent: Tuesday, April 26, 2022 4:56 PM
> To: Xia, Chenbo ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> Changpeng 
> Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support
> 
> HI Chenbo,
> 
> Thanks for your reply.
> My reply is inline.
> 
> > -Original Message-
> > From: Xia, Chenbo 
> > Sent: Monday, April 25, 2022 8:42 PM
> > To: Pei, Andy ; dev@dpdk.org
> > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> > Changpeng 
> > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support
> >
> > Hi Andy,
> >
> > > -Original Message-
> > > From: Pei, Andy 
> > > Sent: Thursday, April 21, 2022 4:34 PM
> > > To: dev@dpdk.org
> > > Cc: Xia, Chenbo ; maxime.coque...@redhat.com;
> > > Cao, Gang ; Liu, Changpeng
> > > 
> > > Subject: [PATCH v6 03/16] vhost: add vhost msg support
> > >
> > > Add support for VHOST_USER_GET_CONFIG and
> > VHOST_USER_SET_CONFIG.
> > > VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG message is only
> > > supported by virtio blk VDPA device.
> > >
> > > Signed-off-by: Andy Pei 
> > > ---
> > >  lib/vhost/vhost_user.c | 69
> > > ++
> > >  lib/vhost/vhost_user.h | 13 ++
> > >  2 files changed, 82 insertions(+)
> > >
> > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index
> > > 1d39067..3780804 100644
> > > --- a/lib/vhost/vhost_user.c
> > > +++ b/lib/vhost/vhost_user.c
> > > @@ -80,6 +80,8 @@
> > >  [VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> > >  [VHOST_USER_SET_SLAVE_REQ_FD]  =
> > "VHOST_USER_SET_SLAVE_REQ_FD",
> > >  [VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> > > +[VHOST_USER_GET_CONFIG]  = "VHOST_USER_GET_CONFIG",
> > > +[VHOST_USER_SET_CONFIG]  = "VHOST_USER_SET_CONFIG",
> > >  [VHOST_USER_CRYPTO_CREATE_SESS] =
> > "VHOST_USER_CRYPTO_CREATE_SESS",
> > >  [VHOST_USER_CRYPTO_CLOSE_SESS] =
> > "VHOST_USER_CRYPTO_CLOSE_SESS",
> > >  [VHOST_USER_POSTCOPY_ADVISE]  =
> > "VHOST_USER_POSTCOPY_ADVISE", @@
> > > -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net *dev,
> > > }
> > >
> > >  static int
> > > +vhost_user_get_config(struct virtio_net **pdev,
> > > +struct vhu_msg_context *ctx,
> > > +int main_fd __rte_unused)
> > > +{
> > > +struct virtio_net *dev = *pdev;
> > > +struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > > +int ret = 0;
> > > +
> > > +if (vdpa_dev->ops->get_config) {
> > > +ret = vdpa_dev->ops->get_config(dev->vid,
> > > +   ctx->msg.payload.cfg.region,
> > > +   ctx->msg.payload.cfg.size);
> > > +if (ret != 0) {
> > > +ctx->msg.size = 0;
> > > +VHOST_LOG_CONFIG(ERR,
> > > + "(%s) get_config() return error!\n",
> > > + dev->ifname);
> > > +}
> > > +} else {
> > > +VHOST_LOG_CONFIG(ERR, "(%s) get_config() not
> > supportted!\n",
> >
> > Supported
> >
> I will send out a new version to fix this.
> > > + dev->ifname);
> > > +}
> > > +
> > > +return RTE_VHOST_MSG_RESULT_REPLY;
> > > +}
> > > +
> > > +static int
> > > +vhost_user_set_config(struct virtio_net **pdev,
> > > +struct vhu_msg_context *ctx,
> > > +int main_fd __rte_unused)
> > > +{
> > > +struct virtio_net *dev = *pdev;
> > > +struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > > +int ret = 0;
> > > +
> > > +if (ctx->msg.size != sizeof(struct vhost_user_config)) {
> >
> > I think you should do sanity check on payload.cfg.size and make sure
> it's
> > smaller than VHOST_USER_MAX_CONFIG_SIZE
> >
> > and same check for offset
> >
> I think payload.cfg.size can be smaller than or equal to
> VHOST_USER_MAX_CONFIG_SIZE.
> payload.cfg.ofset can be smaller than or equal to
> VHOST_USER_MAX_CONFIG_SIZE as well

After double-checking: offset is the config space offset, so it should be
checked in the vDPA driver. The size check at the vhost lib layer should just
be <= the max you defined.

Thanks,
Chenbo

> 
> > > +VHOST_LOG_CONFIG(ERR,
> > > +"(%s) invalid set config msg size: %"PRId32" != %d\n",
> > > +dev->ifname, ctx->msg.size,
> >
> > Based on you will change the log too, payload.cfg.size is uint32_t, so
> PRId32 ->
> > PRIu32
> >
> > > +(int)sizeof(struct vhost_user_config));
> >
> > So this can be %u
> >
> Sure.
> > > +goto OUT;
> > > +}
> > > +
> > > +if (vdpa_dev->ops->set_config) {
> > > +ret = vdpa_dev->ops->set_config(dev->vid,
> > > +ctx->msg.payload.cfg.region,
> > > +ctx->msg.payload.cfg.offset,
> > > +ctx->msg.payload.cfg.size,
> > > +ctx->msg.payload.cfg.flags);
> > > +if (ret)
> > > +VHOST_LOG_CONFIG(ERR,
> > > + "(%s) set_config() return error!\n",
> > > + dev->ifname);
> > > +} else {
> > > +VHOST_LOG_CONFIG(ERR, "(%s) set_config() not
> > supportted!\n",
> >
> > Supported
> >
> I will send out a new version to fix this.
> > > + dev->ifname);
> > > +}
> > > +
> > > +return RTE_VHOST_MSG_RESULT_OK;
> > > +
> > > +OUT:
> >
> > Lower case looks better
> >
> OK. I will send out a new version to fix this.
> > > +return RTE_VHOST_MSG_RESULT_ERR;
> > > +}
> >
> > Almost all handlers need check on expected fd num (this case is 0), so

RE: [PATCH] net/mlx5: fix rxq/txq stats memory access sync

2022-04-26 Thread Raslan Darawsheh
Hi,
> -Original Message-
> From: Raja Zidane 
> Sent: Wednesday, April 20, 2022 6:32 PM
> To: dev@dpdk.org
> Cc: Matan Azrad ; Slava Ovsiienko
> ; sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix rxq/txq stats memory access sync
> 
> Queue statistics are being continuously updated in Rx/Tx burst
> routines while handling traffic. In addition to that, statistics
> can be reset (written with zeroes) on statistics reset in other
> threads, causing a race condition, which in turn could result in
> wrong stats.
> 
> The patch provides an approach with reference values, allowing
> the actual counters to be writable within Rx/Tx burst threads
> only, and updating reference values on stats reset.
> 
> Fixes: 87011737b715 ("mlx5: add software counters")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Raja Zidane 
> Acked-by: Slava Ovsiienko 
> ---

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh



RE: [PATCH v6 05/16] vdpa/ifc: add vDPA interrupt for blk device

2022-04-26 Thread Pei, Andy
Hi Chenbo,

Thanks for your reply.
My reply is inline.

> -Original Message-
> From: Xia, Chenbo 
> Sent: Monday, April 25, 2022 8:58 PM
> To: Pei, Andy ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> Changpeng 
> Subject: RE: [PATCH v6 05/16] vdpa/ifc: add vDPA interrupt for blk device
> 
> Hi Andy,
> 
> > -Original Message-
> > From: Pei, Andy 
> > Sent: Thursday, April 21, 2022 4:34 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com;
> > Cao, Gang ; Liu, Changpeng
> > 
> > Subject: [PATCH v6 05/16] vdpa/ifc: add vDPA interrupt for blk device
> >
> > For the block device type, we have to relay the commands on all
> > queues.
> 
> It's a bit short... although I can understand, please add some background on
> current implementation for others to easily understand.
> 
Sure, I will send a new patch set to address this.
> >
> > Signed-off-by: Andy Pei 
> > ---
> >  drivers/vdpa/ifc/ifcvf_vdpa.c | 46
> > --
> > -
> >  1 file changed, 35 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c
> > b/drivers/vdpa/ifc/ifcvf_vdpa.c index 8ee041f..8d104b7 100644
> > --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> > +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> > @@ -370,24 +370,48 @@ struct rte_vdpa_dev_info {
> > irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> > irq_set->start = 0;
> > fd_ptr = (int *)&irq_set->data;
> > +   /* The first interrupt is for the configure space change
> > notification */
> > fd_ptr[RTE_INTR_VEC_ZERO_OFFSET] =
> > rte_intr_fd_get(internal->pdev->intr_handle);
> >
> > for (i = 0; i < nr_vring; i++)
> > internal->intr_fd[i] = -1;
> >
> > -   for (i = 0; i < nr_vring; i++) {
> > -   rte_vhost_get_vhost_vring(internal->vid, i, &vring);
> > -   fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd;
> > -   if ((i & 1) == 0 && m_rx == true) {
> > -   fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> > -   if (fd < 0) {
> > -   DRV_LOG(ERR, "can't setup eventfd: %s",
> > -   strerror(errno));
> > -   return -1;
> > +   if (internal->device_type == IFCVF_NET) {
> > +   for (i = 0; i < nr_vring; i++) {
> > +   rte_vhost_get_vhost_vring(internal->vid, i, &vring);
> > +   fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd;
> > +   if ((i & 1) == 0 && m_rx == true) {
> > +   /* For the net we only need to relay rx queue,
> > +* which will change the mem of VM.
> > +*/
> > +   fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> > +   if (fd < 0) {
> > +   DRV_LOG(ERR, "can't setup eventfd: %s",
> > +   strerror(errno));
> > +   return -1;
> > +   }
> > +   internal->intr_fd[i] = fd;
> > +   fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd;
> > +   }
> > +   }
> > +   } else if (internal->device_type == IFCVF_BLK) {
> > +   for (i = 0; i < nr_vring; i++) {
> > +   rte_vhost_get_vhost_vring(internal->vid, i, &vring);
> > +   fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = vring.callfd;
> > +   if (m_rx == true) {
> > +   /* For the blk we need to relay all the read cmd
> > +* of each queue
> > +*/
> > +   fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> > +   if (fd < 0) {
> > +   DRV_LOG(ERR, "can't setup eventfd: %s",
> > +   strerror(errno));
> > +   return -1;
> > +   }
> > +   internal->intr_fd[i] = fd;
> > +   fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd;
> 
> Many duplicated code here for blk and net. What if we use this condition to
> know creating eventfd or not:
> 
> if (m_rx == true && (is_blk_dev || (i & 1) == 0)) {
>   /* create eventfd and save now */
> }
> 
Sure, I will send a new patch set to address this.
> Thanks,
> Chenbo
> 
> > }
> > -   internal->intr_fd[i] = fd;
> > -   fd_ptr[RTE_INTR_VEC_RXTX_OFFSET + i] = fd;
> > }
> > }
> >
> > --
> > 1.8.3.1
> 



RE: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration

2022-04-26 Thread Pei, Andy
Hi Chenbo,

Thanks for your reply.
My reply is inline.

> -Original Message-
> From: Xia, Chenbo 
> Sent: Monday, April 25, 2022 9:10 PM
> To: Pei, Andy ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> Changpeng 
> Subject: RE: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration
> 
> > -Original Message-
> > From: Pei, Andy 
> > Sent: Thursday, April 21, 2022 4:34 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com;
> > Cao, Gang ; Liu, Changpeng
> > 
> > Subject: [PATCH v6 06/16] vdpa/ifc: add block device SW live-migration
> >
> > Add SW live-migration support to block device.
> > Add dirty page logging to block device.
> 
> Add SW live-migration support including dirty page logging for block device.
> 
Sure, I will remove "Add dirty page logging to block device." in the next version.
> >
> > Signed-off-by: Andy Pei 
> > ---
> >  drivers/vdpa/ifc/base/ifcvf.c |   4 +-
> >  drivers/vdpa/ifc/base/ifcvf.h |   6 ++
> >  drivers/vdpa/ifc/ifcvf_vdpa.c | 128 ++++++++++++++++++++++++++++++++++---------
> >  3 files changed, 115 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/vdpa/ifc/base/ifcvf.c
> > b/drivers/vdpa/ifc/base/ifcvf.c index d10c1fd..e417c50 100644
> > --- a/drivers/vdpa/ifc/base/ifcvf.c
> > +++ b/drivers/vdpa/ifc/base/ifcvf.c
> > @@ -191,7 +191,7 @@
> > IFCVF_WRITE_REG32(val >> 32, hi);
> >  }
> >
> > -STATIC int
> > +int
> >  ifcvf_hw_enable(struct ifcvf_hw *hw)
> >  {
> > struct ifcvf_pci_common_cfg *cfg;
> > @@ -240,7 +240,7 @@
> > return 0;
> >  }
> >
> > -STATIC void
> > +void
> >  ifcvf_hw_disable(struct ifcvf_hw *hw)  {
> > u32 i;
> > diff --git a/drivers/vdpa/ifc/base/ifcvf.h
> > b/drivers/vdpa/ifc/base/ifcvf.h index 769c603..6dd7925 100644
> > --- a/drivers/vdpa/ifc/base/ifcvf.h
> > +++ b/drivers/vdpa/ifc/base/ifcvf.h
> > @@ -179,4 +179,10 @@ struct ifcvf_hw {
> >  u64
> >  ifcvf_get_queue_notify_off(struct ifcvf_hw *hw, int qid);
> >
> > +int
> > +ifcvf_hw_enable(struct ifcvf_hw *hw);
> > +
> > +void
> > +ifcvf_hw_disable(struct ifcvf_hw *hw);
> > +
> >  #endif /* _IFCVF_H_ */
> > diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c
> > b/drivers/vdpa/ifc/ifcvf_vdpa.c index 8d104b7..a23dc2d 100644
> > --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> > +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> > @@ -345,6 +345,56 @@ struct rte_vdpa_dev_info {
> > }
> >  }
> >
> > +static void
> > +vdpa_ifcvf_blk_pause(struct ifcvf_internal *internal) {
> > +   struct ifcvf_hw *hw = &internal->hw;
> > +   struct rte_vhost_vring vq;
> > +   int i, vid;
> > +   uint64_t features = 0;
> > +   uint64_t log_base = 0, log_size = 0;
> > +   uint64_t len;
> > +
> > +   vid = internal->vid;
> > +
> > +   if (internal->device_type == IFCVF_BLK) {
> > +   for (i = 0; i < hw->nr_vring; i++) {
> > +   rte_vhost_get_vhost_vring(internal->vid, i, &vq);
> > +   while (vq.avail->idx != vq.used->idx) {
> > +   ifcvf_notify_queue(hw, i);
> > +   usleep(10);
> > +   }
> > +   hw->vring[i].last_avail_idx = vq.avail->idx;
> > +   hw->vring[i].last_used_idx = vq.used->idx;
> > +   }
> > +   }
> > +
> > +   ifcvf_hw_disable(hw);
> > +
> > +   for (i = 0; i < hw->nr_vring; i++)
> > +   rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> > +   hw->vring[i].last_used_idx);
> > +
> > +   if (internal->sw_lm)
> > +   return;
> > +
> > +   rte_vhost_get_negotiated_features(vid, &features);
> > +   if (RTE_VHOST_NEED_LOG(features)) {
> > +   ifcvf_disable_logging(hw);
> > +   rte_vhost_get_log_base(internal->vid, &log_base, &log_size);
> > +   rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> > +   log_base, IFCVF_LOG_BASE, log_size);
> > +   /*
> > +* IFCVF marks dirty memory pages for only packet buffer,
> > +* SW helps to mark the used ring as dirty after device stops.
> > +*/
> > +   for (i = 0; i < hw->nr_vring; i++) {
> > +   len = IFCVF_USED_RING_LEN(hw->vring[i].size);
> > +   rte_vhost_log_used_vring(vid, i, 0, len);
> > +   }
> > +   }
> > +}
> 
> Can we consider combining vdpa_ifcvf_blk_pause and vdpa_ifcvf_stop into
> one function and checking the device type internally to do different things?
> Because as I see it, most of the logic is the same.
> 
OK, I will address it in next version.
> > +
> >  #define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \
> > sizeof(int) * (IFCVF_MAX_QUEUES * 2 + 1))
> >  static int
> > @@ -659,15 +709,22 @@ struct rte_vdpa_dev_info {
> > }
> > hw->vring[i].avail = gpa;
> >
> > -   /* Direct I/O for Tx queue, relay for Rx queue */
> > -   if (i & 1) {
> > -   gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.used);
> > - 

Re: [DPDK v4] net/ixgbe: promote MDIO API

2022-04-26 Thread Ray Kinsella


Zeng, ZhichaoX  writes:

> Hi, Ray, David:
>
> What is your opinion on this patch?
>
> Regards,
> Zhichao
>
> -Original Message-
> From: Zeng, ZhichaoX  
> Sent: Tuesday, April 19, 2022 7:06 PM
> To: dev@dpdk.org
> Cc: Yang, Qiming ; Wang, Haiyue 
> ; m...@ashroe.eu; Zeng, ZhichaoX 
> 
> Subject: [DPDK v4] net/ixgbe: promote MDIO API
>
> From: Zhichao Zeng 
>
> Promote the MDIO APIs to be stable.
>
> Signed-off-by: Zhichao Zeng 
> ---
>  drivers/net/ixgbe/rte_pmd_ixgbe.h |  5 -
>  drivers/net/ixgbe/version.map | 10 +-
>  2 files changed, 5 insertions(+), 10 deletions(-)
>

Acked-by: Ray Kinsella 


Re: [PATCH v2 04/28] common/cnxk: support to configure the ts pkind in CPT

2022-04-26 Thread Ray Kinsella


Nithin Dabilpuram  writes:

> From: Vidya Sagar Velumuri 
>
> Add new API to configure the SA table entries with new CPT PKIND
> when timestamp is enabled.
>
> Signed-off-by: Vidya Sagar Velumuri 
> ---
>  drivers/common/cnxk/roc_nix_inl.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/common/cnxk/roc_nix_inl.h  |  2 ++
>  drivers/common/cnxk/roc_nix_inl_priv.h |  1 +
>  drivers/common/cnxk/version.map|  1 +
>  4 files changed, 63 insertions(+)
>
Acked-by: Ray Kinsella 


Re: [dpdk-dev][PATCH 3/3] net/cnxk: adding cnxk support to configure custom sa index

2022-04-26 Thread Ray Kinsella


kirankum...@marvell.com writes:

> From: Kiran Kumar K 
>
> Adding cnxk device driver support to configure custom sa index.
> Custom sa index can be configured as part of the session create
> as SPI, and later original SPI can be updated using session update.
>
> Signed-off-by: Kiran Kumar K 
> ---
>  doc/api/doxy-api-index.md   |   3 +-
>  doc/api/doxy-api.conf.in|   1 +
>  drivers/net/cnxk/cn10k_ethdev_sec.c | 107 +++-
>  drivers/net/cnxk/cn9k_ethdev.c  |   6 ++
>  drivers/net/cnxk/cn9k_ethdev_sec.c  |   2 +-
>  drivers/net/cnxk/cnxk_ethdev.h  |   3 +-
>  drivers/net/cnxk/cnxk_ethdev_sec.c  |  30 +---
>  drivers/net/cnxk/cnxk_flow.c|   1 +
>  drivers/net/cnxk/meson.build|   2 +
>  drivers/net/cnxk/rte_pmd_cnxk.h |  94 
>  drivers/net/cnxk/version.map|   6 ++
>  11 files changed, 240 insertions(+), 15 deletions(-)
>  create mode 100644 drivers/net/cnxk/rte_pmd_cnxk.h
>
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index 4245b9635c..8f9564ee84 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -56,7 +56,8 @@ The public API headers are grouped by topics:
>[dpaa2_qdma] (@ref rte_pmd_dpaa2_qdma.h),
>[crypto_scheduler]   (@ref rte_cryptodev_scheduler.h),
>[dlb2]   (@ref rte_pmd_dlb2.h),
> -  [ifpga]  (@ref rte_pmd_ifpga.h)
> +  [ifpga]  (@ref rte_pmd_ifpga.h),
> +  [cnxk]   (@ref rte_pmd_cnxk.h)
>  
>  - **memory**:
>[memseg] (@ref rte_memory.h),
> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> index db2ca9b6ed..b49942412d 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -12,6 +12,7 @@ INPUT   = 
> @TOPDIR@/doc/api/doxy-api-index.md \
>@TOPDIR@/drivers/net/ark \
>@TOPDIR@/drivers/net/bnxt \
>@TOPDIR@/drivers/net/bonding \
> +  @TOPDIR@/drivers/net/cnxk \
>@TOPDIR@/drivers/net/dpaa \
>@TOPDIR@/drivers/net/dpaa2 \
>@TOPDIR@/drivers/net/i40e \
> diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c 
> b/drivers/net/cnxk/cn10k_ethdev_sec.c
> index 87bb691ab4..60ae5d7d99 100644
> --- a/drivers/net/cnxk/cn10k_ethdev_sec.c
> +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -502,7 +503,7 @@ cn10k_eth_sec_session_create(void *device,
> ROC_NIX_INL_OT_IPSEC_OUTB_SW_RSVD);
>  
>   /* Alloc an sa index */
> - rc = cnxk_eth_outb_sa_idx_get(dev, &sa_idx);
> + rc = cnxk_eth_outb_sa_idx_get(dev, &sa_idx, ipsec->spi);
>   if (rc)
>   goto mempool_put;
>  
> @@ -657,6 +658,109 @@ cn10k_eth_sec_capabilities_get(void *device __rte_unused)
>   return cn10k_eth_sec_capabilities;
>  }
>  
> +static int
> +cn10k_eth_sec_session_update(void *device, struct rte_security_session *sess,
> +  struct rte_security_session_conf *conf)
> +{
> + struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
> + struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
> + struct roc_ot_ipsec_inb_sa *inb_sa_dptr;
> + struct rte_security_ipsec_xform *ipsec;
> + struct rte_crypto_sym_xform *crypto;
> + struct cnxk_eth_sec_sess *eth_sec;
> + bool inbound;
> + int rc;
> +
> + if (conf->action_type != RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
> + conf->protocol != RTE_SECURITY_PROTOCOL_IPSEC)
> + return -ENOENT;
> +
> + ipsec = &conf->ipsec;
> + crypto = conf->crypto_xform;
> + inbound = !!(ipsec->direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS);
> +
> + eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
> + if (!eth_sec)
> + return -ENOENT;
> +
> + eth_sec->spi = conf->ipsec.spi;
> +
> + if (inbound) {
> + inb_sa_dptr = (struct roc_ot_ipsec_inb_sa *)dev->inb.sa_dptr;
> + memset(inb_sa_dptr, 0, sizeof(struct roc_ot_ipsec_inb_sa));
> +
> + rc = cnxk_ot_ipsec_inb_sa_fill(inb_sa_dptr, ipsec, crypto,
> +true);
> + if (rc)
> + return -EINVAL;
> +
> + rc = roc_nix_inl_ctx_write(&dev->nix, inb_sa_dptr, eth_sec->sa,
> +eth_sec->inb,
> +sizeof(struct roc_ot_ipsec_inb_sa));
> + if (rc)
> + return -EINVAL;
> + } else {
> + struct roc_ot_ipsec_outb_sa *outb_sa_dptr;
> +
> + outb_sa_dptr = (struct roc_ot_ipsec_outb_sa *)dev->outb.sa_dptr;
> + memset(outb_sa_dptr, 0, sizeof(struct roc_ot

Re: [dpdk-dev] [PATCH v4] ethdev: mtr: support protocol based input color selection

2022-04-26 Thread Ray Kinsella


jer...@marvell.com writes:

> From: Jerin Jacob 
>
> Currently, the meter object supports only a DSCP based input color table.
> This patch enhances that to support a VLAN based input color table,
> input color tables based on inner fields for the tunnel use case, and
> a fallback input color per meter for packets that match neither method.
>
> All of the above features are exposed through capabilities, and an
> additional capability is added to specify that the implementation
> supports more than one input color table per ethdev port.
>
> Suggested-by: Cristian Dumitrescu 
> Signed-off-by: Jerin Jacob 
> ---
> v4..v3:
>
> - Aligned with community meeting call which is documented in
> https://patches.dpdk.org/project/dpdk/patch/20220301085824.1041009-1-sk...@marvell.com/
> as last message. With following exception, 
> - Used RTE_MTR_COLOR_IN_*_DSCP instead of RTE_MTR_COLOR_IN_*_IP as
> there is already a dscp_table and an rte_mtr_meter_dscp_table_update() API.
> Changing the above symbols would break existing applications for no good reason.
> - Updated 22.07 release notes
> - Remove testpmd changes from series to finalize the API spec first and
>   then we can send testpmd changes.
>
> v3..v2:
>
> - Fix input color flags as a bitmask
> - Add definitions for newly added API
>
> v2..v1:
> - Fix seperate typo
>
> v1..RFC:
>
> Address the review comments by Cristian at
> https://patches.dpdk.org/project/dpdk/patch/20210820082401.3778736-1-jer...@marvell.com/
>
> - Moved to v22.07 release
> - Updated rte_mtr_input_color_method to support all VLAN, DSCP, Inner
>   cases
> - Added input_color_method
> - Removed union between vlan_table and dscp_table
> - Kept VLAN instead of PCP as HW coloring based on DEI(1bit), PCP(3
>   bits)
>
>  .../traffic_metering_and_policing.rst |  33 
>  doc/guides/rel_notes/release_22_07.rst|  10 +
>  lib/ethdev/rte_mtr.c  |  23 +++
>  lib/ethdev/rte_mtr.h  | 186 +-
>  lib/ethdev/rte_mtr_driver.h   |  19 ++
>  lib/ethdev/version.map|   4 +
>  6 files changed, 265 insertions(+), 10 deletions(-)
>
> diff --git a/doc/guides/prog_guide/traffic_metering_and_policing.rst 
> b/doc/guides/prog_guide/traffic_metering_and_policing.rst
> index ceb5a96488..75deabbaf1 100644
> --- a/doc/guides/prog_guide/traffic_metering_and_policing.rst
> +++ b/doc/guides/prog_guide/traffic_metering_and_policing.rst
> @@ -21,6 +21,7 @@ The main features are:
>  * Policer actions (per meter output color): recolor, drop
>  * Statistics (per policer output color)
>  * Chaining multiple meter objects
> +* Protocol based input color selection
>  
>  Configuration steps
>  ---
> @@ -105,3 +106,35 @@ traffic meter and policing library.
> * Adding one (or multiple) actions of the type 
> ``RTE_FLOW_ACTION_TYPE_METER``
>   to the list of meter actions (``struct 
> rte_mtr_meter_policy_params::actions``)
>   specified per color as show in :numref:`figure_rte_mtr_chaining`.
> +
> +Protocol based input color selection
> +
> +
> +The API supports selecting the input color based on the packet content.
> +Following is the API usage model for the same.
> +
> +#. Probe the protocol based input color selection capabilities of the
> +   device using the ``rte_mtr_capabilities_get()`` API:
> +
> +   * ``struct rte_mtr_capabilities::input_color_proto_mask``
> +   * ``struct rte_mtr_capabilities::separate_input_color_table_per_port``
> +
> +#. When creating the meter object using ``rte_mtr_create()``, configure
> +   the relevant input color selection parameters:
> +
> +   * Select the input color protocols with
> + ``struct rte_mtr_params::input_color_proto_mask``.
> +
> +   * If ``struct rte_mtr_params::input_color_proto_mask`` has multiple bits
> + set, ``rte_mtr_color_in_protocol_priority_set()`` shall be used to set
> + the order in which the protocols are used to find the input color.
> +
> +   * Fill the tables ``struct rte_mtr_params::dscp_table`` and
> + ``struct rte_mtr_params::vlan_table`` based on the selected input
> + color protocols.
> +
> +   * Update ``struct rte_mtr_params::default_input_color`` to determine
> + the input color in case the input packet does not match
> + the input color method.
> +
> +   * If needed, update the input color tables at runtime using the
> + ``rte_mtr_meter_vlan_table_update()`` and
> + ``rte_mtr_meter_dscp_table_update()`` APIs.
> diff --git a/doc/guides/rel_notes/release_22_07.rst 
> b/doc/guides/rel_notes/release_22_07.rst
> index 42a5f2d990..746622f9b3 100644
> --- a/doc/guides/rel_notes/release_22_07.rst
> +++ b/doc/guides/rel_notes/release_22_07.rst
> @@ -55,6 +55,13 @@ New Features
>   Also, make sure to start the actual text at the margin.
>   ===
>  
> +* **Added protocol based input color for meter.**
> +
> +  Added new APIs ``rte_mt

Re: [PATCH] doc: fix support table for ETH and VLAN flow items

2022-04-26 Thread Ferruh Yigit

On 4/26/2022 9:55 AM, Asaf Penso wrote:

-Original Message-
From: Ferruh Yigit 
Sent: Wednesday, April 20, 2022 8:52 PM
To: Ilya Maximets ; dev@dpdk.org; Asaf Penso

Cc: Ajit Khaparde ; Rahul Lakkireddy
; Hemant Agrawal
; Haiyue Wang ; John
Daley ; Guoyang Zhou ;
Min Hu (Connor) ; Beilei Xing
; Jingjing Wu ; Qi Zhang
; Rosen Xu ; Matan Azrad
; Slava Ovsiienko ; Liron Himi
; Jiawen Wu ; Ori Kam
; Dekel Peled ; NBU-Contact-
Thomas Monjalon (EXTERNAL) ; sta...@dpdk.org;
NBU-Contact-Thomas Monjalon (EXTERNAL) 
Subject: Re: [PATCH] doc: fix support table for ETH and VLAN flow items

On 3/16/2022 12:01 PM, Ilya Maximets wrote:

'has_vlan' attribute is only supported by sfc, mlx5 and cnxk.
Other drivers doesn't support it.  Most of them (like i40e) just
ignore it silently.  Some drivers (like mlx4) never had a full support
of the eth item even before introduction of 'has_vlan'
(mlx4 allows to match on the destination MAC only).

Same for the 'has_more_vlan' flag of the vlan item.

Changing the support level to 'partial' for all such drivers.
This doesn't solve the issue, but at least marks the problematic
drivers.



Hi Asaf,

This was the kind of maintenance issue I was referring to when we discussed
having this kind of capability documentation for the flow API.


Are you referring to the fact that fields like has_vlan are not part of the 
table?
If so, you are right, but IMHO having the high level items still allows the 
users to understand what is supported quickly.
We can have another level of tables per each relevant item to address this 
specific issue.
In this case, we'll have a table for ETH that elaborates the different fields' 
support, like has_vlan.
If you are referring to a different issue, please elaborate.



'vlan' in the .ini file is already to document the flow API VLAN 
support, so I am not suggesting adding more to the table.


My point was that it is hard to keep this kind of documentation correct.




All below drivers are using 'RTE_FLOW_ITEM_TYPE_VLAN', the script verifies
this, but are they actually supporting VLAN filter and in which case?

We need comment from driver maintainers about the support level.
  
@Ori Kam, please comment for mlx driver.





Some details are available in:
https://bugs.dpdk.org/show_bug.cgi?id=958

Fixes: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN
items")
Cc: sta...@dpdk.org

Signed-off-by: Ilya Maximets 
---

I added the stable in CC, but the patch should be extended while
backporting.  For 21.11 the cnxk driver should be also updated, for
20.11, sfc driver should also be included.

   doc/guides/nics/features/bnxt.ini   | 4 ++--
   doc/guides/nics/features/cxgbe.ini  | 4 ++--
   doc/guides/nics/features/dpaa2.ini  | 4 ++--
   doc/guides/nics/features/e1000.ini  | 2 +-
   doc/guides/nics/features/enic.ini   | 4 ++--
   doc/guides/nics/features/hinic.ini  | 2 +-
   doc/guides/nics/features/hns3.ini   | 4 ++--
   doc/guides/nics/features/i40e.ini   | 4 ++--
   doc/guides/nics/features/iavf.ini   | 4 ++--
   doc/guides/nics/features/ice.ini| 4 ++--
   doc/guides/nics/features/igc.ini| 2 +-
   doc/guides/nics/features/ipn3ke.ini | 4 ++--
   doc/guides/nics/features/ixgbe.ini  | 4 ++--
   doc/guides/nics/features/mlx4.ini   | 4 ++--
   doc/guides/nics/features/mvpp2.ini  | 4 ++--
   doc/guides/nics/features/tap.ini| 4 ++--
   doc/guides/nics/features/txgbe.ini  | 4 ++--
   17 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/features/bnxt.ini
b/doc/guides/nics/features/bnxt.ini
index afb5414b49..ac682c5779 100644
--- a/doc/guides/nics/features/bnxt.ini
+++ b/doc/guides/nics/features/bnxt.ini
@@ -57,7 +57,7 @@ Perf doc = Y

   [rte_flow items]
   any  = Y
-eth  = Y
+eth  = P
   ipv4 = Y
   ipv6 = Y
   gre  = Y
@@ -71,7 +71,7 @@ represented_port = Y
   tcp  = Y
   udp  = Y
   vf   = Y
-vlan = Y
+vlan = P
   vxlan= Y

   [rte_flow actions]
diff --git a/doc/guides/nics/features/cxgbe.ini
b/doc/guides/nics/features/cxgbe.ini
index f674803ec4..f9912390fb 100644
--- a/doc/guides/nics/features/cxgbe.ini
+++ b/doc/guides/nics/features/cxgbe.ini
@@ -36,7 +36,7 @@ x86-64   = Y
   Usage doc= Y

   [rte_flow items]
-eth  = Y
+eth  = P
   ipv4 = Y
   ipv6 = Y
   pf   = Y
@@ -44,7 +44,7 @@ phy_port = Y
   tcp  = Y
   udp  = Y
   vf   = Y
-vlan = Y
+vlan = P

   [rte_flow actions]
   count= Y
diff --git a/doc/guides/nics/features/dpaa2.ini
b/doc/guides/nics/features/dpaa2.ini
index 4c06841a87..09ce66c788 100644
--- a/doc/guides/nics/features/dpaa2.ini
+++ b/doc/guides/nics/features/dpaa2.ini
@@ -31,7 +31,7 @@ ARMv8

[PATCH v5 0/3] ethdev: introduce protocol based buffer split

2022-04-26 Thread wenxuanx . wu
From: Wenxuan Wu 

Protocol based buffer split consists of splitting a received packet into
two separate regions based on the packet content. It is useful in some
scenarios, such as GPU acceleration. The splitting will help to enable
true zero copy and hence improve the performance significantly.

This patchset aims to support protocol based split on top of the current
buffer split. When an Rx queue is configured with the
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload and a corresponding protocol,
received packets will be directly split into two different mempools.

v4->v5:
* Use protocol and mbuf_offset based buffer split instead of header split.
* Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
* Improve the description of rte_eth_rxseg_split.proto.

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Wenxuan Wu (3):
  ethdev: introduce protocol type based buffer split
  app/testpmd: add proto based buffer split config
  net/ice: support proto based buf split in Rx path

 app/test-pmd/cmdline.c| 118 ++
 app/test-pmd/testpmd.c|   7 +-
 app/test-pmd/testpmd.h|   2 +
 drivers/net/ice/ice_ethdev.c  |  10 +-
 drivers/net/ice/ice_rxtx.c| 217 ++
 drivers/net/ice/ice_rxtx.h|  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c   |  36 -
 lib/ethdev/rte_ethdev.h   |  21 ++-
 9 files changed, 388 insertions(+), 42 deletions(-)

-- 
2.25.1



[PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer split

2022-04-26 Thread wenxuanx . wu
From: Wenxuan Wu 

Protocol based buffer split consists of splitting a received packet into two
separate regions based on its content. The split happens after the packet
protocol header and before the packet payload. Splitting is usually between
the packet protocol header that can be posted to a dedicated buffer and the
packet payload that can be posted to a different buffer.

Currently, Rx buffer split supports length and offset based packet split.
Protocol based split builds on buffer split, but configuring a buffer split
length is not suitable for NICs that split based on protocol types, because
tunneling makes the conversion from length to protocol type impossible.
This patch extends the current buffer split to support protocol and offset
based buffer split. A new proto field is introduced in the rte_eth_rxseg_split
structure reserved field to specify header protocol type. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and corresponding protocol
type configured. PMD will split the ingress packets into two separate regions.
Currently, both inner and outer L2/L3/L4 level protocol based buffer split
can be supported.

For example, let's suppose we configured the Rx queue with the
following segments:
seg0 - pool0, off0=2B
seg1 - pool1, off1=128B

With the protocol split type configured as RTE_PTYPE_L4_UDP, a packet
consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
seg1 - payload @ 128 in mbuf from pool1

The memory attributes of the split parts may differ as well - for example,
mempool0 and mempool1 may belong to DPDK memory and external memory,
respectively.

Signed-off-by: Xuan Ding 
Signed-off-by: Yuan Wang 
Signed-off-by: Wenxuan Wu 
Reviewed-by: Qi Zhang 
---
 lib/ethdev/rte_ethdev.c | 36 +---
 lib/ethdev/rte_ethdev.h | 15 ++-
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..1a2bc172ab 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct 
rte_eth_rxseg_split *rx_seg,
struct rte_mempool *mpl = rx_seg[seg_idx].mp;
uint32_t length = rx_seg[seg_idx].length;
uint32_t offset = rx_seg[seg_idx].offset;
+   uint32_t proto = rx_seg[seg_idx].proto;
 
if (mpl == NULL) {
RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,34 @@ rte_eth_rx_queue_check_split(const struct 
rte_eth_rxseg_split *rx_seg,
}
offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-   length = length != 0 ? length : *mbp_buf_size;
-   if (*mbp_buf_size < length + offset) {
-   RTE_ETHDEV_LOG(ERR,
-  "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-  mpl->name, *mbp_buf_size,
-  length + offset, length, offset);
-   return -EINVAL;
+   if (proto == 0) {
+   length = length != 0 ? length : *mbp_buf_size;
+   if (*mbp_buf_size < length + offset) {
+   RTE_ETHDEV_LOG(ERR,
+   "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+   mpl->name, *mbp_buf_size,
+   length + offset, length, offset);
+   return -EINVAL;
+   }
+   } else {
+   /* Ensure n_seg is 2 in protocol based buffer split. */
+   if (n_seg != 2) {
+   RTE_ETHDEV_LOG(ERR, "number of buffer split protocol segments should be 2.\n");
+   return -EINVAL;
+   }
+   /* Length and protocol are exclusive here, so make sure
+* length is 0 in protocol based buffer split.
+*/
+   if (length != 0) {
+   RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in buffer split\n");
+   return -EINVAL;
+   }
+   if (*mbp_buf_size < offset) {
+   RTE_ETHDEV_LOG(ERR,
+   "%s mbuf_data_room_size %u < %u (segment offset)\n",
+   mpl->name, *mbp_buf_size,
+   offset);
+   return -EINVAL;
+   }
}
}
return 0;
diff --git a/li

[PATCH v5 2/4] app/testpmd: add proto based buffer split config

2022-04-26 Thread wenxuanx . wu
From: Wenxuan Wu 

This patch adds protocol based buffer split configuration in testpmd.
The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with two mempools. e.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split.

Testpmd View:
testpmd>port config <port_id> rx_offload buffer_split on
testpmd>port config <port_id> buffer_split mac|ipv4|ipv6|l3|tcp|udp|sctp|
l4|inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|
inner_udp|inner_sctp|inner_l4

Signed-off-by: Xuan Ding 
Signed-off-by: Yuan Wang 
Signed-off-by: Wenxuan Wu 
Reviewed-by: Qi Zhang 
---
 app/test-pmd/cmdline.c | 118 +
 app/test-pmd/testpmd.c |   7 +--
 app/test-pmd/testpmd.h |   2 +
 3 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..5cd4beca95 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -866,6 +866,12 @@ static void cmd_help_long_parsed(void *parsed_result,
" Enable or disable a per port Rx offloading"
" on all Rx queues of a port\n\n"
 
+   "port config <port_id> buffer_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+   "inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+   "inner_udp|inner_sctp|inner_l4\n"
+   " Configure protocol type for buffer split"
+   " on all Rx queues of a port\n\n"
+
"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
"outer_ipv4_cksum|macsec_strip|header_split|"
@@ -16353,6 +16359,117 @@ cmdline_parse_inst_t cmd_config_per_port_rx_offload = {
}
 };
 
+/* config a per port buffer split protocol */
+struct cmd_config_per_port_buffer_split_protocol_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t config;
+   uint16_t port_id;
+   cmdline_fixed_string_t buffer_split;
+   cmdline_fixed_string_t protocol;
+};
+
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_port =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_per_port_buffer_split_protocol_result,
+port, "port");
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_config =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_per_port_buffer_split_protocol_result,
+config, "config");
+cmdline_parse_token_num_t cmd_config_per_port_buffer_split_protocol_result_port_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_per_port_buffer_split_protocol_result,
+port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_buffer_split =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_per_port_buffer_split_protocol_result,
+buffer_split, "buffer_split");
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_protocol =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_per_port_buffer_split_protocol_result,
+protocol, "mac#ipv4#ipv6#l3#tcp#udp#sctp#l4#"
+  "inner_mac#inner_ipv4#inner_ipv6#inner_l3#inner_tcp#"
+  "inner_udp#inner_sctp#inner_l4");
+
+static void
+cmd_config_per_port_buffer_split_protocol_parsed(void *parsed_result,
+   __rte_unused struct cmdline *cl,
+   __rte_unused void *data)
+{
+   struct cmd_config_per_port_buffer_split_protocol_result *res = parsed_result;
+   portid_t port_id = res->port_id;
+   struct rte_port *port = &ports[port_id];
+   uint32_t protocol;
+
+   if (port_id_is_invalid(port_id, ENABLED_WARN))
+   return;
+
+   if (port->port_status != RTE_PORT_STOPPED) {
+   fprintf(stderr,
+   "Error: Can't config offload when Port %d is not stopped\n",
+   port_id);
+   return;
+   }
+
+   if (!strcmp(res->protocol, "mac"))
+   protocol = RTE_PTYPE_L2_ETHER;
+   else if (!strcmp(res->protocol, "ipv4"))
+   protocol = RTE_PTYPE_L3_IPV4;
+   else if (!strcmp(res->protocol, "ipv6"))
+   protocol = RTE_PTYPE_L3_IPV6;
+   else if (!strcmp(res->protocol, "l3"))
+   protocol = RTE_PTYPE_L3_IPV4|RTE_PTYPE_L3_IPV6;
+   else if (!strcmp(res->protocol, "tcp"))
+   protocol = RTE_PTYPE_L4_TCP;
+   else if (!strcmp(res->protocol, "udp"))
+   protocol = RTE_PTYPE_L4_UDP;
+   else if (!strcmp(res->protocol, "sctp"))
+   protocol = RTE_PTYPE_L4_SCTP;
+   else if (!str

[PATCH v5 3/4] net/ice: support proto based buf split in Rx path

2022-04-26 Thread wenxuanx . wu
From: Wenxuan Wu 

This patch adds support for proto based buffer split in the normal Rx data
paths. When the Rx queue is configured with a specific protocol type,
received packets will be directly split into protocol header and payload
parts, and the two parts will be put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

Signed-off-by: Xuan Ding 
Signed-off-by: Yuan Wang 
Signed-off-by: Wenxuan Wu 
Reviewed-by: Qi Zhang 
---
 drivers/net/ice/ice_ethdev.c  |  10 +-
 drivers/net/ice/ice_rxtx.c| 219 ++
 drivers/net/ice/ice_rxtx.h|  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 4 files changed, 216 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..ce3f49c863 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
RTE_ETH_RX_OFFLOAD_RSS_HASH |
-   RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+   RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+   RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
dev_info->tx_offload_capa |=
RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
}
 
-   dev_info->rx_queue_offload_capa = 0;
+   dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+   dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+   dev_info->rx_seg_capa.multi_pools = 1;
+   dev_info->rx_seg_capa.offset_allowed = 0;
+   dev_info->rx_seg_capa.offset_align_log2 = 0;
+
return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 2dd2637fbb..8cbcee3543 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
/* Set buffer size as the head split is disabled. */
buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
  RTE_PKTMBUF_HEADROOM);
-   rxq->rx_hdr_len = 0;
rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
rxq->max_pkt_len =
RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,52 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+   if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+   switch (rxq->rxseg[0].proto) {
+   case RTE_PTYPE_L2_ETHER:
+   rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+   rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+   break;
+   case RTE_PTYPE_INNER_L2_ETHER:
+   rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+   rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+   break;
+   case RTE_PTYPE_L3_IPV4:
+   case RTE_PTYPE_L3_IPV6:
+   case RTE_PTYPE_INNER_L3_IPV4:
+   case RTE_PTYPE_INNER_L3_IPV6:
+   rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+   rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+   break;
+   case RTE_PTYPE_L4_TCP:
+   case RTE_PTYPE_L4_UDP:
+   case RTE_PTYPE_INNER_L4_TCP:
+   case RTE_PTYPE_INNER_L4_UDP:
+   rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+   rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+   break;
+   case RTE_PTYPE_L4_SCTP:
+   case RTE_PTYPE_INNER_L4_SCTP:
+   rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+   rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+   break;
+   case 0:
+   PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+   return -EINVAL;
+   default:
+   PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+   return -EINVAL;
+   }
+   rxq->

RE: [dpdk-dev] [PATCH v4] ethdev: mtr: support protocol based input color selection

2022-04-26 Thread Dumitrescu, Cristian
Hi Jerin,

Thank you for implementing according to our agreement; I am happy to see that 
we are converging.

Here are some comments below:



> diff --git a/lib/ethdev/rte_mtr.h b/lib/ethdev/rte_mtr.h
> index 40df0888c8..76ffbcf724 100644
> --- a/lib/ethdev/rte_mtr.h
> +++ b/lib/ethdev/rte_mtr.h
> @@ -213,6 +213,52 @@ struct rte_mtr_meter_policy_params {
>   const struct rte_flow_action *actions[RTE_COLORS];
>  };
> 
> +/**
> + * Input color protocol method

I suggest adding some more explanations here:
More than one method can be enabled for a given meter. Even if enabled, a 
method might not be applicable to each input packet, in case the associated 
protocol header is not present in the packet. The highest priority method that 
is both enabled for the meter and also applicable for the current input packet 
wins; if none is both enabled and applicable, the default input color is used. 
@see function rte_mtr_color_in_protocol_priority_set()

> + */
> +enum rte_mtr_color_in_protocol {
> + /**
> +  * If the input packet has at least one VLAN label, its input color is
> +  * detected by the outermost VLAN DEI(1bit), PCP(3 bits)
> +  * indexing into the struct rte_mtr_params::vlan_table.
> +  * Otherwise, the *default_input_color* is applied.
> +  *

The statement "Otherwise, the *default_input_color* is applied" is incorrect 
IMO and should be removed, as multiple methods might be enabled and also 
applicable to a specific input packet, in which case the highest priority 
method wins, as opposed to the default input color.

I suggest a simplification "Enable the detection of the packet input color 
based on the outermost VLAN header fields DEI (1 bit) and PCP (3 bits). These 
fields are used as index into the VLAN table"

> +  * @see struct rte_mtr_params::default_input_color
> +  * @see struct rte_mtr_params::vlan_table
> +  */
> + RTE_MTR_COLOR_IN_PROTO_OUTER_VLAN = RTE_BIT64(0),
> + /**
> +  * If the input packet has at least one VLAN label, its input color is
> +  * detected by the innermost VLAN DEI(1bit), PCP(3 bits)
> +  * indexing into the struct rte_mtr_params::vlan_table.
> +  * Otherwise, the *default_input_color* is applied.
> +  *
> +  * @see struct rte_mtr_params::default_input_color
> +  * @see struct rte_mtr_params::vlan_table
> +  */

Same simplification suggested here.

> + RTE_MTR_COLOR_IN_PROTO_INNER_VLAN = RTE_BIT64(1),
> + /**
> +  * If the input packet is IPv4 or IPv6, its input color is detected by
> +  * the outermost DSCP field indexing into the
> +  * struct rte_mtr_params::dscp_table.
> +  * Otherwise, the *default_input_color* is applied.
> +  *
> +  * @see struct rte_mtr_params::default_input_color
> +  * @see struct rte_mtr_params::dscp_table
> +  */

Same simplification suggested here.

> + RTE_MTR_COLOR_IN_PROTO_OUTER_DSCP = RTE_BIT64(2),

I am OK to keep DSCP for the name of the table instead of renaming the table, 
as you suggested, but this method name should reflect the protocol, not the 
field: RTE_MTR_COLOR_IN_PROTO_OUTER_IP.

> + /**
> +  * If the input packet is IPv4 or IPv6, its input color is detected by
> +  * the innermost DSCP field indexing into the
> +  * struct rte_mtr_params::dscp_table.
> +  * Otherwise, the *default_input_color* is applied.
> +  *
> +  * @see struct rte_mtr_params::default_input_color
> +  * @see struct rte_mtr_params::dscp_table
> +  */

Same simplification suggested here.

> + RTE_MTR_COLOR_IN_PROTO_INNER_DSCP = RTE_BIT64(3),

I am OK to keep DSCP for the name of the table instead of renaming the table, 
as you suggested, but this method name should reflect the protocol, not the 
field: RTE_MTR_COLOR_IN_PROTO_INNER_IP.

> +
>  /**
>   * Parameters for each traffic metering & policing object
>   *
> @@ -233,20 +279,58 @@ struct rte_mtr_params {
>*/
>   int use_prev_mtr_color;
> 
> - /** Meter input color. When non-NULL: it points to a pre-allocated and
> + /** Meter input color based on IP DSCP protocol field.
> +  *
> +  * Valid when *input_color_proto_mask* set to any of the following
> +  * RTE_MTR_COLOR_IN_PROTO_OUTER_DSCP,
> +  * RTE_MTR_COLOR_IN_PROTO_INNER_DSCP
> +  *
> +  * When non-NULL: it points to a pre-allocated and
>* pre-populated table with exactly 64 elements providing the input
>* color for each value of the IPv4/IPv6 Differentiated Services Code
> -  * Point (DSCP) input packet field. When NULL: it is equivalent to
> -  * setting this parameter to an all-green populated table (i.e. table
> -  * with all the 64 elements set to green color). The color blind mode
> -  * is configured by setting *use_prev_mtr_color* to 0 and *dscp_table*
> -  * to either NULL or to an all-green populated table. When
> -  * *use_prev_mtr_color* is non-zero value or when *dscp_table*
> contains
> - 

[PATCH] net/iavf: make reset wait time longer

2022-04-26 Thread wenxuanx . wu
From: Wenxuan Wu 

On 810 CA series devices, the reset takes longer while waiting for the
kernel driver to return.
This patch increases the reset wait time so that the kernel reset
operation can finish.

Signed-off-by: Wenxuan Wu 
---
 drivers/net/iavf/iavf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index a01d18e61b..6183fc40d6 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -18,7 +18,7 @@
 
 #define IAVF_AQ_LEN   32
 #define IAVF_AQ_BUF_SZ4096
-#define IAVF_RESET_WAIT_CNT   50
+#define IAVF_RESET_WAIT_CNT   100
 #define IAVF_BUF_SIZE_MIN 1024
 #define IAVF_FRAME_SIZE_MAX   9728
 #define IAVF_QUEUE_BASE_ADDR_UNIT 128
-- 
2.25.1



[RFC] eal: allow worker lcore stacks to be allocated from hugepage memory

2022-04-26 Thread Don Wallwork
Add support for using hugepages for worker lcore stack memory.  The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.

Platforms desiring to make use of this capability must enable the
associated option flag and stack size settings in platform config
files.
---
 lib/eal/linux/eal.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 1ef263434a..4e1e5b6915 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1143,9 +1143,48 @@ rte_eal_init(int argc, char **argv)
 
lcore_config[i].state = WAIT;
 
+#ifdef RTE_EAL_NUMA_AWARE_LCORE_STACK
+   /* Allocate NUMA aware stack memory and set pthread attributes */
+   pthread_attr_t attr;
+   void *stack_ptr =
+   rte_zmalloc_socket("lcore_stack",
+  RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE,
+  RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE,
+  rte_lcore_to_socket_id(i));
+
+   if (stack_ptr == NULL) {
+   rte_eal_init_alert("Cannot allocate stack memory");
+   rte_errno = ENOMEM;
+   return -1;
+   }
+
+   if (pthread_attr_init(&attr) != 0) {
+   rte_eal_init_alert("Cannot init pthread attributes");
+   rte_errno = EINVAL;
+   return -1;
+   }
+   if (pthread_attr_setstack(&attr,
+ stack_ptr,
+ RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE) != 0) {
+   rte_eal_init_alert("Cannot set pthread stack attributes");
+   rte_errno = ENOTSUP;
+   return -1;
+   }
+
+   /* create a thread for each lcore */
+   ret = pthread_create(&lcore_config[i].thread_id, &attr,
+eal_thread_loop, (void *)(uintptr_t)i);
+
+   if (pthread_attr_destroy(&attr) != 0) {
+   rte_eal_init_alert("Cannot destroy pthread attributes");
+   rte_errno = EFAULT;
+   return -1;
+   }
+#else
/* create a thread for each lcore */
ret = pthread_create(&lcore_config[i].thread_id, NULL,
 eal_thread_loop, (void *)(uintptr_t)i);
+#endif
if (ret != 0)
rte_panic("Cannot create thread\n");
 
-- 
2.17.1



Re: [PATCH 2/3] mem: fix ASan shadow for remapped memory segments

2022-04-26 Thread Burakov, Anatoly

On 21-Apr-22 2:18 PM, Burakov, Anatoly wrote:

On 21-Apr-22 10:37 AM, David Marchand wrote:

On Wed, Apr 20, 2022 at 4:47 PM Burakov, Anatoly
 wrote:


On 15-Apr-22 6:31 PM, David Marchand wrote:

When releasing some memory, the allocator can choose to return some
pages to the OS. At the same time, this memory was poisoned in the ASan
shadow. Doing the latter made it impossible to remap this same page
later.
On the other hand, without this poisoning, the OS would pagefault in any
case for this page.

Remove the poisoning for unmapped pages.

Bugzilla ID: 994
Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand 
---
   lib/eal/common/malloc_elem.h |  4 
   lib/eal/common/malloc_heap.c | 12 +++-
   2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/malloc_elem.h b/lib/eal/common/malloc_elem.h
index 228f178418..b859003722 100644
--- a/lib/eal/common/malloc_elem.h
+++ b/lib/eal/common/malloc_elem.h
@@ -272,6 +272,10 @@ old_malloc_size(struct malloc_elem *elem)

   #else /* !RTE_MALLOC_ASAN */

+static inline void
+asan_set_zone(void *ptr __rte_unused, size_t len __rte_unused,
+ uint32_t val __rte_unused) { }
+
   static inline void
   asan_set_freezone(void *ptr __rte_unused, size_t size __rte_unused) { }


diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 6c572b6f2c..5913d9f862 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -860,6 +860,7 @@ malloc_heap_free(struct malloc_elem *elem)
   size_t len, aligned_len, page_sz;
   struct rte_memseg_list *msl;
   unsigned int i, n_segs, before_space, after_space;
+ bool unmapped_pages = false;
   int ret;
   const struct internal_config *internal_conf =
   eal_get_internal_configuration();
@@ -999,6 +1000,13 @@ malloc_heap_free(struct malloc_elem *elem)

   /* don't care if any of this fails */
   malloc_heap_free_pages(aligned_start, aligned_len);
+ /*
+  * Clear any poisoning in ASan for the associated pages so that
+  * next time EAL maps those pages, the allocator can access
+  * them.
+  */
+ asan_set_zone(aligned_start, aligned_len, 0x00);
+ unmapped_pages = true;

   request_sync();
   } else {
@@ -1032,7 +1040,9 @@ malloc_heap_free(struct malloc_elem *elem)

   rte_mcfg_mem_write_unlock();
   free_unlock:
- asan_set_freezone(asan_ptr, asan_data_len);
+ /* Poison memory range if belonging to some still mapped pages. */
+ if (!unmapped_pages)
+ asan_set_freezone(asan_ptr, asan_data_len);

   rte_spinlock_unlock(&(heap->lock));
   return ret;


I suspect the patch should be a little more complicated than that. When
we unmap pages, we don't necessarily unmap the entire malloc element, it
could be that we have a freed allocation like so:

| malloc header | free space | unmapped space | free space | next malloc header |

So, i think the freezone should be set from asan_ptr till aligned_start,
and then from (aligned_start + aligned_len) till (asan_ptr +
asan_data_len). Does that make sense?


(btw, I get a bounce for Zhihong mail address, is he not working at
Intel anymore?)

To be honest, I don't understand if we can get to this situation :-)
(especially the free space after the unmapped region).
But I guess you mean something like (on top of current patch):

@@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem)

 rte_mcfg_mem_write_unlock();
  free_unlock:
-   /* Poison memory range if belonging to some still mapped pages. */
-   if (!unmapped_pages)
+   if (!unmapped_pages) {
 asan_set_freezone(asan_ptr, asan_data_len);
+   } else {
+   /*
+    * We may be in a situation where we unmapped pages 
like this:

+    * malloc header | free space | unmapped space | free
space | malloc header
+    */
+   void *free1_start = asan_ptr;
+   void *free1_end = aligned_start;
+   void *free2_start = RTE_PTR_ADD(aligned_start, aligned_len);
+   void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len);
+
+   if (free1_start < free1_end)
+   asan_set_freezone(free1_start,
+   RTE_PTR_DIFF(free1_end, free1_start));
+   if (free2_start < free2_end)
+   asan_set_freezone(free2_start,
+   RTE_PTR_DIFF(free2_end, free2_start));
+   }

 rte_spinlock_unlock(&(heap->lock));
 return ret;



Something like that, yes. I will have to think through this a bit more, 
especially in light of your func_reentrancy splat :)




So, the reason the splat in the func_reentrancy test happens is as follows: the 
above patch is sorta correct (I have a different one but does

RE: [PATCH v2] net/ice: optimize max queue number calculation

2022-04-26 Thread Wu, Wenjun1



> -Original Message-
> From: Zhang, Qi Z 
> Sent: Friday, April 8, 2022 7:24 PM
> To: Yang, Qiming ; Wu, Wenjun1
> 
> Cc: dev@dpdk.org; Zhang, Qi Z 
> Subject: [PATCH v2] net/ice: optimize max queue number calculation
> 
> Remove the limitation that max queue pair number must be 2^n.
> With this patch, even on a 8 ports device, the max queue pair number
> increased from 128 to 254.
> 
> Signed-off-by: Qi Zhang 
> ---
> 
> v2:
> - fix check patch warning
> 
>  drivers/net/ice/ice_ethdev.c | 24 
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> index 73e550f5fb..ff2b3e45d9 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -819,10 +819,26 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
>   return -ENOTSUP;
>   }
> 
> - vsi->nb_qps = RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC);
> - fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps) - 1;
> - /* Adjust the queue number to actual queues that can be applied */
> - vsi->nb_qps = (vsi->nb_qps == 0) ? 0 : 0x1 << fls;
> + /* vector 0 is reserved and 1 vector for ctrl vsi */
> + if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2)
> + vsi->nb_qps = 0;
> + else
> + vsi->nb_qps = RTE_MIN
> + ((uint16_t)vsi->adapter->hw.func_caps.common_cap.num_msix_vectors - 2,
> + RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC));
> +
> + /* nb_qps(hex)  -> fls */
> + /*  -> 0 */
> + /* 0001 -> 0 */
> + /* 0002 -> 1 */
> + /* 0003 ~ 0004  -> 2 */
> + /* 0005 ~ 0008  -> 3 */
> + /* 0009 ~ 0010  -> 4 */
> + /* 0011 ~ 0020  -> 5 */
> + /* 0021 ~ 0040  -> 6 */
> + /* 0041 ~ 0080  -> 7 */
> + /* 0081 ~ 0100  -> 8 */
> + fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps - 1);
> 
>   qp_idx = 0;
>   /* Set tc and queue mapping with VSI */
> --
> 2.26.2

Acked-by: Wenjun Wu < wenjun1...@intel.com>

Thanks
Wenjun



RE: [RFC] ethdev: datapath-focused meter actions

2022-04-26 Thread Dumitrescu, Cristian
Hi Alexander,

After reviewing this RFC, I have to say that your proposal is very unclear to 
me. I don't understand what problem you are trying to solve and what exactly 
you cannot do with the current meter and flow APIs.

I suggest we get together for a community call with all the interested folks 
invited in order to get more clarity on your proposal, thank you!

> -Original Message-
> From: Jerin Jacob 
> Sent: Friday, April 8, 2022 9:21 AM
> To: Alexander Kozyrev ; Dumitrescu, Cristian
> 
> Cc: dpdk-dev ; Ori Kam ; Thomas
> Monjalon ; Ivan Malov ;
> Andrew Rybchenko ; Yigit, Ferruh
> ; Awal, Mohammad Abdul
> ; Zhang, Qi Z ;
> Jerin Jacob ; Ajit Khaparde
> ; Richardson, Bruce
> 
> Subject: Re: [RFC] ethdev: datapath-focused meter actions
> 
> + @Cristian Dumitrescu meter maintainer.
> 
> 
> On Fri, Apr 8, 2022 at 8:17 AM Alexander Kozyrev 
> wrote:
> >
> > The introduction of asynchronous flow rules operations allowed users
> > to create/destroy flow rules as part of the datapath without blocking
> > on Flow API and slowing the packet processing down.
> >
> > That applies to every possible action that has no preparation steps.
> > Unfortunately, one notable exception is the meter action.
> > There is a separate API to prepare a meter profile and a meter policy
> > before any meter object can be used as a flow rule action.

I disagree. Creation of meter policies and meter objects is decoupled from the 
flow creation. Meter policies and meter objects can all be created at 
initialization or on-the-fly, and their creation does not directly require the 
data plane to be stopped.

Please explain what problem you are trying to fix here. I suggest you provide 
the sequence diagram and tell us where the problem is.

> >
> > The application logic is the following:
> > 1. rte_mtr_meter_profile_add() is called to create the meter profile
> > first to define how to classify incoming packets and to assign an
> > appropriate color to them.
> > 2. rte_mtr_meter_policy_add() is invoked to define the fate of a packet,
> > based on its color (practically creating flow rules, matching colors).

Nope, the policy add does not create any flows. In fact, it does not create any 
meter objects either. It simply defines a configuration pattern that can be 
reused many times when meter objects are created afterwards.

> > 3. rte_mtr_create() is then needed to search (with locks) for previously
> > created profile and policy in order to create the meter object.

The meter object is not created at the time the flow is created, but at a 
prior decoupled moment via rte_mtr_create(). I don't see any issue here.

> > 4. rte_flow_create() is now finally can be used to specify the created
> > meter as an action.
> >
> > This approach doesn't fit into the asynchronous rule creation model
> > and can be improved with the following proposal:

Again, the creation of meter policies and objects is decoupled from the flow 
creation; in fact, the meter policies and objects must be created before the 
flows using them are created.

> > 1. Creating a policy may be replaced with the creation of a group with
> > up to 3 different rules for every color using asynchronous Flow API.
> > That requires the introduction of a new pattern item - meter color.
> > Then creation a flow rule with the meter means a simple jump to a group:
> > rte_flow_async_create(group=1, pattern=color, actions=...);
> > rte_flow_async_create(group=0, pattern=5-tuple,
> >   actions=meter,jump group 1);
> > This allows to classify packets and act upon their color classifications.
> > The Meter action assigns a color to a packet and an appropriate action
> > is selected based on the Meter color in group 1.
> >

A meter object requires a relatively complex configuration procedure. This 
is one of the reasons meters have their own API, so we can keep that complexity 
away from the flow API.

You seem to indicate that your desired behavior is to create the meter objects 
when the flow is created rather than in advance. Did I get it correctly? This 
is possible with the current API as well by simply creating the meter object 
immediately before the flow gets created.

Stitching the creation of a new meter object to the flow creation (if I 
understand your approach right) does not allow for some important features, 
such as:
-reusing meter objects that were previously created by reassigning them to a 
different flow
-having multiple flows use the same shared meter.

> > 2. Preparing a meter object should be the part of flow rule creation

Why?? Please take some time to clearly explain this, your entire proposal seems 
to be predicated on this assertion being true.

> > and use the same flow queue to benefit from asynchronous operations:
> > rte_flow_async_create(group=0, pattern=5-tuple,
> >   actions=meter id 1 profile rfc2697, jump group 1);
> > Creation of the meter object takes time and flow creation must wait
> > until it is ready be

RE: [RFC] ethdev: datapath-focused meter actions

2022-04-26 Thread Dumitrescu, Cristian
I forgot to mention: besides my statement at the top of my reply, there are 
many comments inline below :)

> -Original Message-
> From: Dumitrescu, Cristian
> Sent: Tuesday, April 26, 2022 2:44 PM
> To: Jerin Jacob ; Alexander Kozyrev
> 
> Cc: dpdk-dev ; Ori Kam ; Thomas
> Monjalon ; Ivan Malov ;
> Andrew Rybchenko ; Yigit, Ferruh
> ; Awal, Mohammad Abdul
> ; Zhang, Qi Z ;
> Jerin Jacob ; Ajit Khaparde
> ; Richardson, Bruce
> 
> Subject: RE: [RFC] ethdev: datapath-focused meter actions
> 
> Hi Alexander,
> 
> After reviewing this RFC, I have to say that your proposal is very unclear to 
> me.
> I don't understand what is the problem you're trying to solve and what exactly
> is that you cannot do with the current meter and flow APIs.
> 
> I suggest we get together for a community call with all the interested folks
> invited in order to get more clarity on your proposal, thank you!
> 
> > -Original Message-
> > From: Jerin Jacob 
> > Sent: Friday, April 8, 2022 9:21 AM
> > To: Alexander Kozyrev ; Dumitrescu, Cristian
> > 
> > Cc: dpdk-dev ; Ori Kam ; Thomas
> > Monjalon ; Ivan Malov ;
> > Andrew Rybchenko ; Yigit, Ferruh
> > ; Awal, Mohammad Abdul
> > ; Zhang, Qi Z ;
> > Jerin Jacob ; Ajit Khaparde
> > ; Richardson, Bruce
> > 
> > Subject: Re: [RFC] ethdev: datapath-focused meter actions
> >
> > + @Cristian Dumitrescu meter maintainer.
> >
> >
> > On Fri, Apr 8, 2022 at 8:17 AM Alexander Kozyrev 
> > wrote:
> > >
> > > The introduction of asynchronous flow rules operations allowed users
> > > to create/destroy flow rules as part of the datapath without blocking
> > > on Flow API and slowing the packet processing down.
> > >
> > > That applies to every possible action that has no preparation steps.
> > > Unfortunately, one notable exception is the meter action.
> > > There is a separate API to prepare a meter profile and a meter policy
> > > before any meter object can be used as a flow rule action.
> 
> I disagree. Creation of meter policies and meter objects is decoupled from the
> flow creation. Meter policies and meter objects can all be created at
> initialization or on-the-fly, and their creation does not directly require 
> the data
> plane to be stopped.
> 
> Please explain what problem are you trying to fix here. I suggest you provide
> the sequence diagram and tell us where the problem is.
> 
> > >
> > > The application logic is the following:
> > > 1. rte_mtr_meter_profile_add() is called to create the meter profile
> > > first to define how to classify incoming packets and to assign an
> > > appropriate color to them.
> > > 2. rte_mtr_meter_policy_add() is invoked to define the fate of a packet,
> > > based on its color (practically creating flow rules, matching colors).
> 
> Nope, the policy add does not create any flows. In fact, it does not create 
> any
> meter objects either. It simply defines a configuration pattern that can be
> reused many times when meter objects are created afterwards.
> 
> > > 3. rte_mtr_create() is then needed to search (with locks) for previously
> > > created profile and policy in order to create the meter object.
> 
> The rte_mtr_create() is not created at the time the flow is created, but at a
> prior decoupled moment. I don't see any issue here.
> 
> > > 4. rte_flow_create() is now finally can be used to specify the created
> > > meter as an action.
> > >
> > > This approach doesn't fit into the asynchronous rule creation model
> > > and can be improved with the following proposal:
> 
> Again, the creation of meter policies and objects is decoupled from the flow
> creation; in fact, the meter policies and objects must be created before the
> flows using them are created.
> 
> > > 1. Creating a policy may be replaced with the creation of a group with
> > > up to 3 different rules for every color using asynchronous Flow API.
> > > That requires the introduction of a new pattern item - meter color.
> > > Then creation a flow rule with the meter means a simple jump to a group:
> > > rte_flow_async_create(group=1, pattern=color, actions=...);
> > > rte_flow_async_create(group=0, pattern=5-tuple,
> > >   actions=meter,jump group 1);
> > > This allows to classify packets and act upon their color classifications.
> > > The Meter action assigns a color to a packet and an appropriate action
> > > is selected based on the Meter color in group 1.
> > >
> 
> The meter objects requires a relatively complex configuration procedure. This
> is one of the reasons meters have their own API, so we can keep that
> complexity away from the flow API.
> 
> You seem to indicate that your desired behavior is to create the meter objects
> when the flow is created rather than in advance. Did I get it correctly? This 
> is
> possible with the current API as well by simply creating the meter object
> immediately before the flow gets created.
> 
> Stitching the creation of new meter object to the flow creation (if I 
> understand
> your approach right) d

DPDK 21.11.1 released

2022-04-26 Thread Kevin Traynor
Hi all,

Here is a new stable release:
https://fast.dpdk.org/rel/dpdk-21.11.1.tar.xz

The git tree is at:
https://dpdk.org/browse/dpdk-stable/?h=21.11

This is the first stable release of 21.11 LTS and contains
~400 fixes.

See the release notes for details:
http://doc.dpdk.org/guides-21.11/rel_notes/release_21_11.html#fixes

Thanks to the authors who helped with backports and to the
following who helped with validation:
Nvidia, Intel, Canonical and Red Hat.

Kevin

---
 MAINTAINERS|   2 +
 VERSION|   2 +-
 app/dumpcap/main.c |   9 +-
 app/pdump/main.c   |  16 +-
 app/proc-info/main.c   |   6 +-
 app/test-acl/main.c|   6 +-
 app/test-compress-perf/comp_perf_test_cyclecount.c |   9 +-
 app/test-compress-perf/comp_perf_test_throughput.c |   2 +-
 app/test-compress-perf/comp_perf_test_verify.c |   2 +-
 app/test-compress-perf/main.c  |   5 +-
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c   |   2 +-
 app/test-eventdev/evt_options.c|   2 +-
 app/test-eventdev/test_order_common.c  |   2 +-
 app/test-fib/main.c|  12 +-
 app/test-flow-perf/config.h|   2 +-
 app/test-flow-perf/main.c  |   2 +-
 app/test-pmd/cmd_flex_item.c   |   3 +-
 app/test-pmd/cmdline.c |  18 +-
 app/test-pmd/cmdline_flow.c|  13 +-
 app/test-pmd/cmdline_tm.c  |   4 +-
 app/test-pmd/config.c  |  22 +-
 app/test-pmd/csumonly.c|  24 +-
 app/test-pmd/parameters.c  |   2 +-
 app/test-pmd/testpmd.c |  28 +-
 app/test-pmd/testpmd.h |   1 +
 app/test-pmd/txonly.c  |  24 +-
 app/test-regex/main.c  |  38 +-
 app/test/meson.build   |   2 +-
 app/test/test_barrier.c|   2 +-
 app/test/test_bpf.c|  10 +-
 app/test/test_compressdev.c|   2 +-
 app/test/test_cryptodev.c  |  13 +-
 app/test/test_cryptodev_asym.c |   2 +-
 app/test/test_cryptodev_rsa_test_vectors.h |   2 +-
 app/test/test_dmadev.c |   8 +-
 app/test/test_efd.c|   2 +-
 app/test/test_fib_perf.c   |   2 +-
 app/test/test_kni.c|   4 +-
 app/test/test_kvargs.c |  16 +-
 app/test/test_link_bonding.c   |   4 +
 app/test/test_link_bonding_rssconf.c   |   4 +
 app/test/test_lpm6_data.h  |   2 +-
 app/test/test_mbuf.c   |   4 -
 app/test/test_member.c |   2 +-
 app/test/test_memory.c |   2 +-
 app/test/test_mempool.c|   4 +-
 app/test/test_memzone.c|   6 +-
 app/test/test_metrics.c|   2 +-
 app/test/test_pcapng.c |   2 +-
 app/test/test_power_cpufreq.c  |   2 +-
 app/test/test_rcu_qsbr.c   |   4 +-
 app/test/test_red.c|   8 +-
 app/test/test_security.c   |   2 +-
 app/test/test_table_pipeline.c |   2 +-
 app/test/test_thash.c  |   2 +-
 buildtools/binutils-avx512-check.py|   4 +-
 buildtools/call-sphinx-build.py|   4 +-
 buildtools/meson.build |   5 +-
 config/arm/meson.build |  10 +-
 config/meson.build |   5 +-
 config/x86/meson.build |   2 +-
 devtools/check-abi.sh  |   4 -
 devtools/check-forbidden-tokens.awk|   3 +
 devtools/check-symbol-change.sh|   6 +-
 devtools/check-symbol-maps.sh  |   7 +
 devtools/libabigail.abignore   |  20 +
 doc/api/generate_examples.sh   |  14 +-
 doc/api/meson.build|  10 +-
 doc/guides/compressdevs/mlx5.rst   |   6 +-
 doc/guides/conf.py |   6 +-
 doc/guides/cryptodevs/mlx5.rst |   6 +-
 doc/guides/dmadevs/hisilicon.rst   |   4 +-
 doc/guides/dmadevs/idxd.rst|  29 +-
 doc/guides/eventdevs/dlb2.rst  |  1

RE: [PATCH] app/test: fix buffer overflow in table unit tests

2022-04-26 Thread Dumitrescu, Cristian



> -Original Message-
> From: Medvedkin, Vladimir 
> Sent: Thursday, April 21, 2022 6:35 PM
> To: dev@dpdk.org
> Cc: Dumitrescu, Cristian ; sta...@dpdk.org
> Subject: [PATCH] app/test: fix buffer overflow in table unit tests
> 
> This patch fixes stack buffer overflow reported by ASAN.
> 
> Bugzilla ID: 820
> Fixes: 5205954791cb ("app/test: packet framework unit tests")
> Cc: cristian.dumitre...@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Vladimir Medvedkin 
> ---

Acked-by: Cristian Dumitrescu 


Re: [PATCH 2/3] mem: fix ASan shadow for remapped memory segments

2022-04-26 Thread David Marchand
On Tue, Apr 26, 2022 at 2:54 PM Burakov, Anatoly
 wrote:
> >> @@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem)
> >>
> >>  rte_mcfg_mem_write_unlock();
> >>   free_unlock:
> >> -   /* Poison memory range if belonging to some still mapped pages. */
> >> -   if (!unmapped_pages)
> >> +   if (!unmapped_pages) {
> >>  asan_set_freezone(asan_ptr, asan_data_len);
> >> +   } else {
> >> +   /*
> >> +* We may be in a situation where we unmapped pages
> >> like this:
> >> +* malloc header | free space | unmapped space | free
> >> space | malloc header
> >> +*/
> >> +   void *free1_start = asan_ptr;
> >> +   void *free1_end = aligned_start;
> >> +   void *free2_start = RTE_PTR_ADD(aligned_start,
> >> aligned_len);
> >> +   void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len);
> >> +
> >> +   if (free1_start < free1_end)
> >> +   asan_set_freezone(free1_start,
> >> +   RTE_PTR_DIFF(free1_end, free1_start));
> >> +   if (free2_start < free2_end)
> >> +   asan_set_freezone(free2_start,
> >> +   RTE_PTR_DIFF(free2_end, free2_start));
> >> +   }
> >>
> >>  rte_spinlock_unlock(&(heap->lock));
> >>  return ret;
> >>
> >
> > Something like that, yes. I will have to think through this a bit more,
> > especially in light of your func_reentrancy splat :)
> >
>
> So, the reason the splat in the func_reentrancy test happens is as follows: the
> above patch is sorta correct (i have a different one but does the same
> thing), but incomplete. What happens then is when we add new memory, we
> are integrating it into our existing malloc heap, which triggers
> `malloc_elem_join_adjacent_free()` which will trigger a write into old
> header space being merged, which may be marked as "freed". So, again we
> are hit with our internal allocator messing with ASan.

I ended up with the same conclusion.
Thanks for confirming.


>
> To properly fix this is to answer the following question: what is the
> goal of having ASan support in DPDK? Is it there to catch bugs *in the
> allocator*, or can we just trust that our allocator code is correct, and
> only concern ourselves with user-allocated areas of the code? Because it

The best would be to handle both.
I don't think clang disables ASan for the instrumentations on malloc.


> seems like the best way to address this issue would be to just avoid
> triggering ASan checks for certain allocator-internal actions: this way,
> we don't need to care what allocator itself does, just what user code
> does. As in, IIRC there was a compiler attribute that disables ASan
> checks for a specific function: perhaps we could just wrap certain
> access in that and be done with it?
>
> What do you think?

It is tempting because it is the easiest way to avoid the issue.
Though, by waiving those checks in the allocator, does it leave the
ASan shadow in a consistent state?


-- 
David Marchand



Re: [PATCH 1/2] rib: mark error checks with unlikely

2022-04-26 Thread Medvedkin, Vladimir




On 13/04/2022 03:09, Stephen Hemminger wrote:

Also mark the arguments of some predicate functions as const.

Signed-off-by: Stephen Hemminger 
---
  lib/rib/rte_rib.c | 26 +-
  1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/rib/rte_rib.c b/lib/rib/rte_rib.c
index cd9e823068d2..2a3de5065a31 100644
--- a/lib/rib/rte_rib.c
+++ b/lib/rib/rte_rib.c
@@ -48,13 +48,13 @@ struct rte_rib {
  };
  
  static inline bool

-is_valid_node(struct rte_rib_node *node)
+is_valid_node(const struct rte_rib_node *node)
  {
return (node->flag & RTE_RIB_VALID_NODE) == RTE_RIB_VALID_NODE;
  }
  
  static inline bool

-is_right_node(struct rte_rib_node *node)
+is_right_node(const struct rte_rib_node *node)
  {
return node->parent->right == node;
  }
@@ -99,7 +99,7 @@ rte_rib_lookup(struct rte_rib *rib, uint32_t ip)
  {
struct rte_rib_node *cur, *prev = NULL;
  
-	if (rib == NULL) {

+   if (unlikely(rib == NULL)) {
rte_errno = EINVAL;
return NULL;
}
@@ -147,7 +147,7 @@ __rib_lookup_exact(struct rte_rib *rib, uint32_t ip, 
uint8_t depth)
  struct rte_rib_node *
  rte_rib_lookup_exact(struct rte_rib *rib, uint32_t ip, uint8_t depth)
  {
-   if ((rib == NULL) || (depth > RIB_MAXDEPTH)) {
+   if (unlikely(rib == NULL || depth > RIB_MAXDEPTH)) {
rte_errno = EINVAL;
return NULL;
}
@@ -167,7 +167,7 @@ rte_rib_get_nxt(struct rte_rib *rib, uint32_t ip,
  {
struct rte_rib_node *tmp, *prev = NULL;
  
-	if ((rib == NULL) || (depth > RIB_MAXDEPTH)) {

+   if (unlikely(rib == NULL || depth > RIB_MAXDEPTH)) {
rte_errno = EINVAL;
return NULL;
}
@@ -244,7 +244,7 @@ rte_rib_insert(struct rte_rib *rib, uint32_t ip, uint8_t 
depth)
uint32_t common_prefix;
uint8_t common_depth;
  
-	if ((rib == NULL) || (depth > RIB_MAXDEPTH)) {

+   if (unlikely(rib == NULL || depth > RIB_MAXDEPTH)) {
rte_errno = EINVAL;
return NULL;
}
@@ -342,7 +342,7 @@ rte_rib_insert(struct rte_rib *rib, uint32_t ip, uint8_t 
depth)
  int
  rte_rib_get_ip(const struct rte_rib_node *node, uint32_t *ip)
  {
-   if ((node == NULL) || (ip == NULL)) {
+   if (unlikely(node == NULL || ip == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -353,7 +353,7 @@ rte_rib_get_ip(const struct rte_rib_node *node, uint32_t 
*ip)
  int
  rte_rib_get_depth(const struct rte_rib_node *node, uint8_t *depth)
  {
-   if ((node == NULL) || (depth == NULL)) {
+   if (unlikely(node == NULL || depth == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -370,7 +370,7 @@ rte_rib_get_ext(struct rte_rib_node *node)
  int
  rte_rib_get_nh(const struct rte_rib_node *node, uint64_t *nh)
  {
-   if ((node == NULL) || (nh == NULL)) {
+   if (unlikely(node == NULL || nh == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -381,7 +381,7 @@ rte_rib_get_nh(const struct rte_rib_node *node, uint64_t 
*nh)
  int
  rte_rib_set_nh(struct rte_rib_node *node, uint64_t nh)
  {
-   if (node == NULL) {
+   if (unlikely(node == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -399,7 +399,7 @@ rte_rib_create(const char *name, int socket_id, const 
struct rte_rib_conf *conf)
struct rte_mempool *node_pool;
  
  	/* Check user arguments. */

-   if (name == NULL || conf == NULL || conf->max_nodes <= 0) {
+   if (unlikely(name == NULL || conf == NULL || conf->max_nodes <= 0)) {
rte_errno = EINVAL;
return NULL;
}
@@ -434,7 +434,7 @@ rte_rib_create(const char *name, int socket_id, const 
struct rte_rib_conf *conf)
  
  	/* allocate tailq entry */

te = rte_zmalloc("RIB_TAILQ_ENTRY", sizeof(*te), 0);
-   if (te == NULL) {
+   if (unlikely(te == NULL)) {
RTE_LOG(ERR, LPM,
"Can not allocate tailq entry for RIB %s\n", name);
rte_errno = ENOMEM;
@@ -444,7 +444,7 @@ rte_rib_create(const char *name, int socket_id, const 
struct rte_rib_conf *conf)
/* Allocate memory to store the RIB data structures. */
rib = rte_zmalloc_socket(mem_name,
sizeof(struct rte_rib), RTE_CACHE_LINE_SIZE, socket_id);
-   if (rib == NULL) {
+   if (unlikely(rib == NULL)) {
RTE_LOG(ERR, LPM, "RIB %s memory allocation failed\n", name);
rte_errno = ENOMEM;
goto free_te;


Acked-by: Vladimir Medvedkin 

--
Regards,
Vladimir


Re: [PATCH 2/2] rib6: mark error tests with unlikely

2022-04-26 Thread Medvedkin, Vladimir




On 13/04/2022 03:09, Stephen Hemminger wrote:

Also mark the arguments of some predicate functions as const.

Signed-off-by: Stephen Hemminger 
---
  lib/rib/rte_rib6.c | 25 -
  1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/lib/rib/rte_rib6.c b/lib/rib/rte_rib6.c
index 042ac1f090bf..650bf1b8f681 100644
--- a/lib/rib/rte_rib6.c
+++ b/lib/rib/rte_rib6.c
@@ -47,13 +47,13 @@ struct rte_rib6 {
  };
  
  static inline bool

-is_valid_node(struct rte_rib6_node *node)
+is_valid_node(const struct rte_rib6_node *node)
  {
return (node->flag & RTE_RIB_VALID_NODE) == RTE_RIB_VALID_NODE;
  }
  
  static inline bool

-is_right_node(struct rte_rib6_node *node)
+is_right_node(const struct rte_rib6_node *node)
  {
return node->parent->right == node;
  }
@@ -171,7 +171,7 @@ rte_rib6_lookup_exact(struct rte_rib6 *rib,
uint8_t tmp_ip[RTE_RIB6_IPV6_ADDR_SIZE];
int i;
  
-	if ((rib == NULL) || (ip == NULL) || (depth > RIB6_MAXDEPTH)) {

+   if (unlikely(rib == NULL || ip == NULL || depth > RIB6_MAXDEPTH)) {
rte_errno = EINVAL;
return NULL;
}
@@ -210,7 +210,7 @@ rte_rib6_get_nxt(struct rte_rib6 *rib,
uint8_t tmp_ip[RTE_RIB6_IPV6_ADDR_SIZE];
int i;
  
-	if ((rib == NULL) || (ip == NULL) || (depth > RIB6_MAXDEPTH)) {

+   if (unlikely(rib == NULL || ip == NULL || depth > RIB6_MAXDEPTH)) {
rte_errno = EINVAL;
return NULL;
}
@@ -293,8 +293,7 @@ rte_rib6_insert(struct rte_rib6 *rib,
int i, d;
uint8_t common_depth, ip_xor;
  
-	if (unlikely((rib == NULL) || (ip == NULL) ||

-   (depth > RIB6_MAXDEPTH))) {
+   if (unlikely(rib == NULL || ip == NULL || depth > RIB6_MAXDEPTH)) {
rte_errno = EINVAL;
return NULL;
}
@@ -413,7 +412,7 @@ int
  rte_rib6_get_ip(const struct rte_rib6_node *node,
uint8_t ip[RTE_RIB6_IPV6_ADDR_SIZE])
  {
-   if ((node == NULL) || (ip == NULL)) {
+   if (unlikely(node == NULL || ip == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -424,7 +423,7 @@ rte_rib6_get_ip(const struct rte_rib6_node *node,
  int
  rte_rib6_get_depth(const struct rte_rib6_node *node, uint8_t *depth)
  {
-   if ((node == NULL) || (depth == NULL)) {
+   if (unlikely(node == NULL || depth == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -441,7 +440,7 @@ rte_rib6_get_ext(struct rte_rib6_node *node)
  int
  rte_rib6_get_nh(const struct rte_rib6_node *node, uint64_t *nh)
  {
-   if ((node == NULL) || (nh == NULL)) {
+   if (unlikely(node == NULL || nh == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -452,7 +451,7 @@ rte_rib6_get_nh(const struct rte_rib6_node *node, uint64_t 
*nh)
  int
  rte_rib6_set_nh(struct rte_rib6_node *node, uint64_t nh)
  {
-   if (node == NULL) {
+   if (unlikely(node == NULL)) {
rte_errno = EINVAL;
return -1;
}
@@ -471,7 +470,7 @@ rte_rib6_create(const char *name, int socket_id,
struct rte_mempool *node_pool;
  
  	/* Check user arguments. */

-   if (name == NULL || conf == NULL || conf->max_nodes <= 0) {
+   if (unlikely(name == NULL || conf == NULL || conf->max_nodes <= 0)) {
rte_errno = EINVAL;
return NULL;
}
@@ -506,7 +505,7 @@ rte_rib6_create(const char *name, int socket_id,
  
  	/* allocate tailq entry */

te = rte_zmalloc("RIB6_TAILQ_ENTRY", sizeof(*te), 0);
-   if (te == NULL) {
+   if (unlikely(te == NULL)) {
RTE_LOG(ERR, LPM,
"Can not allocate tailq entry for RIB6 %s\n", name);
rte_errno = ENOMEM;
@@ -516,7 +515,7 @@ rte_rib6_create(const char *name, int socket_id,
/* Allocate memory to store the RIB6 data structures. */
rib = rte_zmalloc_socket(mem_name,
sizeof(struct rte_rib6), RTE_CACHE_LINE_SIZE, socket_id);
-   if (rib == NULL) {
+   if (unlikely(rib == NULL)) {
RTE_LOG(ERR, LPM, "RIB6 %s memory allocation failed\n", name);
rte_errno = ENOMEM;
goto free_te;


Acked-by: Vladimir Medvedkin 

--
Regards,
Vladimir


Re: [PATCH] rib: fix traversal with /32 route

2022-04-26 Thread Medvedkin, Vladimir

+Cc:sta...@dpdk.org

On 14/04/2022 21:01, Stephen Hemminger wrote:

If a /32 route is entered in the RIB, the traversal code
will not see it as the end of the tree. This is due to trying
to do a negative shift, which is undefined behavior in C.

Fix by checking for max depth as is already done in rib6.

Signed-off-by: Stephen Hemminger 
---
  lib/rib/rte_rib.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/lib/rib/rte_rib.c b/lib/rib/rte_rib.c
index cd9e823068d2..0603980cabd2 100644
--- a/lib/rib/rte_rib.c
+++ b/lib/rib/rte_rib.c
@@ -71,6 +71,8 @@ is_covered(uint32_t ip1, uint32_t ip2, uint8_t depth)
  static inline struct rte_rib_node *
  get_nxt_node(struct rte_rib_node *node, uint32_t ip)
  {
+   if (node->depth == RIB_MAXDEPTH)
+   return NULL;
return (ip & (1 << (31 - node->depth))) ? node->right : node->left;
  }
  


Acked-by: Vladimir Medvedkin 

--
Regards,
Vladimir


Re: [PATCH] rib: fix traversal with /32 route

2022-04-26 Thread Medvedkin, Vladimir

Fixes: 5a5793a5ffa2 ("rib: add RIB library")

On 26/04/2022 15:28, Medvedkin, Vladimir wrote:

+Cc:sta...@dpdk.org

On 14/04/2022 21:01, Stephen Hemminger wrote:

If a /32 route is entered in the RIB, the traversal code
will not see it as the end of the tree. This is due to trying
to do a negative shift, which is undefined behavior in C.

Fix by checking for max depth as is already done in rib6.

Signed-off-by: Stephen Hemminger 
---
  lib/rib/rte_rib.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/lib/rib/rte_rib.c b/lib/rib/rte_rib.c
index cd9e823068d2..0603980cabd2 100644
--- a/lib/rib/rte_rib.c
+++ b/lib/rib/rte_rib.c
@@ -71,6 +71,8 @@ is_covered(uint32_t ip1, uint32_t ip2, uint8_t depth)
  static inline struct rte_rib_node *
  get_nxt_node(struct rte_rib_node *node, uint32_t ip)
  {
+    if (node->depth == RIB_MAXDEPTH)
+    return NULL;
  return (ip & (1 << (31 - node->depth))) ? node->right : node->left;
  }


Acked-by: Vladimir Medvedkin 



--
Regards,
Vladimir


Re: [EXT] Re: [PATCH v3 0/5] Add JSON vector set support to fips validation

2022-04-26 Thread Brandon Lo
Hi Gowrishankar,

I apologize for the late response. I have not worked on the AES-CBC
implementation, so you are free to go ahead.
Please let me know if you run into any issues that I can help with.

Thanks,
Brandon

On Thu, Apr 21, 2022 at 4:02 AM Gowrishankar Muthukrishnan
 wrote:
>
> Hi Brandon,
> Following some cleanup patches I have posted against examples/fips, I would 
> like to take up enabling AES_CBC in fips validation.
> Please let me know if you/anyone have already have WIP for the same, before I 
> proceed.
>
> Thanks,
> Gowrishankar
>
> > -Original Message-
> > From: Brandon Lo 
> > Sent: Thursday, April 14, 2022 7:12 PM
> > To: dev ; Zhang, Roy Fan ;
> > Power, Ciara 
> > Subject: [EXT] Re: [PATCH v3 0/5] Add JSON vector set support to fips
> > validation
> >
> > External Email
> >
> > --
> > Adding the dev mailing list back into this discussion.
> >
> > On Wed, Apr 13, 2022 at 9:13 AM Brandon Lo  wrote:
> > >
> > > Hi guys,
> > >
> > > Lincoln and I would like to know if we can get this patch set looked
> > > at and merged before submitting the rest of the algorithms. So far,
> > > I've worked on implementing the HMAC and CMAC tests, but I keep
> > > getting pulled away by some requests from the community. This patchset
> > > does not seem to break backward compatibility, so merging it will only
> > > lead to more coverage from the UNH lab. It may also be easier to
> > > review since it isn't going to be one huge patchset that needs to be
> > > looked at in the future.
> > >
> > > On Thu, Feb 17, 2022 at 7:47 AM Brandon Lo  wrote:
> > > >
> > > > On Fri, Feb 11, 2022 at 9:16 AM Brandon Lo  wrote:
> > > > > I only have the AES-GCM algorithm implemented because the current
> > > > > implementations of the other algorithms require some extra
> > > > > information than what comes with the JSON format in the API.
> > > > > For example, I couldn't find the JSON counterpart for things like
> > > > > fips_validation_sha.c's "MD =" or "Seed =" as well as
> > > > > fips_validation_ccm.c's extra test types like CCM-DVPT, CCM-VADT,
> > etc.
> > > > > just to name a few.
> > > > > This could very well be due to my inexperience with the FIPS
> > > > > validation, and I definitely plan to take another look at it again.
> > > > >
> > > > > My assumption is that the JSON version of FIPS validation files
> > > > > isn't used as much as the old CAVP format, so I am more aiming
> > > > > towards getting something working in the lab first and then
> > > > > expanding on it later.
> > > >
> > > > Hi all,
> > > >
> > > > Could I get someone to look at this patch set?
> > > > The UNH lab is ready to deploy FIPS testing on patches that affect
> > > > the crypto portion of DPDK.
> > > >
> > > > Thanks,
> > > > Brandon
> > > >
> > > >
> > > > --
> > > > Brandon Lo
> > > > UNH InterOperability Laboratory
> > > > 21 Madbury Rd, Suite 100, Durham, NH 03824 b...@iol.unh.edu
> > > > www.iol.unh.edu
> > >
> > >
> > >
> > > --
> > > Brandon Lo
> > > UNH InterOperability Laboratory
> > > 21 Madbury Rd, Suite 100, Durham, NH 03824 b...@iol.unh.edu
> > > www.iol.unh.edu
> >
> >
> >
> > --
> > Brandon Lo
> > UNH InterOperability Laboratory
> > 21 Madbury Rd, Suite 100, Durham, NH 03824 b...@iol.unh.edu
> > www.iol.unh.edu



-- 
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
b...@iol.unh.edu
www.iol.unh.edu


Re: Reuse Of lcore after returning from its worker thread

2022-04-26 Thread Stephen Hemminger
On Wed, 20 Apr 2022 17:52:20 +0530
Ansar Kannankattil  wrote:

> Hi,
> As per my understanding, "rte_eal_wait_lcore" is a blocking call while the
> lcore state is RUNNING.
> 1. Is there any direct way to reuse the lcore which we returned from a
> worker thread?
> 2. Technically is there any issue in reusing the lcore by some means?

Yes, just relaunch it with a new work function.


Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory

2022-04-26 Thread Stephen Hemminger
On Tue, 26 Apr 2022 08:19:59 -0400
Don Wallwork  wrote:

> Add support for using hugepages for worker lcore stack memory.  The
> intent is to improve performance by reducing stack memory related TLB
> misses and also by using memory local to the NUMA node of each lcore.
> 
> Platforms desiring to make use of this capability must enable the
> associated option flag and stack size settings in platform config
> files.
> ---
>  lib/eal/linux/eal.c | 39 +++
>  1 file changed, 39 insertions(+)
> 

Good idea, but having a fixed-size stack makes writing complex applications
more difficult. Plus you lose the safety of guard pages.


Re: [PATCH] ci: do not dump error logs in GHA containers

2022-04-26 Thread Aaron Conole
David Marchand  writes:

> On Tue, Apr 26, 2022 at 9:09 AM David Marchand
>  wrote:
>>
>> On error, the build logs are displayed in GHA console and logs unless
>> the GITHUB_WORKFLOW env variable is set.
>> However, containers in GHA do not automatically inherit this variable.
>> We could pass this variable in the container environment, but in the
>> end, dumping those logs is only for Travis which we don't really care
>> about anymore.
>>
>> Let's make the linux-build.sh more generic and dump logs from Travis
>> yaml itself.
>>
>> Fixes: b35c4b0aa2bc ("ci: add Fedora 35 container in GHA")
>>
>> Signed-off-by: David Marchand 
>
> TBH, I did not test Travis for lack of interest (plus I don't want to
> be bothered with their ui / credit stuff).
> We could consider dropping Travis in the near future.
>
> Opinions?

I think it makes sense.  We haven't had travis reports in a while
because their credit system made it impossible to use.  We had kept it
around for users of travis, but at this point, I think most people have
migrated to GHA.



Re: [PATCH 1/2] ci: switch to Ubuntu 20.04

2022-04-26 Thread Aaron Conole
David Marchand  writes:

> Ubuntu 18.04 is now rather old.
> Besides, other entities in our CI are also testing this distribution.
>
> Switch to a newer Ubuntu release and benefit from more recent
> tool(chain)s: for example, net/cnxk now builds fine and can be
> re-enabled.
>
> Signed-off-by: David Marchand 
> ---

LGTM
Acked-by: Aaron Conole 



Re: [PATCH 2/2] ci: add mingw cross compilation in GHA

2022-04-26 Thread Aaron Conole
David Marchand  writes:

> Add mingw cross compilation in our public CI so that users with their
> own github repository have a first level of checks for Windows compilation
> before submitting to the mailing list.
> This does not replace our better checks in other entities of the CI.
>
> Only the helloworld example is compiled (same as what is tested in
> test-meson-builds.sh).
>
> Note: the mingw cross compilation toolchain (version 5.0) in Ubuntu
> 18.04 was broken (missing an ENOMSG definition).
>
> Signed-off-by: David Marchand 
> ---

Acked-by: Aaron Conole 



Re: [PATCH 2/3] mem: fix ASan shadow for remapped memory segments

2022-04-26 Thread Burakov, Anatoly

On 26-Apr-22 3:15 PM, David Marchand wrote:

On Tue, Apr 26, 2022 at 2:54 PM Burakov, Anatoly
 wrote:

@@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem)

  rte_mcfg_mem_write_unlock();
   free_unlock:
-   /* Poison memory range if belonging to some still mapped
pages. */
-   if (!unmapped_pages)
+   if (!unmapped_pages) {
  asan_set_freezone(asan_ptr, asan_data_len);
+   } else {
+   /*
+* We may be in a situation where we unmapped pages
like this:
+* malloc header | free space | unmapped space | free
space | malloc header
+*/
+   void *free1_start = asan_ptr;
+   void *free1_end = aligned_start;
+   void *free2_start = RTE_PTR_ADD(aligned_start,
aligned_len);
+   void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len);
+
+   if (free1_start < free1_end)
+   asan_set_freezone(free1_start,
+   RTE_PTR_DIFF(free1_end, free1_start));
+   if (free2_start < free2_end)
+   asan_set_freezone(free2_start,
+   RTE_PTR_DIFF(free2_end, free2_start));
+   }

  rte_spinlock_unlock(&(heap->lock));
  return ret;



Something like that, yes. I will have to think through this a bit more,
especially in light of your func_reentrancy splat :)



So, the reason the splat in the func_reentrancy test happens is as follows: the
above patch is sorta correct (i have a different one but does the same
thing), but incomplete. What happens then is when we add new memory, we
are integrating it into our existing malloc heap, which triggers
`malloc_elem_join_adjacent_free()` which will trigger a write into old
header space being merged, which may be marked as "freed". So, again we
are hit with our internal allocator messing with ASan.


I ended up with the same conclusion.
Thanks for confirming.




To properly fix this is to answer the following question: what is the
goal of having ASan support in DPDK? Is it there to catch bugs *in the
allocator*, or can we just trust that our allocator code is correct, and
only concern ourselves with user-allocated areas of the code? Because it


The best would be to handle both.
I don't think clang disables ASan for the instrumentations on malloc.


I've actually prototyped these changes a bit. We use memset in a few 
places, and that one can't be disabled as far as i can tell (not without 
blacklisting memset for entire DPDK).






seems like the best way to address this issue would be to just avoid
triggering ASan checks for certain allocator-internal actions: this way,
we don't need to care what allocator itself does, just what user code
does. As in, IIRC there was a compiler attribute that disables ASan
checks for a specific function: perhaps we could just wrap certain
access in that and be done with it?

What do you think?


It is tempting because it is the easiest way to avoid the issue.
Though, by waiving those checks in the allocator, does it leave the
ASan shadow in a consistent state?



The "consistent state" is kinda difficult to achieve because there is no 
"default" state for memory - sometimes it comes as available (0x00), 
sometimes it is marked as already freed (0xFF). So, coming into a malloc 
function, we don't know whether the memory we're about to mess with is 
0x00 or 0xFF.


What we could do is mark every malloc header with 0xFF regardless of its 
status, and leave the rest to "regular" zoning. This would be strange 
from ASan's point of view (because we're marking memory as "freed" when 
it wasn't ever allocated), but at least this would be consistent :D


--
Thanks,
Anatoly


Re: [PATCH] net/pcap: support MTU set

2022-04-26 Thread Ferruh Yigit

On 3/22/2022 1:02 PM, Ido Goshen wrote:

This test
https://doc.dpdk.org/dts/test_plans/jumboframes_test_plan.html#test-case-jumbo-frames-with-no-jumbo-frame-support
fails for the pcap PMD: a jumbo packet is unexpectedly received and transmitted.



Hi Ido,

Yes, pcap ignores the MTU, but I don't see why it should use the MTU (except
for making the above DTS test pass).


For the cases where packets are written to or read from a .pcap file,
the user is most probably interested in all packets. I don't think using the MTU
to filter the packets is a good idea; missing packets (because of the MTU)
can confuse users.


Unless there is a good use case, I am for rejecting this feature.



without patch:

root@u18c_3nbp:/home/cgs/workspace/master/jumbo# ./dpdk-testpmd --no-huge 
-m1024 -l 0-2  
--vdev='net_pcap0,rx_pcap=rx_pcap=jumbo_9000.pcap,tx_pcap=file_tx.pcap' -- 
--no-flush-rx --total-num-mbufs=2048 -i
...
testpmd> start
...
testpmd> show port stats 0

    NIC statistics for port 0  
   RX-packets: 1  RX-missed: 0  RX-bytes:  8996
   RX-errors: 0
   RX-nombuf:  0
   TX-packets: 1  TX-errors: 0  TX-bytes:  8996

   Throughput (since last show)
   Rx-pps:0  Rx-bps:0
   Tx-pps:0  Tx-bps:0
   


While with the patch it will fail unless --max-pkt-len is used to support jumbo

root@u18c_3nbp:/home/cgs/workspace/master/jumbo# ./dpdk-testpmd-patch --no-huge 
-m1024 -l 0-2  
--vdev='net_pcap0,rx_pcap=rx_pcap=jumbo_9000.pcap,tx_pcap=file_tx.pcap' -- 
--no-flush-rx --total-num-mbufs=2048 -i
...
testpmd> start
...
testpmd> show port stats 0

    NIC statistics for port 0  
   RX-packets: 0  RX-missed: 0  RX-bytes:  0
   RX-errors: 1
   RX-nombuf:  0
   TX-packets: 0  TX-errors: 0  TX-bytes:  0

   Throughput (since last show)
   Rx-pps:0  Rx-bps:0
   Tx-pps:0  Tx-bps:0
   

root@u18c_3nbp:/home/cgs/workspace/master/jumbo# ./dpdk-testpmd-patch --no-huge 
-m1024 -l 0-2  
--vdev='net_pcap0,rx_pcap=rx_pcap=jumbo_9000.pcap,tx_pcap=file_tx.pcap' -- 
--no-flush-rx --total-num-mbufs=2048 -i --max-pkt-len 9400
...
testpmd> start
...
testpmd> show port stats 0

    NIC statistics for port 0  
   RX-packets: 1  RX-missed: 0  RX-bytes:  8996
   RX-errors: 0
   RX-nombuf:  0
   TX-packets: 1  TX-errors: 0  TX-bytes:  8996

   Throughput (since last show)
   Rx-pps:0  Rx-bps:0
   Tx-pps:0  Tx-bps:0
   


-Original Message-
From: Ido Goshen
Sent: Thursday, 17 March 2022 21:12
To: Stephen Hemminger 
Cc: Ferruh Yigit ; dev@dpdk.org
Subject: RE: [PATCH] net/pcap: support MTU set

As far as I can see the initial device MTU is derived from port *RX* 
configuration
in struct rte_eth_rxmode https://doc.dpdk.org/api-21.11/structrte__eth__rxmode.html
A couple of real NICs I've tested (ixgbe and i40e based) don't allow oversized
packets; test details can be seen in https://bugs.dpdk.org/show_bug.cgi?id=961


-Original Message-
From: Stephen Hemminger 
Sent: Thursday, 17 March 2022 20:21
To: Ido Goshen 
Cc: Ferruh Yigit ; dev@dpdk.org
Subject: Re: [PATCH] net/pcap: support MTU set

On Thu, 17 Mar 2022 19:43:47 +0200
ido g  wrote:


+   if (unlikely(header.caplen > dev->data->mtu)) {
+   pcap_q->rx_stat.err_pkts++;
+   rte_pktmbuf_free(mbuf);
+   break;
+   }


MTU should only be enforced on transmit.
Other real network devices allow oversized packets.

Since the pcap file is something user provides, if you don't want that
then use something to filter the file.




[PATCH 0/2] ACL fix 8B field

2022-04-26 Thread Konstantin Ananyev
Fix problem with 8B fields and extend test-acl test coverage.

Konstantin Ananyev (2):
  acl: fix rules with 8 bytes field size are broken
  app/acl: support different formats for IPv6 address

 app/test-acl/main.c | 355 ++--
 lib/acl/acl_bld.c   |  14 +-
 2 files changed, 286 insertions(+), 83 deletions(-)

-- 
2.34.1



[PATCH 2/2] app/acl: support different formats for IPv6 address

2022-04-26 Thread Konstantin Ananyev
Within an ACL rule an IPv6 address can be represented in different ways:
either as 4x4B fields, or as 2x8B fields.
Till now, only the first format was supported.
Extend test-acl to support both formats, mainly for testing and
demonstration purposes.
To control the desired behavior, the '--ipv6' command-line option is extended
to accept an optional argument.
To be more precise:
 '--ipv6'    - use 4x4B fields format (default behavior)
 '--ipv6=4B' - use 4x4B fields format (default behavior)
 '--ipv6=8B' - use 2x8B fields format
app/acl: use posix functions for network address parsing

Also replaced the home-brewed IPv4/IPv6 address parsing with inet_pton() calls.

Signed-off-by: Konstantin Ananyev 
---
 app/test-acl/main.c | 355 ++--
 1 file changed, 276 insertions(+), 79 deletions(-)

diff --git a/app/test-acl/main.c b/app/test-acl/main.c
index 06e3847ab9..41ce83db08 100644
--- a/app/test-acl/main.c
+++ b/app/test-acl/main.c
@@ -57,6 +57,12 @@ enum {
DUMP_MAX
 };
 
+enum {
+   IPV6_FRMT_NONE,
+   IPV6_FRMT_U32,
+   IPV6_FRMT_U64,
+};
+
 struct acl_alg {
const char *name;
enum rte_acl_classify_alg alg;
@@ -123,7 +129,7 @@ static struct {
.name = "default",
.alg = RTE_ACL_CLASSIFY_DEFAULT,
},
-   .ipv6 = 0
+   .ipv6 = IPV6_FRMT_NONE,
 };
 
 static struct rte_acl_param prm = {
@@ -210,6 +216,7 @@ struct rte_acl_field_def ipv4_defs[NUM_FIELDS_IPV4] = {
 #defineIPV6_ADDR_LEN   16
 #defineIPV6_ADDR_U16   (IPV6_ADDR_LEN / sizeof(uint16_t))
 #defineIPV6_ADDR_U32   (IPV6_ADDR_LEN / sizeof(uint32_t))
+#defineIPV6_ADDR_U64   (IPV6_ADDR_LEN / sizeof(uint64_t))
 
 struct ipv6_5tuple {
uint8_t  proto;
@@ -219,6 +226,7 @@ struct ipv6_5tuple {
uint16_t port_dst;
 };
 
+/* treat IPV6 address as uint32_t[4] (default mode) */
 enum {
PROTO_FIELD_IPV6,
SRC1_FIELD_IPV6,
@@ -234,6 +242,27 @@ enum {
NUM_FIELDS_IPV6
 };
 
+/* treat IPV6 address as uint64_t[2] */
+enum {
+   PROTO_FIELD_IPV6_U64,
+   SRC1_FIELD_IPV6_U64,
+   SRC2_FIELD_IPV6_U64,
+   DST1_FIELD_IPV6_U64,
+   DST2_FIELD_IPV6_U64,
+   SRCP_FIELD_IPV6_U64,
+   DSTP_FIELD_IPV6_U64,
+   NUM_FIELDS_IPV6_U64
+};
+
+enum {
+   PROTO_INDEX_IPV6_U64 = PROTO_FIELD_IPV6_U64,
+   SRC1_INDEX_IPV6_U64 = SRC1_FIELD_IPV6_U64,
+   SRC2_INDEX_IPV6_U64 = SRC2_FIELD_IPV6_U64 + 1,
+   DST1_INDEX_IPV6_U64 = DST1_FIELD_IPV6_U64 + 2,
+   DST2_INDEX_IPV6_U64 = DST2_FIELD_IPV6_U64 + 3,
+   PRT_INDEX_IPV6_U64 = SRCP_FIELD_IPV6_U64 + 4,
+};
+
 struct rte_acl_field_def ipv6_defs[NUM_FIELDS_IPV6] = {
{
.type = RTE_ACL_FIELD_TYPE_BITMASK,
@@ -314,6 +343,57 @@ struct rte_acl_field_def ipv6_defs[NUM_FIELDS_IPV6] = {
},
 };
 
+struct rte_acl_field_def ipv6_u64_defs[NUM_FIELDS_IPV6_U64] = {
+   {
+   .type = RTE_ACL_FIELD_TYPE_BITMASK,
+   .size = sizeof(uint8_t),
+   .field_index = PROTO_FIELD_IPV6_U64,
+   .input_index = PROTO_FIELD_IPV6_U64,
+   .offset = offsetof(struct ipv6_5tuple, proto),
+   },
+   {
+   .type = RTE_ACL_FIELD_TYPE_MASK,
+   .size = sizeof(uint64_t),
+   .field_index = SRC1_FIELD_IPV6_U64,
+   .input_index = SRC1_INDEX_IPV6_U64,
+   .offset = offsetof(struct ipv6_5tuple, ip_src[0]),
+   },
+   {
+   .type = RTE_ACL_FIELD_TYPE_MASK,
+   .size = sizeof(uint64_t),
+   .field_index = SRC2_FIELD_IPV6_U64,
+   .input_index = SRC2_INDEX_IPV6_U64,
+   .offset = offsetof(struct ipv6_5tuple, ip_src[2]),
+   },
+   {
+   .type = RTE_ACL_FIELD_TYPE_MASK,
+   .size = sizeof(uint64_t),
+   .field_index = DST1_FIELD_IPV6_U64,
+   .input_index = DST1_INDEX_IPV6_U64,
+   .offset = offsetof(struct ipv6_5tuple, ip_dst[0]),
+   },
+   {
+   .type = RTE_ACL_FIELD_TYPE_MASK,
+   .size = sizeof(uint64_t),
+   .field_index = DST2_FIELD_IPV6_U64,
+   .input_index = DST2_INDEX_IPV6_U64,
+   .offset = offsetof(struct ipv6_5tuple, ip_dst[2]),
+   },
+   {
+   .type = RTE_ACL_FIELD_TYPE_RANGE,
+   .size = sizeof(uint16_t),
+   .field_index = SRCP_FIELD_IPV6_U64,
+   .input_index = PRT_INDEX_IPV6_U64,
+   .offset = offsetof(struct ipv6_5tuple, port_src),
+   },
+   {
+   .type = RTE_ACL_FIELD_TYPE_RANGE,
+   .size = sizeof(uint16_t),
+   .field_index = DSTP_FIELD_IPV6_U64,
+   .input_index = PRT_INDEX_IPV6_U64,
+   .offset = offsetof(struct ipv6_5tuple, port_dst),
+   },
+};
 
 enum {
CB_FLD_SRC_ADDR,
@@ -385,49 +465,11 @@ pa

[PATCH 1/2] acl: fix rules with 8 bytes field size are broken

2022-04-26 Thread Konstantin Ananyev
In theory the ACL library allows fields 8B long.
Though in practice they are usually not used and not tested,
and as was revealed by Ido, this functionality is not working properly.
There are a few places inside the ACL build code-path that need to be addressed.

Bugzilla ID: 673
Fixes: dc276b5780c2 ("acl: new library")
Cc: sta...@dpdk.org

Reported-by: Ido Goshen 
Signed-off-by: Konstantin Ananyev 
---
 lib/acl/acl_bld.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
index 7ea30f4186..2816632803 100644
--- a/lib/acl/acl_bld.c
+++ b/lib/acl/acl_bld.c
@@ -12,6 +12,9 @@
 /* number of pointers per alloc */
 #define ACL_PTR_ALLOC  32
 
+/* account for situation when all fields are 8B long */
+#define ACL_MAX_INDEXES(2 * RTE_ACL_MAX_FIELDS)
+
 /* macros for dividing rule sets heuristics */
 #define NODE_MAX   0x4000
 #define NODE_MIN   0x800
@@ -80,7 +83,7 @@ struct acl_build_context {
struct tb_mem_poolpool;
struct rte_acl_trie   tries[RTE_ACL_MAX_TRIES];
struct rte_acl_bld_trie   bld_tries[RTE_ACL_MAX_TRIES];
-   uint32_tdata_indexes[RTE_ACL_MAX_TRIES][RTE_ACL_MAX_FIELDS];
+   uint32_tdata_indexes[RTE_ACL_MAX_TRIES][ACL_MAX_INDEXES];
 
/* memory free lists for nodes and blocks used for node ptrs */
struct acl_mem_block  blocks[MEM_BLOCK_NUM];
@@ -988,7 +991,7 @@ build_trie(struct acl_build_context *context, struct 
rte_acl_build_rule *head,
 */
uint64_t mask;
mask = RTE_ACL_MASKLEN_TO_BITMASK(
-   fld->mask_range.u32,
+   fld->mask_range.u64,
rule->config->defs[n].size);
 
/* gen a mini-trie for this field */
@@ -1301,6 +1304,9 @@ acl_build_index(const struct rte_acl_config *config, 
uint32_t *data_index)
if (last_header != config->defs[n].input_index) {
last_header = config->defs[n].input_index;
data_index[m++] = config->defs[n].offset;
+   if (config->defs[n].size > sizeof(uint32_t))
+   data_index[m++] = config->defs[n].offset +
+   sizeof(uint32_t);
}
}
 
@@ -1487,7 +1493,7 @@ acl_set_data_indexes(struct rte_acl_ctx *ctx)
memcpy(ctx->data_indexes + ofs, ctx->trie[i].data_index,
n * sizeof(ctx->data_indexes[0]));
ctx->trie[i].data_index = ctx->data_indexes + ofs;
-   ofs += RTE_ACL_MAX_FIELDS;
+   ofs += ACL_MAX_INDEXES;
}
 }
 
@@ -1643,7 +1649,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct 
rte_acl_config *cfg)
/* allocate and fill run-time  structures. */
rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries,
bcx.num_tries, bcx.cfg.num_categories,
-   RTE_ACL_MAX_FIELDS * RTE_DIM(bcx.tries) *
+   ACL_MAX_INDEXES * RTE_DIM(bcx.tries) *
sizeof(ctx->data_indexes[0]), max_size);
if (rc == 0) {
/* set data indexes. */
-- 
2.34.1



Fwd: Does ACL support field size of 8 bytes?

2022-04-26 Thread Konstantin Ananyev

Hi Ido,


I've lots of good experience with ACL but can't make it work with u64 values.
I know a u64 can be split into 2xu32 fields, but that makes it more complex
to use and wastes double the number of fields (we hit the
RTE_ACL_MAX_FIELDS 64 limit)


Wow, that's a lot of fields...


According to the documentation and rte_acl.h fields size can be 8 bytes (u64)
e.g.
  'The size parameter defines the length of the field in bytes. Allowable 
values are 1, 2, 4, or 8 bytes.'
  (from 
https://doc.dpdk.org/guides-21.11/prog_guide/packet_classif_access_ctrl.html#rule-definition)

Though there's a hint it's less recommended
  'Also, it is best to define fields of 8 or more bytes as 4 byte fields so 
that the build processes can eliminate fields that are all wild.'

It's also not clear how it fits in a group (i.e. what's input_index stride) 
which is only 4 bytes
'All subsequent fields has to be grouped into sets of 4 consecutive bytes.'

I couldn't find any example or test app that's using 8 bytes
e.g. for IPv6 address 4xu32 fields are always used and not 2xu64

Should it work?
Did anyone try it successfully and/or can share an example?


You are right: though it is formally supported, we do not test it, and 
AFAIK no one has used it until now.
As we group fields into 4B-long chunks anyway, an 8B field is sort of 
awkward and confusing.
To be honest, I don't even remember the rationale behind 
introducing it in the first place.
Anyway, just submitted patches that should fix 8B field support (at 
least it works for me now):

https://patches.dpdk.org/project/dpdk/list/?series=22676
Please give it a try.
In the long term it would probably be good to hear from you and other users 
whether we should keep 8B
support at all, or whether it would be easier just to abandon it.
Thanks
Konstantin






Re: [PATCH V2 1/4] net/bonding: fix non-active slaves aren't stopped

2022-04-26 Thread Ferruh Yigit

On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:

From: Huisong Li 

When stopping a bonded port, all slaves should be deactivated. But only


s/deactivated/stopped/ ?


active slaves are stopped. So fix it and do "deactivae_slave()" for active


s/deactivae_slave()/deactivate_slave()/


slaves.


Hi Connor,

When a bonding port is closed, is it clear if all slave ports or active 
slave ports should be stopped?




Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 port")
Cc: sta...@dpdk.org

Signed-off-by: Huisong Li 
Signed-off-by: Min Hu (Connor) 
---
  drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++-
  1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index b305b6a35b..469dc71170 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
internals->link_status_polling_enabled = 0;
for (i = 0; i < internals->slave_count; i++) {
uint16_t slave_id = internals->slaves[i].port_id;
+
+   internals->slaves[i].last_link_status = 0;
+   ret = rte_eth_dev_stop(slave_id);
+   if (ret != 0) {
+   RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
+slave_id);
+   return ret;


Should it return here or try to stop all ports?
What about recording the return status but continuing to stop all 
ports, and returning an error if any of the stops failed?



+   }
+
+   /* active slaves need to deactivate. */


" active slaves need to be deactivated. " ?


if (find_slave_by_id(internals->active_slaves,
internals->active_slave_count, slave_id) !=
-   internals->active_slave_count) {
-   internals->slaves[i].last_link_status = 0;
-   ret = rte_eth_dev_stop(slave_id);
-   if (ret != 0) {
-   RTE_BOND_LOG(ERR, "Failed to stop device on port 
%u",
-slave_id);
-   return ret;
-   }
+   internals->active_slave_count)


I think original indentation for this line is better.


deactivate_slave(eth_dev, slave_id);
-   }
}
  
  	return 0;




Re: [PATCH V2 2/4] net/bonding: fix non-terminable while loop

2022-04-26 Thread Ferruh Yigit

On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:

From: Huisong Li 

All slaves will be stopped and removed when closing a bonded port. But the
while loop cannot terminate if both rte_eth_dev_stop and
rte_eth_bond_slave_remove fail to run.



Agreed, this is a defect introduced in the commit below. Thanks for the fix.


Fixes: fb0379bc5db3 ("net/bonding: check stop call status")
Cc: sta...@dpdk.org

Signed-off-by: Huisong Li 
Signed-off-by: Min Hu (Connor) 
---
  drivers/net/bonding/rte_eth_bond_pmd.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 469dc71170..00d4deda44 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -2149,13 +2149,14 @@ bond_ethdev_close(struct rte_eth_dev *dev)
return 0;
  
  	RTE_BOND_LOG(INFO, "Closing bonded device %s", dev->device->name);

-   while (internals->slave_count != skipped) {
+   while (skipped < internals->slave_count) {


When the issue below is fixed by adding 'continue', there is no need to 
change the check, right? Although the new one is also correct.



uint16_t port_id = internals->slaves[skipped].port_id;
  
  		if (rte_eth_dev_stop(port_id) != 0) {

RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
 port_id);
skipped++;
+   continue;


Can't we remove the slave even if 'stop()' failed? If so, I think it is 
better to just log the error and continue in that case; what do you think?



}
  
  		if (rte_eth_bond_slave_remove(bond_port_id, port_id) != 0) {




Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory

2022-04-26 Thread Don Wallwork

On 4/26/2022 10:58 AM, Stephen Hemminger wrote:

On Tue, 26 Apr 2022 08:19:59 -0400
Don Wallwork  wrote:


Add support for using hugepages for worker lcore stack memory.  The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.

Platforms desiring to make use of this capability must enable the
associated option flag and stack size settings in platform config
files.
---
  lib/eal/linux/eal.c | 39 +++
  1 file changed, 39 insertions(+)


Good idea, but having a fixed-size stack makes writing complex applications
more difficult. Plus you lose the safety of guard pages.


Thanks for the quick reply.

The expectation is that use of this optional feature would be limited to 
cases where the performance gains justify the implications of these 
tradeoffs. For example, a specific data plane application may be okay with 
limited stack size and could be tested to ensure stack usage remains 
within limits.

Also, since this applies only to worker threads, the main thread would 
not be impacted by this change.




[PATCH 1/6] app/eventdev: simplify signal handling and teardown

2022-04-26 Thread Pavan Nikhilesh
Remove rte_*_dev calls from the signal handler callback.
Split Ethernet device teardown into Rx and Tx sections, and wait for
workers to finish after disabling Rx so that they can complete
processing of currently held packets.

Signed-off-by: Pavan Nikhilesh 
---
 app/test-eventdev/evt_main.c | 58 +---
 app/test-eventdev/evt_test.h |  3 ++
 app/test-eventdev/test_perf_atq.c|  1 +
 app/test-eventdev/test_perf_common.c | 20 +++-
 app/test-eventdev/test_perf_common.h |  4 +-
 app/test-eventdev/test_perf_queue.c  |  1 +
 app/test-eventdev/test_pipeline_atq.c|  1 +
 app/test-eventdev/test_pipeline_common.c | 19 +++-
 app/test-eventdev/test_pipeline_common.h |  5 +-
 app/test-eventdev/test_pipeline_queue.c  |  1 +
 10 files changed, 72 insertions(+), 41 deletions(-)

diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c
index a7d6b0c1cf..c5d63061bf 100644
--- a/app/test-eventdev/evt_main.c
+++ b/app/test-eventdev/evt_main.c
@@ -19,11 +19,7 @@ struct evt_test *test;
 static void
 signal_handler(int signum)
 {
-   int i;
-   static uint8_t once;
-
-   if ((signum == SIGINT || signum == SIGTERM) && !once) {
-   once = true;
+   if (signum == SIGINT || signum == SIGTERM) {
printf("\nSignal %d received, preparing to exit...\n",
signum);
 
@@ -31,36 +27,7 @@ signal_handler(int signum)
/* request all lcores to exit from the main loop */
*(int *)test->test_priv = true;
rte_wmb();
-
-   if (test->ops.ethdev_destroy)
-   test->ops.ethdev_destroy(test, &opt);
-
-   if (test->ops.cryptodev_destroy)
-   test->ops.cryptodev_destroy(test, &opt);
-
-   rte_eal_mp_wait_lcore();
-
-   if (test->ops.test_result)
-   test->ops.test_result(test, &opt);
-
-   if (opt.prod_type == EVT_PROD_TYPE_ETH_RX_ADPTR) {
-   RTE_ETH_FOREACH_DEV(i)
-   rte_eth_dev_close(i);
-   }
-
-   if (test->ops.eventdev_destroy)
-   test->ops.eventdev_destroy(test, &opt);
-
-   if (test->ops.mempool_destroy)
-   test->ops.mempool_destroy(test, &opt);
-
-   if (test->ops.test_destroy)
-   test->ops.test_destroy(test, &opt);
}
-
-   /* exit with the expected status */
-   signal(signum, SIG_DFL);
-   kill(getpid(), signum);
}
 }
 
@@ -189,10 +156,29 @@ main(int argc, char **argv)
}
}
 
+   if (test->ops.ethdev_rx_stop)
+   test->ops.ethdev_rx_stop(test, &opt);
+
+   if (test->ops.cryptodev_destroy)
+   test->ops.cryptodev_destroy(test, &opt);
+
rte_eal_mp_wait_lcore();
 
-   /* Print the test result */
-   ret = test->ops.test_result(test, &opt);
+   if (test->ops.test_result)
+   test->ops.test_result(test, &opt);
+
+   if (test->ops.ethdev_destroy)
+   test->ops.ethdev_destroy(test, &opt);
+
+   if (test->ops.eventdev_destroy)
+   test->ops.eventdev_destroy(test, &opt);
+
+   if (test->ops.mempool_destroy)
+   test->ops.mempool_destroy(test, &opt);
+
+   if (test->ops.test_destroy)
+   test->ops.test_destroy(test, &opt);
+
 nocap:
if (ret == EVT_TEST_SUCCESS) {
printf("Result: "CLGRN"%s"CLNRM"\n", "Success");
diff --git a/app/test-eventdev/evt_test.h b/app/test-eventdev/evt_test.h
index 50fa474ec2..1049f99ddc 100644
--- a/app/test-eventdev/evt_test.h
+++ b/app/test-eventdev/evt_test.h
@@ -41,6 +41,8 @@ typedef void (*evt_test_eventdev_destroy_t)
(struct evt_test *test, struct evt_options *opt);
 typedef void (*evt_test_ethdev_destroy_t)
(struct evt_test *test, struct evt_options *opt);
+typedef void (*evt_test_ethdev_rx_stop_t)(struct evt_test *test,
+ struct evt_options *opt);
 typedef void (*evt_test_cryptodev_destroy_t)
(struct evt_test *test, struct evt_options *opt);
 typedef void (*evt_test_mempool_destroy_t)
@@ -60,6 +62,7 @@ struct evt_test_ops {
evt_test_launch_lcores_t launch_lcores;
evt_test_result_t test_result;
evt_test_eventdev_destroy_t eventdev_destroy;
+   evt_test_ethdev_rx_stop_t ethdev_rx_stop;
evt_test_ethdev_destroy_t ethdev_destroy;
evt_test_cryptodev_destroy_t cryptodev_destroy;
evt_test_mempool_destroy_t mempool_destroy;
diff --git a/app/test-eventdev/test_perf_atq.c 
b/app/test-eventdev/test_perf_atq.c
index 67ff681666

[PATCH 2/6] app/eventdev: clean up worker state before exit

2022-04-26 Thread Pavan Nikhilesh
Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to a deadlock, depending on the worker exit timing, when
there are very few flows.

Add a cleanup function to release any scheduling contexts held by the
worker using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh 
---
 app/test-eventdev/test_perf_atq.c|  31 +++--
 app/test-eventdev/test_perf_common.c |  17 +++
 app/test-eventdev/test_perf_common.h |   3 +
 app/test-eventdev/test_perf_queue.c  |  30 +++--
 app/test-eventdev/test_pipeline_atq.c| 134 -
 app/test-eventdev/test_pipeline_common.c |  39 ++
 app/test-eventdev/test_pipeline_common.h |  59 ++---
 app/test-eventdev/test_pipeline_queue.c  | 145 ++-
 8 files changed, 304 insertions(+), 154 deletions(-)

diff --git a/app/test-eventdev/test_perf_atq.c 
b/app/test-eventdev/test_perf_atq.c
index bac3ea602f..5a0b190384 100644
--- a/app/test-eventdev/test_perf_atq.c
+++ b/app/test-eventdev/test_perf_atq.c
@@ -37,13 +37,14 @@ atq_fwd_event(struct rte_event *const ev, uint8_t *const 
sched_type_list,
 static int
 perf_atq_worker(void *arg, const int enable_fwd_latency)
 {
-   PERF_WORKER_INIT;
+   uint16_t enq = 0, deq = 0;
struct rte_event ev;
+   PERF_WORKER_INIT;
 
while (t->done == false) {
-   uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0);
+   deq = rte_event_dequeue_burst(dev, port, &ev, 1, 0);
 
-   if (!event) {
+   if (!deq) {
rte_pause();
continue;
}
@@ -78,24 +79,29 @@ perf_atq_worker(void *arg, const int enable_fwd_latency)
 bufs, sz, cnt);
} else {
atq_fwd_event(&ev, sched_type_list, nb_stages);
-   while (rte_event_enqueue_burst(dev, port, &ev, 1) != 1)
-   rte_pause();
+   do {
+   enq = rte_event_enqueue_burst(dev, port, &ev,
+ 1);
+   } while (!enq && !t->done);
}
}
+
+   perf_worker_cleanup(pool, dev, port, &ev, enq, deq);
+
return 0;
 }
 
 static int
 perf_atq_worker_burst(void *arg, const int enable_fwd_latency)
 {
-   PERF_WORKER_INIT;
-   uint16_t i;
/* +1 to avoid prefetch out of array check */
struct rte_event ev[BURST_SIZE + 1];
+   uint16_t enq = 0, nb_rx = 0;
+   PERF_WORKER_INIT;
+   uint16_t i;
 
while (t->done == false) {
-   uint16_t const nb_rx = rte_event_dequeue_burst(dev, port, ev,
-   BURST_SIZE, 0);
+   nb_rx = rte_event_dequeue_burst(dev, port, ev, BURST_SIZE, 0);
 
if (!nb_rx) {
rte_pause();
@@ -146,14 +152,15 @@ perf_atq_worker_burst(void *arg, const int 
enable_fwd_latency)
}
}
 
-   uint16_t enq;
-
enq = rte_event_enqueue_burst(dev, port, ev, nb_rx);
-   while (enq < nb_rx) {
+   while ((enq < nb_rx) && !t->done) {
enq += rte_event_enqueue_burst(dev, port,
ev + enq, nb_rx - enq);
}
}
+
+   perf_worker_cleanup(pool, dev, port, ev, enq, nb_rx);
+
return 0;
 }
 
diff --git a/app/test-eventdev/test_perf_common.c 
b/app/test-eventdev/test_perf_common.c
index e93b0e7272..f673a9fddd 100644
--- a/app/test-eventdev/test_perf_common.c
+++ b/app/test-eventdev/test_perf_common.c
@@ -985,6 +985,23 @@ perf_opt_dump(struct evt_options *opt, uint8_t nb_queues)
evt_dump("prod_enq_burst_sz", "%d", opt->prod_enq_burst_sz);
 }
 
+void
+perf_worker_cleanup(struct rte_mempool *const pool, uint8_t dev_id,
+   uint8_t port_id, struct rte_event events[], uint16_t nb_enq,
+   uint16_t nb_deq)
+{
+   int i;
+
+   if (nb_deq) {
+   for (i = nb_enq; i < nb_deq; i++)
+   rte_mempool_put(pool, events[i].event_ptr);
+
+   for (i = 0; i < nb_deq; i++)
+   events[i].op = RTE_EVENT_OP_RELEASE;
+   rte_event_enqueue_burst(dev_id, port_id, events, nb_deq);
+   }
+}
+
 void
 perf_eventdev_destroy(struct evt_test *test, struct evt_options *opt)
 {
diff --git a/app/test-eventdev/test_perf_common.h 
b/app/test-eventdev/test_perf_common.h
index e504bb1df9..f6bfc73be0 100644
--- a/app/test-eventdev/test_perf_common.h
+++ b/app/test-eventdev/test_perf_common.h
@@ -184,5 +184,8 @@ void perf_cryptodev_d

[PATCH 3/6] examples/eventdev: clean up worker state before exit

2022-04-26 Thread Pavan Nikhilesh
Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to a deadlock, depending on the worker exit timing, when
there are very few flows.

Add a cleanup function to release any scheduling contexts held by the
worker using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh 
---
 examples/eventdev_pipeline/pipeline_common.h  | 22 ++
 .../pipeline_worker_generic.c | 23 +++---
 .../eventdev_pipeline/pipeline_worker_tx.c| 79 ---
 3 files changed, 87 insertions(+), 37 deletions(-)

diff --git a/examples/eventdev_pipeline/pipeline_common.h 
b/examples/eventdev_pipeline/pipeline_common.h
index b12eb281e1..9899b257b0 100644
--- a/examples/eventdev_pipeline/pipeline_common.h
+++ b/examples/eventdev_pipeline/pipeline_common.h
@@ -140,5 +140,27 @@ schedule_devices(unsigned int lcore_id)
}
 }
 
+static inline void
+worker_cleanup(uint8_t dev_id, uint8_t port_id, struct rte_event events[],
+  uint16_t nb_enq, uint16_t nb_deq)
+{
+   int i;
+
+   if (!(nb_deq - nb_enq))
+   return;
+
+   if (nb_deq) {
+   for (i = nb_enq; i < nb_deq; i++) {
+   if (events[i].op == RTE_EVENT_OP_RELEASE)
+   continue;
+   rte_pktmbuf_free(events[i].mbuf);
+   }
+
+   for (i = 0; i < nb_deq; i++)
+   events[i].op = RTE_EVENT_OP_RELEASE;
+   rte_event_enqueue_burst(dev_id, port_id, events, nb_deq);
+   }
+}
+
 void set_worker_generic_setup_data(struct setup_data *caps, bool burst);
 void set_worker_tx_enq_setup_data(struct setup_data *caps, bool burst);
diff --git a/examples/eventdev_pipeline/pipeline_worker_generic.c 
b/examples/eventdev_pipeline/pipeline_worker_generic.c
index ce1e92d59e..c564c808e2 100644
--- a/examples/eventdev_pipeline/pipeline_worker_generic.c
+++ b/examples/eventdev_pipeline/pipeline_worker_generic.c
@@ -16,6 +16,7 @@ worker_generic(void *arg)
uint8_t port_id = data->port_id;
size_t sent = 0, received = 0;
unsigned int lcore_id = rte_lcore_id();
+   uint16_t nb_rx = 0, nb_tx = 0;
 
while (!fdata->done) {
 
@@ -27,8 +28,7 @@ worker_generic(void *arg)
continue;
}
 
-   const uint16_t nb_rx = rte_event_dequeue_burst(dev_id, port_id,
-   &ev, 1, 0);
+   nb_rx = rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0);
 
if (nb_rx == 0) {
rte_pause();
@@ -47,11 +47,14 @@ worker_generic(void *arg)
 
work();
 
-   while (rte_event_enqueue_burst(dev_id, port_id, &ev, 1) != 1)
-   rte_pause();
+   do {
+   nb_tx = rte_event_enqueue_burst(dev_id, port_id, &ev,
+   1);
+   } while (!nb_tx && !fdata->done);
sent++;
}
 
+   worker_cleanup(dev_id, port_id, &ev, nb_tx, nb_rx);
if (!cdata.quiet)
printf("  worker %u thread done. RX=%zu TX=%zu\n",
rte_lcore_id(), received, sent);
@@ -69,10 +72,9 @@ worker_generic_burst(void *arg)
uint8_t port_id = data->port_id;
size_t sent = 0, received = 0;
unsigned int lcore_id = rte_lcore_id();
+   uint16_t i, nb_rx = 0, nb_tx = 0;
 
while (!fdata->done) {
-   uint16_t i;
-
if (fdata->cap.scheduler)
fdata->cap.scheduler(lcore_id);
 
@@ -81,8 +83,8 @@ worker_generic_burst(void *arg)
continue;
}
 
-   const uint16_t nb_rx = rte_event_dequeue_burst(dev_id, port_id,
-   events, RTE_DIM(events), 0);
+   nb_rx = rte_event_dequeue_burst(dev_id, port_id, events,
+   RTE_DIM(events), 0);
 
if (nb_rx == 0) {
rte_pause();
@@ -103,8 +105,7 @@ worker_generic_burst(void *arg)
 
work();
}
-   uint16_t nb_tx = rte_event_enqueue_burst(dev_id, port_id,
-   events, nb_rx);
+   nb_tx = rte_event_enqueue_burst(dev_id, port_id, events, nb_rx);
while (nb_tx < nb_rx && !fdata->done)
nb_tx += rte_event_enqueue_burst(dev_id, port_id,
events + nb_tx,
@@ -112,6 +113,8 @@ worker_generic_burst(void *arg)
sent += nb_tx;
}
 
+   worker_cleanup(dev_id, port_id, events, nb_tx, nb_rx);
+
if (!cdata.quiet)
p

[PATCH 4/6] examples/l3fwd: clean up worker state before exit

2022-04-26 Thread Pavan Nikhilesh
Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to a deadlock, depending on the worker exit timing, when
there are very few flows.

Add a cleanup function to release any scheduling contexts held by the
worker using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh 
---
 examples/l3fwd/l3fwd_em.c| 32 ++--
 examples/l3fwd/l3fwd_event.c | 34 ++
 examples/l3fwd/l3fwd_event.h |  5 +
 examples/l3fwd/l3fwd_fib.c   | 10 --
 examples/l3fwd/l3fwd_lpm.c   | 32 ++--
 5 files changed, 91 insertions(+), 22 deletions(-)

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 24d0910fe0..6f8d94f120 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -653,6 +653,7 @@ em_event_loop_single(struct l3fwd_event_resources *evt_rsrc,
const uint8_t tx_q_id = evt_rsrc->evq.event_q_id[
evt_rsrc->evq.nb_queues - 1];
const uint8_t event_d_id = evt_rsrc->event_d_id;
+   uint8_t deq = 0, enq = 0;
struct lcore_conf *lconf;
unsigned int lcore_id;
struct rte_event ev;
@@ -665,7 +666,9 @@ em_event_loop_single(struct l3fwd_event_resources *evt_rsrc,
 
RTE_LOG(INFO, L3FWD, "entering %s on lcore %u\n", __func__, lcore_id);
while (!force_quit) {
-   if (!rte_event_dequeue_burst(event_d_id, event_p_id, &ev, 1, 0))
+   deq = rte_event_dequeue_burst(event_d_id, event_p_id, &ev, 1,
+ 0);
+   if (!deq)
continue;
 
struct rte_mbuf *mbuf = ev.mbuf;
@@ -684,19 +687,22 @@ em_event_loop_single(struct l3fwd_event_resources 
*evt_rsrc,
if (flags & L3FWD_EVENT_TX_ENQ) {
ev.queue_id = tx_q_id;
ev.op = RTE_EVENT_OP_FORWARD;
-   while (rte_event_enqueue_burst(event_d_id, event_p_id,
-   &ev, 1) && !force_quit)
-   ;
+   do {
+   enq = rte_event_enqueue_burst(
+   event_d_id, event_p_id, &ev, 1);
+   } while (!enq && !force_quit);
}
 
if (flags & L3FWD_EVENT_TX_DIRECT) {
rte_event_eth_tx_adapter_txq_set(mbuf, 0);
-   while (!rte_event_eth_tx_adapter_enqueue(event_d_id,
-   event_p_id, &ev, 1, 0) &&
-   !force_quit)
-   ;
+   do {
+   enq = rte_event_eth_tx_adapter_enqueue(
+   event_d_id, event_p_id, &ev, 1, 0);
+   } while (!enq && !force_quit);
}
}
+
+   l3fwd_event_worker_cleanup(event_d_id, event_p_id, &ev, enq, deq, 0);
 }
 
 static __rte_always_inline void
@@ -709,9 +715,9 @@ em_event_loop_burst(struct l3fwd_event_resources *evt_rsrc,
const uint8_t event_d_id = evt_rsrc->event_d_id;
const uint16_t deq_len = evt_rsrc->deq_depth;
struct rte_event events[MAX_PKT_BURST];
+   int i, nb_enq = 0, nb_deq = 0;
struct lcore_conf *lconf;
unsigned int lcore_id;
-   int i, nb_enq, nb_deq;
 
if (event_p_id < 0)
return;
@@ -769,6 +775,9 @@ em_event_loop_burst(struct l3fwd_event_resources *evt_rsrc,
nb_deq - nb_enq, 0);
}
}
+
+   l3fwd_event_worker_cleanup(event_d_id, event_p_id, events, nb_enq,
+  nb_deq, 0);
 }
 
 static __rte_always_inline void
@@ -832,9 +841,9 @@ em_event_loop_vector(struct l3fwd_event_resources *evt_rsrc,
const uint8_t event_d_id = evt_rsrc->event_d_id;
const uint16_t deq_len = evt_rsrc->deq_depth;
struct rte_event events[MAX_PKT_BURST];
+   int i, nb_enq = 0, nb_deq = 0;
struct lcore_conf *lconf;
unsigned int lcore_id;
-   int i, nb_enq, nb_deq;
 
if (event_p_id < 0)
return;
@@ -887,6 +896,9 @@ em_event_loop_vector(struct l3fwd_event_resources *evt_rsrc,
nb_deq - nb_enq, 0);
}
}
+
+   l3fwd_event_worker_cleanup(event_d_id, event_p_id, events, nb_enq,
+  nb_deq, 1);
 }
 
 int __rte_noinline
diff --git a/examples/l3fwd/l3fwd_event.c b/examples/l3fwd/l3fwd_event.c
index 7a401290f8..a14a21b414 100644
--- a/examples/l3fwd/l3fwd_event.c
+++ b/examples/l3fwd/l3fwd_event.c
@@ 

[PATCH 5/6] examples/l2fwd-event: clean up worker state before exit

2022-04-26 Thread Pavan Nikhilesh
Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to a deadlock, depending on the worker exit timing, when
there are very few flows.

Add a cleanup function to release any scheduling contexts held by the
worker using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh 
---
 examples/l2fwd-event/l2fwd_common.c | 34 +
 examples/l2fwd-event/l2fwd_common.h |  3 +++
 examples/l2fwd-event/l2fwd_event.c  | 31 --
 3 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/examples/l2fwd-event/l2fwd_common.c 
b/examples/l2fwd-event/l2fwd_common.c
index cf3d1b8aaf..15bfe790a0 100644
--- a/examples/l2fwd-event/l2fwd_common.c
+++ b/examples/l2fwd-event/l2fwd_common.c
@@ -114,3 +114,37 @@ l2fwd_event_init_ports(struct l2fwd_resources *rsrc)
 
return nb_ports_available;
 }
+
+static void
+l2fwd_event_vector_array_free(struct rte_event events[], uint16_t num)
+{
+   uint16_t i;
+
+   for (i = 0; i < num; i++) {
+   rte_pktmbuf_free_bulk(events[i].vec->mbufs,
+ events[i].vec->nb_elem);
+   rte_mempool_put(rte_mempool_from_obj(events[i].vec),
+   events[i].vec);
+   }
+}
+
+void
+l2fwd_event_worker_cleanup(uint8_t event_d_id, uint8_t port_id,
+  struct rte_event events[], uint16_t nb_enq,
+  uint16_t nb_deq, uint8_t is_vector)
+{
+   int i;
+
+   if (nb_deq) {
+   if (is_vector)
+   l2fwd_event_vector_array_free(events + nb_enq,
+ nb_deq - nb_enq);
+   else
+   for (i = nb_enq; i < nb_deq; i++)
+   rte_pktmbuf_free(events[i].mbuf);
+
+   for (i = 0; i < nb_deq; i++)
+   events[i].op = RTE_EVENT_OP_RELEASE;
+   rte_event_enqueue_burst(event_d_id, port_id, events, nb_deq);
+   }
+}
diff --git a/examples/l2fwd-event/l2fwd_common.h 
b/examples/l2fwd-event/l2fwd_common.h
index 396e238c6a..bff3b65abf 100644
--- a/examples/l2fwd-event/l2fwd_common.h
+++ b/examples/l2fwd-event/l2fwd_common.h
@@ -140,5 +140,8 @@ l2fwd_get_rsrc(void)
 }
 
 int l2fwd_event_init_ports(struct l2fwd_resources *rsrc);
+void l2fwd_event_worker_cleanup(uint8_t event_d_id, uint8_t port_id,
+   struct rte_event events[], uint16_t nb_enq,
+   uint16_t nb_deq, uint8_t is_vector);
 
 #endif /* __L2FWD_COMMON_H__ */
diff --git a/examples/l2fwd-event/l2fwd_event.c 
b/examples/l2fwd-event/l2fwd_event.c
index 6df3cdfeab..63450537fe 100644
--- a/examples/l2fwd-event/l2fwd_event.c
+++ b/examples/l2fwd-event/l2fwd_event.c
@@ -193,6 +193,7 @@ l2fwd_event_loop_single(struct l2fwd_resources *rsrc,
evt_rsrc->evq.nb_queues - 1];
const uint64_t timer_period = rsrc->timer_period;
const uint8_t event_d_id = evt_rsrc->event_d_id;
+   uint8_t enq = 0, deq = 0;
struct rte_event ev;
 
if (port_id < 0)
@@ -203,26 +204,28 @@ l2fwd_event_loop_single(struct l2fwd_resources *rsrc,
 
while (!rsrc->force_quit) {
/* Read packet from eventdev */
-   if (!rte_event_dequeue_burst(event_d_id, port_id, &ev, 1, 0))
+   deq = rte_event_dequeue_burst(event_d_id, port_id, &ev, 1, 0);
+   if (!deq)
continue;
 
l2fwd_event_fwd(rsrc, &ev, tx_q_id, timer_period, flags);
 
if (flags & L2FWD_EVENT_TX_ENQ) {
-   while (rte_event_enqueue_burst(event_d_id, port_id,
-  &ev, 1) &&
-   !rsrc->force_quit)
-   ;
+   do {
+   enq = rte_event_enqueue_burst(event_d_id,
+ port_id, &ev, 1);
+   } while (!enq && !rsrc->force_quit);
}
 
if (flags & L2FWD_EVENT_TX_DIRECT) {
-   while (!rte_event_eth_tx_adapter_enqueue(event_d_id,
-   port_id,
-   &ev, 1, 0) &&
-   !rsrc->force_quit)
-   ;
+   do {
+   enq = rte_event_eth_tx_adapter_enqueue(
+   event_d_id, port_id, &ev, 1, 0);
+   } while (!enq && !rsrc->force_quit);
}
}

[PATCH 6/6] examples/ipsec-secgw: cleanup worker state before exit

2022-04-26 Thread Pavan Nikhilesh
Event ports are configured to implicitly release the scheduler contexts
currently held in the next call to rte_event_dequeue_burst().
A worker core might still hold a scheduling context during exit, as the
next call to rte_event_dequeue_burst() is never made.
This might lead to a deadlock, depending on the worker exit timing, when
there are very few flows.

Add a cleanup function to release any scheduling contexts held by the
worker using RTE_EVENT_OP_RELEASE.

Signed-off-by: Pavan Nikhilesh 
---
 examples/ipsec-secgw/ipsec_worker.c | 40 -
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec_worker.c 
b/examples/ipsec-secgw/ipsec_worker.c
index 8639426c5c..3df5acf384 100644
--- a/examples/ipsec-secgw/ipsec_worker.c
+++ b/examples/ipsec-secgw/ipsec_worker.c
@@ -749,7 +749,7 @@ ipsec_wrkr_non_burst_int_port_drv_mode(struct 
eh_event_link_info *links,
uint8_t nb_links)
 {
struct port_drv_mode_data data[RTE_MAX_ETHPORTS];
-   unsigned int nb_rx = 0;
+   unsigned int nb_rx = 0, nb_tx;
struct rte_mbuf *pkt;
struct rte_event ev;
uint32_t lcore_id;
@@ -847,11 +847,19 @@ ipsec_wrkr_non_burst_int_port_drv_mode(struct 
eh_event_link_info *links,
 * directly enqueued to the adapter and it would be
 * internally submitted to the eth device.
 */
-   rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id,
-   links[0].event_port_id,
-   &ev,/* events */
-   1,  /* nb_events */
-   0   /* flags */);
+   nb_tx = rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id,
+links[0].event_port_id,
+&ev, /* events */
+1,   /* nb_events */
+0 /* flags */);
+   if (!nb_tx)
+   rte_pktmbuf_free(ev.mbuf);
+   }
+
+   if (ev.u64) {
+   ev.op = RTE_EVENT_OP_RELEASE;
+   rte_event_enqueue_burst(links[0].eventdev_id,
+   links[0].event_port_id, &ev, 1);
}
 }
 
@@ -864,7 +872,7 @@ ipsec_wrkr_non_burst_int_port_app_mode(struct 
eh_event_link_info *links,
uint8_t nb_links)
 {
struct lcore_conf_ev_tx_int_port_wrkr lconf;
-   unsigned int nb_rx = 0;
+   unsigned int nb_rx = 0, nb_tx;
struct rte_event ev;
uint32_t lcore_id;
int32_t socket_id;
@@ -952,11 +960,19 @@ ipsec_wrkr_non_burst_int_port_app_mode(struct 
eh_event_link_info *links,
 * directly enqueued to the adapter and it would be
 * internally submitted to the eth device.
 */
-   rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id,
-   links[0].event_port_id,
-   &ev,/* events */
-   1,  /* nb_events */
-   0   /* flags */);
+   nb_tx = rte_event_eth_tx_adapter_enqueue(links[0].eventdev_id,
+links[0].event_port_id,
+&ev, /* events */
+1,   /* nb_events */
+0 /* flags */);
+   if (!nb_tx)
+   rte_pktmbuf_free(ev.mbuf);
+   }
+
+   if (ev.u64) {
+   ev.op = RTE_EVENT_OP_RELEASE;
+   rte_event_enqueue_burst(links[0].eventdev_id,
+   links[0].event_port_id, &ev, 1);
}
 }
 
-- 
2.25.1
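For readers skimming the patch, the error-handling pattern it adds can be sketched independently of the DPDK event-device API: when a single-event enqueue reports zero events accepted, the caller owns the mbuf again and must free it rather than leak it. Below is a minimal stand-alone sketch with stubbed types — all names here are illustrative stand-ins, not the real DPDK API:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the DPDK types used by the patch. */
struct mbuf { int freed; };
struct event { uint64_t u64; struct mbuf *m; };

static int g_accept; /* 1: adapter accepts the event, 0: adapter is full */

/* Stub for rte_event_eth_tx_adapter_enqueue(): returns events accepted. */
static unsigned int tx_adapter_enqueue(struct event *ev, unsigned int n)
{
    (void)ev;
    return g_accept ? n : 0;
}

static void mbuf_free(struct mbuf *m) { m->freed = 1; }

/* The pattern from the patch: free the mbuf if the enqueue was rejected. */
static void submit_event(struct event *ev)
{
    unsigned int nb_tx = tx_adapter_enqueue(ev, 1);
    if (!nb_tx)
        mbuf_free(ev->m);
}
```

The same shape applies to both worker modes touched by the diff; only the release of the still-held event context (RTE_EVENT_OP_RELEASE) is additional.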



Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory

2022-04-26 Thread Stephen Hemminger
On Tue, 26 Apr 2022 17:01:18 -0400
Don Wallwork  wrote:

> On 4/26/2022 10:58 AM, Stephen Hemminger wrote:
> > On Tue, 26 Apr 2022 08:19:59 -0400
> > Don Wallwork  wrote:
> >  
> >> Add support for using hugepages for worker lcore stack memory.  The
> >> intent is to improve performance by reducing stack memory related TLB
> >> misses and also by using memory local to the NUMA node of each lcore.
> >>
> >> Platforms desiring to make use of this capability must enable the
> >> associated option flag and stack size settings in platform config
> >> files.
> >> ---
> >>   lib/eal/linux/eal.c | 39 +++
> >>   1 file changed, 39 insertions(+)
> >>  
> > Good idea, but having a fixed-size stack makes writing complex applications
> > more difficult. Plus you lose the safety of guard pages.
> 
> Thanks for the quick reply.
> 
> The expectation is that use of this optional feature would be limited to
> cases where the performance gains justify the implications of these
> tradeoffs. For example, a specific data plane application may be okay with
> limited stack size and could be tested to ensure stack usage remains
> within limits.
> 
> Also, since this applies only to worker threads, the main thread would not
> be impacted by this change.
> 
> 

I would prefer it as a runtime, not compile-time, option.
That way distributions could ship DPDK and applications could opt in if they
wanted.


Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory

2022-04-26 Thread Don Wallwork

On 4/26/2022 5:21 PM, Stephen Hemminger wrote:

On Tue, 26 Apr 2022 17:01:18 -0400
Don Wallwork  wrote:


On 4/26/2022 10:58 AM, Stephen Hemminger wrote:

On Tue, 26 Apr 2022 08:19:59 -0400
Don Wallwork  wrote:
  

Add support for using hugepages for worker lcore stack memory.  The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.

Platforms desiring to make use of this capability must enable the
associated option flag and stack size settings in platform config
files.
---
   lib/eal/linux/eal.c | 39 +++
   1 file changed, 39 insertions(+)
  

Good idea, but having a fixed-size stack makes writing complex applications
more difficult. Plus you lose the safety of guard pages.

Thanks for the quick reply.

The expectation is that use of this optional feature would be limited to
cases where the performance gains justify the implications of these
tradeoffs. For example, a specific data plane application may be okay with
limited stack size and could be tested to ensure stack usage remains within
limits.

Also, since this applies only to worker threads, the main thread would not
be impacted by this change.



I would prefer it as a runtime, not compile-time, option.
That way distributions could ship DPDK and applications could opt in if they
wanted.

Good point. I'll work on a v2 and post it when it's ready.


[PATCH 1/2] event/cnxk: add additional checks in OP_RELEASE

2022-04-26 Thread Pavan Nikhilesh
Add additional checks while performing RTE_EVENT_OP_RELEASE to
ensure that there are no pending SWTAGs and FLUSHEs in flight.

Signed-off-by: Pavan Nikhilesh 
---
 drivers/event/cnxk/cn10k_eventdev.c |  4 +---
 drivers/event/cnxk/cn10k_worker.c   |  8 ++--
 drivers/event/cnxk/cn9k_eventdev.c  |  4 +---
 drivers/event/cnxk/cn9k_worker.c| 16 
 drivers/event/cnxk/cn9k_worker.h|  3 +--
 drivers/event/cnxk/cnxk_worker.h| 17 ++---
 6 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_eventdev.c 
b/drivers/event/cnxk/cn10k_eventdev.c
index 9b4d2895ec..2fa2cd31c2 100644
--- a/drivers/event/cnxk/cn10k_eventdev.c
+++ b/drivers/event/cnxk/cn10k_eventdev.c
@@ -137,9 +137,7 @@ cn10k_sso_hws_flush_events(void *hws, uint8_t queue_id, 
uintptr_t base,
if (fn != NULL && ev.u64 != 0)
fn(arg, ev);
if (ev.sched_type != SSO_TT_EMPTY)
-   cnxk_sso_hws_swtag_flush(
-   ws->base + SSOW_LF_GWS_WQE0,
-   ws->base + SSOW_LF_GWS_OP_SWTAG_FLUSH);
+   cnxk_sso_hws_swtag_flush(ws->base);
do {
val = plt_read64(ws->base + SSOW_LF_GWS_PENDSTATE);
} while (val & BIT_ULL(56));
diff --git a/drivers/event/cnxk/cn10k_worker.c 
b/drivers/event/cnxk/cn10k_worker.c
index 975a22336a..0d99b4c5e5 100644
--- a/drivers/event/cnxk/cn10k_worker.c
+++ b/drivers/event/cnxk/cn10k_worker.c
@@ -18,8 +18,12 @@ cn10k_sso_hws_enq(void *port, const struct rte_event *ev)
cn10k_sso_hws_forward_event(ws, ev);
break;
case RTE_EVENT_OP_RELEASE:
-   cnxk_sso_hws_swtag_flush(ws->base + SSOW_LF_GWS_WQE0,
-ws->base + SSOW_LF_GWS_OP_SWTAG_FLUSH);
+   if (ws->swtag_req) {
+   cnxk_sso_hws_desched(ev->u64, ws->base);
+   ws->swtag_req = 0;
+   break;
+   }
+   cnxk_sso_hws_swtag_flush(ws->base);
break;
default:
return 0;
diff --git a/drivers/event/cnxk/cn9k_eventdev.c 
b/drivers/event/cnxk/cn9k_eventdev.c
index 4bba477dd1..41bbe3cb22 100644
--- a/drivers/event/cnxk/cn9k_eventdev.c
+++ b/drivers/event/cnxk/cn9k_eventdev.c
@@ -156,9 +156,7 @@ cn9k_sso_hws_flush_events(void *hws, uint8_t queue_id, 
uintptr_t base,
if (fn != NULL && ev.u64 != 0)
fn(arg, ev);
if (ev.sched_type != SSO_TT_EMPTY)
-   cnxk_sso_hws_swtag_flush(
-   ws_base + SSOW_LF_GWS_TAG,
-   ws_base + SSOW_LF_GWS_OP_SWTAG_FLUSH);
+   cnxk_sso_hws_swtag_flush(ws_base);
do {
val = plt_read64(ws_base + SSOW_LF_GWS_PENDSTATE);
} while (val & BIT_ULL(56));
diff --git a/drivers/event/cnxk/cn9k_worker.c b/drivers/event/cnxk/cn9k_worker.c
index a981bc986f..41dbe6cafb 100644
--- a/drivers/event/cnxk/cn9k_worker.c
+++ b/drivers/event/cnxk/cn9k_worker.c
@@ -19,8 +19,12 @@ cn9k_sso_hws_enq(void *port, const struct rte_event *ev)
cn9k_sso_hws_forward_event(ws, ev);
break;
case RTE_EVENT_OP_RELEASE:
-   cnxk_sso_hws_swtag_flush(ws->base + SSOW_LF_GWS_TAG,
-ws->base + SSOW_LF_GWS_OP_SWTAG_FLUSH);
+   if (ws->swtag_req) {
+   cnxk_sso_hws_desched(ev->u64, ws->base);
+   ws->swtag_req = 0;
+   break;
+   }
+   cnxk_sso_hws_swtag_flush(ws->base);
break;
default:
return 0;
@@ -78,8 +82,12 @@ cn9k_sso_hws_dual_enq(void *port, const struct rte_event *ev)
cn9k_sso_hws_dual_forward_event(dws, base, ev);
break;
case RTE_EVENT_OP_RELEASE:
-   cnxk_sso_hws_swtag_flush(base + SSOW_LF_GWS_TAG,
-base + SSOW_LF_GWS_OP_SWTAG_FLUSH);
+   if (dws->swtag_req) {
+   cnxk_sso_hws_desched(ev->u64, base);
+   dws->swtag_req = 0;
+   break;
+   }
+   cnxk_sso_hws_swtag_flush(base);
break;
default:
return 0;
diff --git a/drivers/event/cnxk/cn9k_worker.h b/drivers/event/cnxk/cn9k_worker.h
index 917d1e0b40..88eb4e9cf9 100644
--- a/drivers/event/cnxk/cn9k_worker.h
+++ b/drivers/event/cnxk/cn9k_worker.h
@@ -841,8 +841,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, 
uint64_t *cmd,
return 1;
}
 
-   cnxk_sso_hws_swtag_flush(base + SSOW_LF_GWS_TAG,
-base + SSOW_LF_GWS_OP_SWTAG_FLUSH);
+   cnxk_sso_hws_swtag_flush(b

[PATCH 2/2] event/cnxk: move post-processing to separate function

2022-04-26 Thread Pavan Nikhilesh
Move event post-processing to a separate function.
Do complete event post-processing in tear-down functions to prevent
incorrect memory free.

Signed-off-by: Pavan Nikhilesh 
---
 drivers/event/cnxk/cn10k_eventdev.c |   5 +-
 drivers/event/cnxk/cn10k_worker.h   | 190 +---
 drivers/event/cnxk/cn9k_eventdev.c  |   9 +-
 drivers/event/cnxk/cn9k_worker.h| 114 ++---
 4 files changed, 138 insertions(+), 180 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_eventdev.c 
b/drivers/event/cnxk/cn10k_eventdev.c
index 2fa2cd31c2..94829e789c 100644
--- a/drivers/event/cnxk/cn10k_eventdev.c
+++ b/drivers/event/cnxk/cn10k_eventdev.c
@@ -133,7 +133,10 @@ cn10k_sso_hws_flush_events(void *hws, uint8_t queue_id, 
uintptr_t base,
 
while (aq_cnt || cq_ds_cnt || ds_cnt) {
plt_write64(req, ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-   cn10k_sso_hws_get_work_empty(ws, &ev);
+   cn10k_sso_hws_get_work_empty(
+   ws, &ev,
+   (NIX_RX_OFFLOAD_MAX - 1) | NIX_RX_REAS_F |
+   NIX_RX_MULTI_SEG_F | CPT_RX_WQE_F);
if (fn != NULL && ev.u64 != 0)
fn(arg, ev);
if (ev.sched_type != SSO_TT_EMPTY)
diff --git a/drivers/event/cnxk/cn10k_worker.h 
b/drivers/event/cnxk/cn10k_worker.h
index c96048f47d..03bae4bd53 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -196,15 +196,88 @@ cn10k_process_vwqe(uintptr_t vwqe, uint16_t port_id, 
const uint32_t flags,
}
 }
 
+static __rte_always_inline void
+cn10k_sso_hws_post_process(struct cn10k_sso_hws *ws, uint64_t *u64,
+  const uint32_t flags)
+{
+   uint64_t tstamp_ptr;
+
+   u64[0] = (u64[0] & (0x3ull << 32)) << 6 |
+(u64[0] & (0x3FFull << 36)) << 4 | (u64[0] & 0x);
+   if ((flags & CPT_RX_WQE_F) &&
+   (CNXK_EVENT_TYPE_FROM_TAG(u64[0]) == RTE_EVENT_TYPE_CRYPTODEV)) {
+   u64[1] = cn10k_cpt_crypto_adapter_dequeue(u64[1]);
+   } else if (CNXK_EVENT_TYPE_FROM_TAG(u64[0]) == RTE_EVENT_TYPE_ETHDEV) {
+   uint8_t port = CNXK_SUB_EVENT_FROM_TAG(u64[0]);
+   uint64_t mbuf;
+
+   mbuf = u64[1] - sizeof(struct rte_mbuf);
+   rte_prefetch0((void *)mbuf);
+   if (flags & NIX_RX_OFFLOAD_SECURITY_F) {
+   const uint64_t mbuf_init =
+   0x10001ULL | RTE_PKTMBUF_HEADROOM |
+   (flags & NIX_RX_OFFLOAD_TSTAMP_F ? 8 : 0);
+   struct rte_mbuf *m;
+   uintptr_t sa_base;
+   uint64_t iova = 0;
+   uint8_t loff = 0;
+   uint16_t d_off;
+   uint64_t cq_w1;
+   uint64_t cq_w5;
+
+   m = (struct rte_mbuf *)mbuf;
+   d_off = (uintptr_t)(m->buf_addr) - (uintptr_t)m;
+   d_off += RTE_PKTMBUF_HEADROOM;
+
+   cq_w1 = *(uint64_t *)(u64[1] + 8);
+   cq_w5 = *(uint64_t *)(u64[1] + 40);
+
+   sa_base = cnxk_nix_sa_base_get(port, ws->lookup_mem);
+   sa_base &= ~(ROC_NIX_INL_SA_BASE_ALIGN - 1);
+
+   mbuf = (uint64_t)nix_sec_meta_to_mbuf_sc(
+   cq_w1, cq_w5, sa_base, (uintptr_t)&iova, &loff,
+   (struct rte_mbuf *)mbuf, d_off, flags,
+   mbuf_init | ((uint64_t)port) << 48);
+   if (loff)
+   roc_npa_aura_op_free(m->pool->pool_id, 0, iova);
+   }
+
+   u64[0] = CNXK_CLR_SUB_EVENT(u64[0]);
+   cn10k_wqe_to_mbuf(u64[1], mbuf, port, u64[0] & 0xF, flags,
+ ws->lookup_mem);
+   /* Extracting tstamp, if PTP enabled*/
+   tstamp_ptr = *(uint64_t *)(((struct nix_wqe_hdr_s *)u64[1]) +
+  CNXK_SSO_WQE_SG_PTR);
+   cn10k_nix_mbuf_to_tstamp((struct rte_mbuf *)mbuf, ws->tstamp,
+flags & NIX_RX_OFFLOAD_TSTAMP_F,
+(uint64_t *)tstamp_ptr);
+   u64[1] = mbuf;
+   } else if (CNXK_EVENT_TYPE_FROM_TAG(u64[0]) ==
+  RTE_EVENT_TYPE_ETHDEV_VECTOR) {
+   uint8_t port = CNXK_SUB_EVENT_FROM_TAG(u64[0]);
+   __uint128_t vwqe_hdr = *(__uint128_t *)u64[1];
+
+   vwqe_hdr = ((vwqe_hdr >> 64) & 0xFFF) | BIT_ULL(31) |
+  ((vwqe_hdr & 0x) << 48) | ((uint64_t)port << 32);
+   *(uint64_t *)u64[1] = (uint64_t)vwqe_hdr;
+   cn10k_process_vwqe(u64[1], port, flags, ws->lookup_mem,
+  ws->tstamp, ws->lmt_base

Re: [Patch v2] net/netvsc: report correct stats values

2022-04-26 Thread Ferruh Yigit

On 3/24/2022 5:45 PM, lon...@linuxonhyperv.com wrote:

From: Long Li 

The netvsc should add to the values from the VF and report the sum.



Per-port stats are already accumulated, like:
'stats->opackets += txq->stats.packets;'


Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: sta...@dpdk.org
Signed-off-by: Long Li 
---
  drivers/net/netvsc/hn_ethdev.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index 0a357d3645..a6202d898b 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -804,8 +804,8 @@ static int hn_dev_stats_get(struct rte_eth_dev *dev,
stats->oerrors += txq->stats.errors;
  
  		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {

-   stats->q_opackets[i] = txq->stats.packets;
-   stats->q_obytes[i] = txq->stats.bytes;
+   stats->q_opackets[i] += txq->stats.packets;
+   stats->q_obytes[i] += txq->stats.bytes;


These are per-queue stats, 'stats->q_opackets[i]'; in the next iteration of
the loop, 'i' will be increased and 'txq' will be updated, so as far as
I can see the above change has no effect.



}
}
  
@@ -821,12 +821,12 @@ static int hn_dev_stats_get(struct rte_eth_dev *dev,

stats->imissed += rxq->stats.ring_full;
  
  		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {

-   stats->q_ipackets[i] = rxq->stats.packets;
-   stats->q_ibytes[i] = rxq->stats.bytes;
+   stats->q_ipackets[i] += rxq->stats.packets;
+   stats->q_ibytes[i] += rxq->stats.bytes;
}
}
  
-	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;

+   stats->rx_nombuf += dev->data->rx_mbuf_alloc_failed;


Why '+='? Is 'dev->data->rx_mbuf_alloc_failed' reset somewhere between
two consecutive stats get calls?


Anyway, the above line has no effect, since 'stats->rx_nombuf' is
overwritten by 'rte_eth_stats_get()'. So the above line can be removed.



return 0;
  }
  




Re: [Patch v2] net/netvsc: fix the calculation of checksums based on mbuf flag

2022-04-26 Thread Ferruh Yigit

On 3/24/2022 5:46 PM, lon...@linuxonhyperv.com wrote:

From: Long Li 

The netvsc should use RTE_MBUF_F_TX_L4_MASK and check the masked value to
decide the correct way to calculate checksums.

Not checking for RTE_MBUF_F_TX_L4_MASK results in incorrect RNDIS packets
sent to VSP and incorrect checksums calculated by the VSP.

Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: sta...@dpdk.org



Signed-off-by: Long Li 


Reviewed-by: Ferruh Yigit 

Moving ack from previous version:
Acked-by: Stephen Hemminger 

Applied to dpdk-next-net/main, thanks.


Re: [Patch v2] net/netvsc: fix the calculation of checksums based on mbuf flag

2022-04-26 Thread Ferruh Yigit

On 3/24/2022 5:46 PM, lon...@linuxonhyperv.com wrote:

From: Long Li 

The netvsc should use RTE_MBUF_F_TX_L4_MASK and check the masked value to
decide the correct way to calculate checksums.

Not checking for RTE_MBUF_F_TX_L4_MASK results in incorrect RNDIS packets
sent to VSP and incorrect checksums calculated by the VSP.

Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
Cc: sta...@dpdk.org
Signed-off-by: Long Li 
---
  drivers/net/netvsc/hn_rxtx.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 028f176c7e..34f40be5b8 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -1348,8 +1348,11 @@ static void hn_encap(struct rndis_packet_msg *pkt,
*pi_data = NDIS_LSO2_INFO_MAKEIPV4(hlen,
   m->tso_segsz);
}
-   } else if (m->ol_flags &
-  (RTE_MBUF_F_TX_TCP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM | 
RTE_MBUF_F_TX_IP_CKSUM)) {
+   } else if ((m->ol_flags & RTE_MBUF_F_TX_L4_MASK) ==
+   RTE_MBUF_F_TX_TCP_CKSUM ||
+  (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) ==
+   RTE_MBUF_F_TX_UDP_CKSUM ||
+  (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM)) {


As far as I can see the following drivers also have a similar issue; can
the maintainers (cc'ed) of the drivers below check:


bnxt
dpaa
hnic
ionic
liquidio
mlx4
mvneta
mvpp2
qede


Re: [Patch v2] net/netvsc: fix the calculation of checksums based on mbuf flag

2022-04-26 Thread Ajit Khaparde
On Tue, Apr 26, 2022 at 2:57 PM Ferruh Yigit  wrote:
>
> On 3/24/2022 5:46 PM, lon...@linuxonhyperv.com wrote:
> > From: Long Li 
> >
> > The netvsc should use RTE_MBUF_F_TX_L4_MASK and check the masked value to
> > decide the correct way to calculate checksums.
> >
> > Not checking for RTE_MBUF_F_TX_L4_MASK results in incorrect RNDIS packets
> > sent to VSP and incorrect checksums calculated by the VSP.
> >
> > Fixes: 4e9c73e96e ("net/netvsc: add Hyper-V network device")
> > Cc: sta...@dpdk.org
> > Signed-off-by: Long Li 
> > ---
> >   drivers/net/netvsc/hn_rxtx.c | 13 +
> >   1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
> > index 028f176c7e..34f40be5b8 100644
> > --- a/drivers/net/netvsc/hn_rxtx.c
> > +++ b/drivers/net/netvsc/hn_rxtx.c
> > @@ -1348,8 +1348,11 @@ static void hn_encap(struct rndis_packet_msg *pkt,
> >   *pi_data = NDIS_LSO2_INFO_MAKEIPV4(hlen,
> >  m->tso_segsz);
> >   }
> > - } else if (m->ol_flags &
> > -(RTE_MBUF_F_TX_TCP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM | 
> > RTE_MBUF_F_TX_IP_CKSUM)) {
> > + } else if ((m->ol_flags & RTE_MBUF_F_TX_L4_MASK) ==
> > + RTE_MBUF_F_TX_TCP_CKSUM ||
> > +(m->ol_flags & RTE_MBUF_F_TX_L4_MASK) ==
> > + RTE_MBUF_F_TX_UDP_CKSUM ||
> > +(m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM)) {
>
> As far as I can see the following drivers also have a similar issue; can
> the maintainers (cc'ed) of the drivers below check:
>
> bnxt
ACK

> dpaa
> hnic
> ionic
> liquidio
> mlx4
> mvneta
> mvpp2
> qede


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [Patch v2] net/netvsc: report correct stats values

2022-04-26 Thread Stephen Hemminger
On Tue, 26 Apr 2022 22:56:14 +0100
Ferruh Yigit  wrote:

> > if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
> > -   stats->q_opackets[i] = txq->stats.packets;
> > -   stats->q_obytes[i] = txq->stats.bytes;
> > +   stats->q_opackets[i] += txq->stats.packets;
> > +   stats->q_obytes[i] += txq->stats.bytes;  
> 
> These are per-queue stats, 'stats->q_opackets[i]'; in the next iteration of
> the loop, 'i' will be increased and 'txq' will be updated, so as far as
> I can see the above change has no effect.

Agree, that is why it was just assignment originally.
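The point can be demonstrated in isolation: each queue index is written exactly once per stats-get call, and the ethdev layer zeroes the stats structure before handing it to the driver, so plain assignment and accumulation produce identical per-queue results. A stand-alone sketch (the names are illustrative, not the netvsc code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NB_QUEUES 4

/* Assignment form: each index written exactly once per call. */
static void get_stats_assign(const uint64_t *per_q, uint64_t *out)
{
    for (int i = 0; i < NB_QUEUES; i++)
        out[i] = per_q[i];
}

/* Accumulation form: identical result when 'out' starts zeroed, which
 * it does because rte_eth_stats_get() memsets the struct first. */
static void get_stats_accumulate(const uint64_t *per_q, uint64_t *out)
{
    for (int i = 0; i < NB_QUEUES; i++)
        out[i] += per_q[i];
}
```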


RE: [PATCH v2] net/ice: optimize max queue number calculation

2022-04-26 Thread Zhang, Qi Z



> -Original Message-
> From: Wu, Wenjun1 
> Sent: Tuesday, April 26, 2022 9:14 PM
> To: Zhang, Qi Z ; Yang, Qiming 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v2] net/ice: optimize max queue number calculation
> 
> 
> 
> > -Original Message-
> > From: Zhang, Qi Z 
> > Sent: Friday, April 8, 2022 7:24 PM
> > To: Yang, Qiming ; Wu, Wenjun1
> > 
> > Cc: dev@dpdk.org; Zhang, Qi Z 
> > Subject: [PATCH v2] net/ice: optimize max queue number calculation
> >
> > Remove the limitation that max queue pair number must be 2^n.
> > With this patch, even on an 8-port device, the max queue pair number
> > increased from 128 to 254.
> >
> > Signed-off-by: Qi Zhang 
> > ---
> >
> > v2:
> > - fix check patch warning
> >
> >  drivers/net/ice/ice_ethdev.c | 24 
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ice/ice_ethdev.c
> > b/drivers/net/ice/ice_ethdev.c index
> > 73e550f5fb..ff2b3e45d9 100644
> > --- a/drivers/net/ice/ice_ethdev.c
> > +++ b/drivers/net/ice/ice_ethdev.c
> > @@ -819,10 +819,26 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi
> > *vsi,
> > return -ENOTSUP;
> > }
> >
> > -   vsi->nb_qps = RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC);
> > -   fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps) - 1;
> > -   /* Adjust the queue number to actual queues that can be applied */
> > -   vsi->nb_qps = (vsi->nb_qps == 0) ? 0 : 0x1 << fls;
> > +   /* vector 0 is reserved and 1 vector for ctrl vsi */
> > +   if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2)
> > +   vsi->nb_qps = 0;
> > +   else
> > +   vsi->nb_qps = RTE_MIN
> > +   ((uint16_t)vsi->adapter-
> > >hw.func_caps.common_cap.num_msix_vectors - 2,
> > +   RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC));
> > +
> > +   /* nb_qps(hex)  -> fls */
> > +   /*  -> 0 */
> > +   /* 0001 -> 0 */
> > +   /* 0002 -> 1 */
> > +   /* 0003 ~ 0004  -> 2 */
> > +   /* 0005 ~ 0008  -> 3 */
> > +   /* 0009 ~ 0010  -> 4 */
> > +   /* 0011 ~ 0020  -> 5 */
> > +   /* 0021 ~ 0040  -> 6 */
> > +   /* 0041 ~ 0080  -> 7 */
> > +   /* 0081 ~ 0100  -> 8 */
> > +   fls = (vsi->nb_qps == 0) ? 0 : rte_fls_u32(vsi->nb_qps - 1);
> >
> > qp_idx = 0;
> > /* Set tc and queue mapping with VSI */
> > --
> > 2.26.2
> 
> Acked-by: Wenjun Wu < wenjun1...@intel.com>
> 
> Thanks
> Wenjun
> 

Applied to dpdk-next-net-intel.

Thanks
Qi
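The fls arithmetic that the patch's comment table documents is worth seeing run: the old code rounded nb_qps *down* to a power of two, while the new expression rte_fls_u32(nb_qps - 1) computes ceil(log2(nb_qps)) without changing nb_qps itself. A self-contained sketch — fls_u32() below mimics DPDK's rte_fls_u32() using a GCC/Clang builtin, which is an assumption for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Same semantics as DPDK's rte_fls_u32(): 1-based index of the highest
 * set bit, with fls(0) == 0. Built on __builtin_clz for portability. */
static uint32_t fls_u32(uint32_t x)
{
    return x == 0 ? 0 : 32 - (uint32_t)__builtin_clz(x);
}

/* The expression from the patch: ceil(log2(nb_qps)), 0 for nb_qps == 0. */
static uint32_t tc_queue_fls(uint32_t nb_qps)
{
    return nb_qps == 0 ? 0 : fls_u32(nb_qps - 1);
}
```

Running it against the hex table in the diff (0x0005-0x0008 -> 3, 0x0081-0x0100 -> 8, and so on) reproduces every row, which is how non-power-of-two queue counts like 254 become usable.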



RE: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory

2022-04-26 Thread Honnappa Nagarahalli


> 
> Add support for using hugepages for worker lcore stack memory.  The intent is
> to improve performance by reducing stack memory related TLB misses and also
> by using memory local to the NUMA node of each lcore.
This is a good idea. Have you measured any performance differences with this 
patch? What kind of benefits do you see?

> 
> Platforms desiring to make use of this capability must enable the associated
> option flag and stack size settings in platform config files.
> ---
>  lib/eal/linux/eal.c | 39 +++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index
> 1ef263434a..4e1e5b6915 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1143,9 +1143,48 @@ rte_eal_init(int argc, char **argv)
> 
>   lcore_config[i].state = WAIT;
> 
> +#ifdef RTE_EAL_NUMA_AWARE_LCORE_STACK
> + /* Allocate NUMA aware stack memory and set pthread
> attributes */
> + pthread_attr_t attr;
> + void *stack_ptr =
> + rte_zmalloc_socket("lcore_stack",
> +
> RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE,
> +
> RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE,
> +rte_lcore_to_socket_id(i));
> +
> + if (stack_ptr == NULL) {
> + rte_eal_init_alert("Cannot allocate stack memory");
May be worth adding more details to the error message, like lcore id.

> + rte_errno = ENOMEM;
> + return -1;
> + }
> +
> + if (pthread_attr_init(&attr) != 0) {
> + rte_eal_init_alert("Cannot init pthread attributes");
> + rte_errno = EINVAL;
EFAULT would be better.

> + return -1;
> + }
> + if (pthread_attr_setstack(&attr,
> +   stack_ptr,
> +
> RTE_EAL_NUMA_AWARE_LCORE_STACK_SIZE) != 0) {
> + rte_eal_init_alert("Cannot set pthread stack
> attributes");
> + rte_errno = ENOTSUP;
EFAULT would be better.

> + return -1;
> + }
> +
> + /* create a thread for each lcore */
> + ret = pthread_create(&lcore_config[i].thread_id, &attr,
> +  eal_thread_loop, (void *)(uintptr_t)i);
> +
> + if (pthread_attr_destroy(&attr) != 0) {
> + rte_eal_init_alert("Cannot destroy pthread
> attributes");
> + rte_errno = EFAULT;
> + return -1;
> + }
> +#else
>   /* create a thread for each lcore */
>   ret = pthread_create(&lcore_config[i].thread_id, NULL,
>eal_thread_loop, (void *)(uintptr_t)i);
> +#endif
>   if (ret != 0)
>   rte_panic("Cannot create thread\n");
> 
> --
> 2.17.1
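The quoted RFC boils down to: allocate a NUMA-local buffer, then hand it to the new thread via pthread_attr_setstack(). The pthread part can be exercised without DPDK at all; the sketch below substitutes plain page-aligned allocation for rte_zmalloc_socket(), and the stack size constant is an assumption standing in for the RFC's config knob:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define WORKER_STACK_SIZE (4u << 20) /* illustrative; the RFC makes it configurable */

static void *worker(void *arg)
{
    /* A local variable lives on the caller-supplied stack; report its
     * address back so the caller can verify where the thread ran. */
    uintptr_t marker = (uintptr_t)&marker;
    *(uintptr_t *)arg = marker;
    return NULL;
}

/* Returns 0 on success; *sp_out gets a stack address observed by the
 * worker, *stack_out the base of the custom stack (caller frees it). */
static int spawn_on_custom_stack(uintptr_t *sp_out, void **stack_out)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *stack;

    /* Stand-in for rte_zmalloc_socket(): page-aligned memory. */
    if (posix_memalign(&stack, 4096, WORKER_STACK_SIZE) != 0)
        return -1;
    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setstack(&attr, stack, WORKER_STACK_SIZE) != 0)
        return -1;
    if (pthread_create(&tid, &attr, worker, sp_out) != 0)
        return -1;
    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);
    *stack_out = stack;
    return 0;
}
```

In the DPDK version the allocation would come from hugepage-backed memory on the lcore's NUMA node; as the thread points out above, the cost is the loss of guard pages, so stack overflows go undetected.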



RE: [PATCH] doc: update matching versions in ice guide

2022-04-26 Thread Zhang, Qi Z



> -Original Message-
> From: Yang, Qiming 
> Sent: Tuesday, April 26, 2022 1:36 PM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z ; Yang, Qiming
> ; sta...@dpdk.org
> Subject: [PATCH] doc: update matching versions in ice guide
> 
> Add recommended matching list for ice PMD in DPDK 22.03.
> 
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Qiming Yang 
> ---
>  doc/guides/nics/ice.rst | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst index
> a1780c46c3..6b903b9bbc 100644
> --- a/doc/guides/nics/ice.rst
> +++ b/doc/guides/nics/ice.rst
> @@ -62,6 +62,8 @@ The detailed information can refer to chapter Tested
> Platforms/Tested NICs in re
> 
> +---+---+-+---+--+---+
> |21.11  | 1.7.16|  1.3.27 |  1.3.31   |1.3.7 
> |3.1|
> 
> +---+---+-+---+--+---+
> +   |22.03  | 1.8.3 |  1.3.28 |  1.3.35   |1.3.8 
> |3.2|
> +
> + +---+---+-+---+---
> + ---+---+
> 
>  Pre-Installation Configuration
>  --
> --
> 2.17.1

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi



RE: [DPDK v4] net/ixgbe: promote MDIO API

2022-04-26 Thread Zhang, Qi Z



> -Original Message-
> From: Ray Kinsella 
> Sent: Tuesday, April 26, 2022 6:12 PM
> To: Zeng, ZhichaoX 
> Cc: dev@dpdk.org; Yang, Qiming ; Wang, Haiyue
> ; David Marchand 
> Subject: Re: [DPDK v4] net/ixgbe: promote MDIO API
> 
> 
> Zeng, ZhichaoX  writes:
> 
> > Hi, Ray, David:
> >
> > What is your opinion on this patch?
> >
> > Regards,
> > Zhichao
> >
> > -Original Message-
> > From: Zeng, ZhichaoX 
> > Sent: Tuesday, April 19, 2022 7:06 PM
> > To: dev@dpdk.org
> > Cc: Yang, Qiming ; Wang, Haiyue
> ; m...@ashroe.eu; Zeng, ZhichaoX
> 
> > Subject: [DPDK v4] net/ixgbe: promote MDIO API
> >
> > From: Zhichao Zeng 
> >
> > Promote the MDIO APIs to be stable.
> >
> > Signed-off-by: Zhichao Zeng 
> > ---
> >  drivers/net/ixgbe/rte_pmd_ixgbe.h |  5 -
> >  drivers/net/ixgbe/version.map | 10 +-
> >  2 files changed, 5 insertions(+), 10 deletions(-)
> >
> 
> Acked-by: Ray Kinsella 

Applied to dpdk-next-net-intel.

Thanks
Qi


RE: [PATCH v7 0/9] Enable ETS-based TX QoS on PF

2022-04-26 Thread Yang, Qiming
Hi,

> -Original Message-
> From: Wu, Wenjun1 
> Sent: 2022年4月22日 8:58
> To: dev@dpdk.org; Yang, Qiming ; Zhang, Qi Z
> 
> Subject: [PATCH v7 0/9] Enable ETS-based TX QoS on PF
> 
> This patch set enables ETS-based TX QoS on PF. It is supported to configure
> bandwidth and priority in both queue and queue group level, and weight
> only in queue level.
> 
> v2: fix code style issue.
> v3: fix uninitialization issue.
> v4: fix logical issue.
> v5: fix CI testing issue. Add explicit cast.
> v6: add release note.
> v7: merge the release note with the previous patch.
> 
> Ting Xu (1):
>   net/ice: support queue bandwidth limit
> 
> Wenjun Wu (8):
>   net/ice/base: fix dead lock issue when getting node from ID type
>   net/ice/base: support priority configuration of the exact node
>   net/ice/base: support queue BW allocation configuration
>   net/ice: support queue group bandwidth limit
>   net/ice: support queue priority configuration
>   net/ice: support queue weight configuration
>   net/ice: support queue group priority configuration
>   net/ice: add warning log for unsupported configuration
> 
>  doc/guides/rel_notes/release_22_07.rst |   4 +
>  drivers/net/ice/base/ice_sched.c   |  89 ++-
>  drivers/net/ice/base/ice_sched.h   |   6 +
>  drivers/net/ice/ice_ethdev.c   |  19 +
>  drivers/net/ice/ice_ethdev.h   |  55 ++
>  drivers/net/ice/ice_tm.c   | 844 +
>  drivers/net/ice/meson.build|   1 +
>  7 files changed, 1016 insertions(+), 2 deletions(-)  create mode 100644
> drivers/net/ice/ice_tm.c
> 
> --
> 2.25.1

Acked-by: Qiming Yang 


RE: [PATCH v6 07/16] examples/vdpa: add vDPA blk support in example

2022-04-26 Thread Pei, Andy
Hi Chenbo,

Thanks for your reply.
My reply is inline.

> -Original Message-
> From: Xia, Chenbo 
> Sent: Monday, April 25, 2022 9:39 PM
> To: Pei, Andy ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> Changpeng 
> Subject: RE: [PATCH v6 07/16] examples/vdpa: add vDPA blk support in
> example
> 
> Hi Andy,
> 
> > -Original Message-
> > From: Pei, Andy 
> > Sent: Thursday, April 21, 2022 4:34 PM
> > To: dev@dpdk.org
> > Cc: Xia, Chenbo ; maxime.coque...@redhat.com;
> > Cao, Gang ; Liu, Changpeng
> > 
> > Subject: [PATCH v6 07/16] examples/vdpa: add vDPA blk support in
> > example
> >
> > Add virtio blk device support to vDPA example.
> >
> > Signed-off-by: Andy Pei 
> > ---
> >  examples/vdpa/main.c |  61 +-
> >  examples/vdpa/vdpa_blk_compact.h |  72 +
> >  examples/vdpa/vhost_user.h   | 169
> > +++
> >  3 files changed, 301 insertions(+), 1 deletion(-)  create mode 100644
> > examples/vdpa/vdpa_blk_compact.h  create mode 100644
> > examples/vdpa/vhost_user.h
> >
> > diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c index
> > 5ab0765..1c809ab 100644
> > --- a/examples/vdpa/main.c
> > +++ b/examples/vdpa/main.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include "vdpa_blk_compact.h"
> >
> >  #define MAX_PATH_LEN 128
> >  #define MAX_VDPA_SAMPLE_PORTS 1024
> > @@ -41,6 +42,7 @@ struct vdpa_port {
> >  static int devcnt;
> >  static int interactive;
> >  static int client_mode;
> > +static int isblk;
> >
> >  /* display usage */
> >  static void
> > @@ -49,7 +51,8 @@ struct vdpa_port {
> > printf("Usage: %s [EAL options] -- "
> >  "  --interactive|-i: run in interactive
> > mode.\n"
> >  "  --iface : specify the path prefix
> of
> > the socket files, e.g. /tmp/vhost-user-.\n"
> > -"  --client: register a vhost-user socket
> as
> > client mode.\n",
> > +"  --client: register a vhost-user socket
> as
> > client mode.\n"
> > +"  --isblk: device is a block device, e.g.
> > virtio_blk device.\n",
> >  prgname);
> >  }
> >
> > @@ -61,6 +64,7 @@ struct vdpa_port {
> > {"iface", required_argument, NULL, 0},
> > {"interactive", no_argument, &interactive, 1},
> > {"client", no_argument, &client_mode, 1},
> > +   {"isblk", no_argument, &isblk, 1},
> 
> I think a new API for get_device_type will be better than asking the user
> to specify the device type.
> 
Good suggestion. I will send out a new version of the patch set and try to do this.
> > {NULL, 0, 0, 0},
> > };
> > int opt, idx;
> > @@ -159,6 +163,52 @@ struct vdpa_port {  };
> >
> >  static int
> > +vdpa_blk_device_set_features_and_protocol(const char *path) {
> > +   uint64_t protocol_features = 0;
> > +   int ret;
> > +
> > +   ret = rte_vhost_driver_set_features(path,
> VHOST_BLK_FEATURES_BASE);
> > +   if (ret != 0) {
> > +   RTE_LOG(ERR, VDPA,
> > +   "rte_vhost_driver_set_features for %s failed.\n",
> > +   path);
> > +   goto out;
> > +   }
> > +
> > +   ret = rte_vhost_driver_disable_features(path,
> > +   VHOST_VDPA_BLK_DISABLED_FEATURES);
> > +   if (ret != 0) {
> > +   RTE_LOG(ERR, VDPA,
> > +   "rte_vhost_driver_disable_features for %s failed.\n",
> > +   path);
> > +   goto out;
> > +   }
> > +
> > +   ret = rte_vhost_driver_get_protocol_features(path,
> > &protocol_features);
> > +   if (ret != 0) {
> > +   RTE_LOG(ERR, VDPA,
> > +   "rte_vhost_driver_get_protocol_features for %s
> > failed.\n",
> > +   path);
> > +   goto out;
> > +   }
> > +
> > +   protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
> > +   protocol_features |= (1ULL <<
> VHOST_USER_PROTOCOL_F_LOG_SHMFD);
> > +
> > +   ret = rte_vhost_driver_set_protocol_features(path,
> > protocol_features);
> > +   if (ret != 0) {
> > +   RTE_LOG(ERR, VDPA,
> > +   "rte_vhost_driver_set_protocol_features for %s
> > failed.\n",
> > +   path);
> > +   goto out;
> > +   }
> > +
> > +out:
> > +   return ret;
> > +}
> > +
> > +static int
> >  start_vdpa(struct vdpa_port *vport)
> >  {
> > int ret;
> > @@ -192,6 +242,15 @@ struct vdpa_port {
> > "attach vdpa device failed: %s\n",
> > socket_path);
> >
> > +   if (isblk) {
> > +   RTE_LOG(NOTICE, VDPA, "is a blk device\n");
> > +   ret =
> vdpa_blk_device_set_features_and_protocol(socket_path);
> > +   if (ret != 0)
> > +   rte_exit(EXIT_FAILURE,
> > +   "set vhost blk driver features and protocol
> > features failed: %s\n",
> > +   

RE: [PATCH v6 03/16] vhost: add vhost msg support

2022-04-26 Thread Pei, Andy
Hi Chenbo, 

Thanks for your reply.
My reply is inline.

> -Original Message-
> From: Xia, Chenbo 
> Sent: Tuesday, April 26, 2022 5:17 PM
> To: Pei, Andy ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> Changpeng 
> Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support
> 
> > -Original Message-
> > From: Pei, Andy 
> > Sent: Tuesday, April 26, 2022 4:56 PM
> > To: Xia, Chenbo ; dev@dpdk.org
> > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> > Changpeng 
> > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support
> >
> > HI Chenbo,
> >
> > Thanks for your reply.
> > My reply is inline.
> >
> > > -Original Message-
> > > From: Xia, Chenbo 
> > > Sent: Monday, April 25, 2022 8:42 PM
> > > To: Pei, Andy ; dev@dpdk.org
> > > Cc: maxime.coque...@redhat.com; Cao, Gang ; Liu,
> > > Changpeng 
> > > Subject: RE: [PATCH v6 03/16] vhost: add vhost msg support
> > >
> > > Hi Andy,
> > >
> > > > -Original Message-
> > > > From: Pei, Andy 
> > > > Sent: Thursday, April 21, 2022 4:34 PM
> > > > To: dev@dpdk.org
> > > > Cc: Xia, Chenbo ;
> > > > maxime.coque...@redhat.com; Cao, Gang ; Liu,
> > > > Changpeng 
> > > > Subject: [PATCH v6 03/16] vhost: add vhost msg support
> > > >
> > > > Add support for VHOST_USER_GET_CONFIG and
> > > VHOST_USER_SET_CONFIG.
> > > > VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG message is
> only
> > > > supported by virtio blk VDPA device.
> > > >
> > > > Signed-off-by: Andy Pei 
> > > > ---
> > > >  lib/vhost/vhost_user.c | 69
> > > > ++
> > > >  lib/vhost/vhost_user.h | 13 ++
> > > >  2 files changed, 82 insertions(+)
> > > >
> > > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index
> > > > 1d39067..3780804 100644
> > > > --- a/lib/vhost/vhost_user.c
> > > > +++ b/lib/vhost/vhost_user.c
> > > > @@ -80,6 +80,8 @@
> > > >  [VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> > > >  [VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
> > > >  [VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> > > > +[VHOST_USER_GET_CONFIG]  = "VHOST_USER_GET_CONFIG",
> > > > +[VHOST_USER_SET_CONFIG]  = "VHOST_USER_SET_CONFIG",
> > > >  [VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
> > > >  [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
> > > >  [VHOST_USER_POSTCOPY_ADVISE]  = "VHOST_USER_POSTCOPY_ADVISE",
> > > > @@ -2542,6 +2544,71 @@ static int is_vring_iotlb(struct virtio_net *dev,
> > > > }
> > > >
> > > >  static int
> > > > +vhost_user_get_config(struct virtio_net **pdev,
> > > > +struct vhu_msg_context *ctx,
> > > > +int main_fd __rte_unused)
> > > > +{
> > > > +struct virtio_net *dev = *pdev;
> > > > +struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > > > +int ret = 0;
> > > > +
> > > > +if (vdpa_dev->ops->get_config) {
> > > > +ret = vdpa_dev->ops->get_config(dev->vid,
> > > > +   ctx->msg.payload.cfg.region,
> > > > +   ctx->msg.payload.cfg.size);
> > > > +if (ret != 0) {
> > > > +ctx->msg.size = 0;
> > > > +VHOST_LOG_CONFIG(ERR,
> > > > + "(%s) get_config() return error!\n",
> > > > + dev->ifname);
> > > > +}
> > > > +} else {
> > > > +VHOST_LOG_CONFIG(ERR, "(%s) get_config() not supportted!\n",
> > >
> > > Supported
> > >
> > I will send out a new version to fix this.
> > > > + dev->ifname);
> > > > +}
> > > > +
> > > > +return RTE_VHOST_MSG_RESULT_REPLY;
> > > > +}
> > > > +
> > > > +static int
> > > > +vhost_user_set_config(struct virtio_net **pdev,
> > > > +struct vhu_msg_context *ctx,
> > > > +int main_fd __rte_unused)
> > > > +{
> > > > +struct virtio_net *dev = *pdev;
> > > > +struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
> > > > +int ret = 0;
> > > > +
> > > > +if (ctx->msg.size != sizeof(struct vhost_user_config)) {
> > >
> > > I think you should do sanity check on payload.cfg.size and make sure
> > it's
> > > smaller than VHOST_USER_MAX_CONFIG_SIZE
> > >
> > > and same check for offset
> > >
> > I think payload.cfg.size can be smaller than or equal to
> > VHOST_USER_MAX_CONFIG_SIZE.
> > payload.cfg.offset can be smaller than or equal to
> > VHOST_USER_MAX_CONFIG_SIZE as well.
> 
> After double checking: offset is the config space offset, so it should be
> checked in the vdpa driver. The size check at the vhost lib layer should
> just be <= the MAX you defined.
> 
OK.
> Thanks,
> Chenbo
> 
> >
> > > > +VHOST_LOG_CONFIG(ERR,
> > > > +"(%s) invalid set config msg size: %"PRId32" != %d\n",
> > > > +dev->ifname, ctx->msg.size,
> > >
> > > Based on you will change the log too, payload.cfg.size is uint32_t,
> > > so
> > PRId32 ->
> > > PRIu32
> > >
> > > > +(int)sizeof(struct vhost_user_config));
> > >
> > > So this can be %u
> > >
> > Sure.
> > > > +goto OUT;
> > > > +}
> > > > +
> > > > +if (vdpa_dev->ops->set_config) {
> > > > +ret = vdpa_dev->ops->set_config(dev->vid,
> > > > +ctx->msg.payload.cfg.region,
> > > > +ctx->msg.payload.cfg.offset,
> > > > +ctx->msg.payload.cfg.size,
> > > > +ctx->msg.payload.cfg.flags);
> > > > +if (ret)
> > > > +VHOST

[Bug 1001] [meson test] Debug-tests/dump_* all meson test time out because commands are not registered to command list

2022-04-26 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1001

Bug ID: 1001
   Summary: [meson test] Debug-tests/dump_* all meson test time
out because commands are not registered to command
list
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: meson
  Assignee: dev@dpdk.org
  Reporter: weiyuanx...@intel.com
  Target Milestone: ---

[Environment]
DPDK version: Use make show version or for a non-released version: git remote
-v && git show-ref --heads
 22.07.0-rc0  55ae8965bf8eecd5ebec36663bb0f36018abf64b
OS: Red Hat Enterprise Linux 8.4 (Ootpa)/4.18.0-305.19.1.el8_4.x86_64
Compiler: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Hardware platform: Intel(R) Xeon(R) Gold 6252N CPU @ 2.30GHz

[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.

1. Use the following command to build DPDK:
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=static
x86_64-native-linuxapp-gcc/
ninja -C x86_64-native-linuxapp-gcc/

2. Execute the following command in the dpdk directory.
meson test -C x86_64-native-linuxapp-gcc dump_struct_sizes

[show the output]
root@localhost dpdk]# meson test -C x86_64-native-linuxapp-gcc
dump_struct_sizes
ninja: Entering directory `/root/dpdk/x86_64-native-linuxapp-gcc'
ninja: no work to do.
1/1 DPDK:debug-tests / dump_struct_sizes        TIMEOUT        600.02s   killed
by signal 15 SIGTERM
>>> MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes
>>> /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test



Ok: 0
Expected Fail:  0
Fail:   0
Unexpected Pass:0
Skipped:0
Timeout:1

Full log written to
/root/dpdk/x86_64-native-linuxapp-gcc/meson-logs/testlog.txt


show log from the testlog.txt.

1/1 DPDK:debug-tests / dump_struct_sizes        TIMEOUT        600.02s   killed by
signal 15 SIGTERM
05:28:14 MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes
/root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test
--- output ---
stdout:
RTE>>
stderr:
EAL: Detected CPU lcores: 96
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found
for that size
APP: HPET is not enabled, using TSC as default timer
APP: Invalid DPDK_TEST value 'dump_struct_sizes'
--


[Expected Result]
Test ok.

[Affected Test cases]
DPDK:debug-tests / dump_struct_sizes
DPDK:debug-tests / dump_mempool
DPDK:debug-tests / dump_malloc_stats
DPDK:debug-tests / dump_devargs
DPDK:debug-tests / dump_log_types
DPDK:debug-tests / dump_ring
DPDK:debug-tests / dump_physmem
DPDK:debug-tests / dump_memzone

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 1002] [meson test] Debug-tests/dump_* all meson test time out because commands are not registered to command list

2022-04-26 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1002

Bug ID: 1002
   Summary: [meson test] Debug-tests/dump_* all meson test time
out because commands are not registered to command
list
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: meson
  Assignee: dev@dpdk.org
  Reporter: weiyuanx...@intel.com
  Target Milestone: ---

[Environment]
DPDK version: Use make show version or for a non-released version: git remote
-v && git show-ref --heads
 22.07.0-rc0  55ae8965bf8eecd5ebec36663bb0f36018abf64b
OS: Red Hat Enterprise Linux 8.4 (Ootpa)/4.18.0-305.19.1.el8_4.x86_64
Compiler: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Hardware platform: Intel(R) Xeon(R) Gold 6252N CPU @ 2.30GHz

[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.

1. Use the following command to build DPDK:
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=static
x86_64-native-linuxapp-gcc/
ninja -C x86_64-native-linuxapp-gcc/

2. Execute the following command in the dpdk directory.
meson test -C x86_64-native-linuxapp-gcc dump_struct_sizes

[show the output]
root@localhost dpdk]# meson test -C x86_64-native-linuxapp-gcc
dump_struct_sizes
ninja: Entering directory `/root/dpdk/x86_64-native-linuxapp-gcc'
ninja: no work to do.
1/1 DPDK:debug-tests / dump_struct_sizes        TIMEOUT        600.02s   killed
by signal 15 SIGTERM
>>> MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes
>>> /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test



Ok: 0
Expected Fail:  0
Fail:   0
Unexpected Pass:0
Skipped:0
Timeout:1

Full log written to
/root/dpdk/x86_64-native-linuxapp-gcc/meson-logs/testlog.txt


show log from the testlog.txt.

1/1 DPDK:debug-tests / dump_struct_sizes        TIMEOUT        600.02s   killed by
signal 15 SIGTERM
05:28:14 MALLOC_PERTURB_=155 DPDK_TEST=dump_struct_sizes
/root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test
--- output ---
stdout:
RTE>>
stderr:
EAL: Detected CPU lcores: 96
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found
for that size
APP: HPET is not enabled, using TSC as default timer
APP: Invalid DPDK_TEST value 'dump_struct_sizes'
--


[Expected Result]
Test ok.

[Affected Test cases]
DPDK:debug-tests / dump_struct_sizes
DPDK:debug-tests / dump_mempool
DPDK:debug-tests / dump_malloc_stats
DPDK:debug-tests / dump_devargs
DPDK:debug-tests / dump_log_types
DPDK:debug-tests / dump_ring
DPDK:debug-tests / dump_physmem
DPDK:debug-tests / dump_memzone

-- 
You are receiving this mail because:
You are the assignee for the bug.
