Re: [dpdk-dev] [PATCH] doc: announce security API changes for Inline IPsec

2021-07-30 Thread Hemant Agrawal
Acked-by: Hemant Agrawal 


Re: [dpdk-dev] [PATCH 1/2] vhost: announce vDPA driver API marking as internal

2021-07-30 Thread Maxime Coquelin



On 7/30/21 7:32 AM, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin 
>> Sent: Thursday, July 29, 2021 10:43 PM
>> To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com;
>> Richardson, Bruce ; Yigit, Ferruh
>> ; tho...@monjalon.net; acon...@redhat.com
>> Cc: Maxime Coquelin 
>> Subject: [PATCH 1/2] vhost: announce vDPA driver API marking as internal
>>
>> This patch announces the marking if all the vDPA driver API
>> as internal.
> 
> Marking if all -> marking all?

I meant marking of

> And API -> APIs
> 
> With things fixed:
> 
> Acked-by: Chenbo Xia 

Thanks, I'll post a v2.

Maxime

>>
>> Signed-off-by: Maxime Coquelin 
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> b/doc/guides/rel_notes/deprecation.rst
>> index 9584d6bfd7..b34bed61a6 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -147,3 +147,7 @@ Deprecation Notices
>>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-
>> specific
>>content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>>original structure will be kept until DPDK 21.11.
>> +
>> +* vhost: ``rte_vdpa_register_device``, ``rte_vdpa_unregister_device``,
>> +  ``rte_vhost_host_notifier_ctrl`` and ``rte_vdpa_relay_vring_used`` vDPA
>> +  driver API will be marked as internal in DPDK v21.11.
>> --
>> 2.31.1
> 



[dpdk-dev] [PATCH v2] drivers: remove warning with meson 0.59.0

2021-07-30 Thread jerinj
From: Jerin Jacob 

Since meson 0.59.0, extract_all_objects() expects the `recursive`
keyword argument to be set explicitly.

To remove the following warning[1], pass an explicit `recursive: true` to
every extract_all_objects() call in the codebase that currently has
no argument.

[1]
WARNING: extract_all_objects called without setting recursive
keyword argument. Meson currently defaults to
non-recursive to maintain backward compatibility but
the default will be changed in the future.

Signed-off-by: Jerin Jacob 
---
v2..v1
- Correct the meson version number in the git commit log (0.46.0 to 0.59.0)

 drivers/common/sfc_efx/base/meson.build | 2 +-
 drivers/meson.build | 2 +-
 drivers/net/e1000/base/meson.build  | 2 +-
 drivers/net/fm10k/base/meson.build  | 2 +-
 drivers/net/hinic/base/meson.build  | 2 +-
 drivers/net/i40e/base/meson.build   | 2 +-
 drivers/net/ice/base/meson.build| 2 +-
 drivers/net/igc/base/meson.build| 2 +-
 drivers/net/ixgbe/base/meson.build  | 2 +-
 drivers/net/ngbe/base/meson.build   | 2 +-
 drivers/net/octeontx/base/meson.build   | 2 +-
 drivers/net/qede/base/meson.build   | 2 +-
 drivers/net/thunderx/base/meson.build   | 2 +-
 drivers/net/txgbe/base/meson.build  | 2 +-
 drivers/raw/ifpga/base/meson.build  | 2 +-
 15 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/common/sfc_efx/base/meson.build 
b/drivers/common/sfc_efx/base/meson.build
index 9fba47b1cc..ff7f33fb44 100644
--- a/drivers/common/sfc_efx/base/meson.build
+++ b/drivers/common/sfc_efx/base/meson.build
@@ -86,7 +86,7 @@ if build
 dependencies: static_rte_eal,
 c_args: c_args)

-base_objs = base_lib.extract_all_objects()
+base_objs = base_lib.extract_all_objects(recursive: true)
 else
 base_objs = []
 endif
diff --git a/drivers/meson.build b/drivers/meson.build
index bc6f4f567f..d9e331ec85 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -160,7 +160,7 @@ foreach subpath:subdirs
 include_directories: includes,
 dependencies: static_deps,
 c_args: cflags)
-objs += tmp_lib.extract_all_objects()
+objs += tmp_lib.extract_all_objects(recursive: true)
 sources = custom_target(out_filename,
 command: [pmdinfo, tmp_lib.full_path(), '@OUTPUT@', 
pmdinfogen],
 output: out_filename,
diff --git a/drivers/net/e1000/base/meson.build 
b/drivers/net/e1000/base/meson.build
index 317692dfab..528a33f958 100644
--- a/drivers/net/e1000/base/meson.build
+++ b/drivers/net/e1000/base/meson.build
@@ -35,4 +35,4 @@ endforeach
 base_lib = static_library('e1000_base', sources,
 dependencies: static_rte_eal,
 c_args: c_args)
-base_objs = base_lib.extract_all_objects()
+base_objs = base_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/fm10k/base/meson.build 
b/drivers/net/fm10k/base/meson.build
index ca98d34d4e..bd19df27f7 100644
--- a/drivers/net/fm10k/base/meson.build
+++ b/drivers/net/fm10k/base/meson.build
@@ -25,4 +25,4 @@ endforeach
 base_lib = static_library('fm10k_base', sources,
 dependencies: static_rte_eal,
 c_args: c_args)
-base_objs = base_lib.extract_all_objects()
+base_objs = base_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/hinic/base/meson.build 
b/drivers/net/hinic/base/meson.build
index a00c90c14e..3aa53df881 100644
--- a/drivers/net/hinic/base/meson.build
+++ b/drivers/net/hinic/base/meson.build
@@ -34,4 +34,4 @@ c_args = cflags
 base_lib = static_library('hinic_base', sources,
 dependencies: [static_rte_eal, static_rte_ethdev, static_rte_bus_pci, 
static_rte_hash],
 c_args: c_args)
-base_objs = base_lib.extract_all_objects()
+base_objs = base_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/i40e/base/meson.build 
b/drivers/net/i40e/base/meson.build
index 79a887a297..d94108629b 100644
--- a/drivers/net/i40e/base/meson.build
+++ b/drivers/net/i40e/base/meson.build
@@ -27,4 +27,4 @@ endforeach
 base_lib = static_library('i40e_base', sources,
 dependencies: static_rte_eal,
 c_args: c_args)
-base_objs = base_lib.extract_all_objects()
+base_objs = base_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/ice/base/meson.build b/drivers/net/ice/base/meson.build
index 3305e5dd18..30e251876d 100644
--- a/drivers/net/ice/base/meson.build
+++ b/drivers/net/ice/base/meson.build
@@ -43,4 +43,4 @@ endforeach
 base_lib = static_library('ice_base', sources,
 dependencies: static_rte_eal,
 c_args: c_args)
-base_objs = base_lib.extract_all_objects()
+base_objs = base_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/igc/base/meson.build b/drivers/net/igc/base/meson.build
index 8affc72e65..f52421f7a9 100644
--- a/drivers/net/igc/base/meson.build
+++ b/drivers/net/igc/base/meson.build
@@ -16,4 +16,4 @@ base_lib = static_library('igc_base', sources,
 dependencies: static_rte_eal,
 c_args: cflags)

-base_objs = base_lib.ext

[dpdk-dev] [PATCH v2 0/2] vhost: v21.11 deprecation notices

2021-07-30 Thread Maxime Coquelin
Two deprecations planned for DPDK v21.11 in Vhost:
 - marking vDPA driver API as internal
 - prefixing Vhost ops struct with rte_

Maxime Coquelin (2):
  vhost: announce vDPA driver API marking as internal
  vhost: notice Vhost ops struct renaming

 doc/guides/rel_notes/deprecation.rst | 7 +++
 1 file changed, 7 insertions(+)

-- 
2.31.1



[dpdk-dev] [PATCH v2 1/2] vhost: announce vDPA driver API marking as internal

2021-07-30 Thread Maxime Coquelin
This patch announces the marking of all the vDPA driver APIs
as internal.

Acked-by: Chenbo Xia 
Signed-off-by: Maxime Coquelin 
---
 doc/guides/rel_notes/deprecation.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..b34bed61a6 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,7 @@ Deprecation Notices
 * cmdline: ``cmdline`` structure will be made opaque to hide platform-specific
   content. On Linux and FreeBSD, supported prior to DPDK 20.11,
   original structure will be kept until DPDK 21.11.
+
+* vhost: ``rte_vdpa_register_device``, ``rte_vdpa_unregister_device``,
+  ``rte_vhost_host_notifier_ctrl`` and ``rte_vdpa_relay_vring_used`` vDPA
+  driver API will be marked as internal in DPDK v21.11.
-- 
2.31.1



[dpdk-dev] [PATCH v2 2/2] vhost: notice Vhost ops struct renaming

2021-07-30 Thread Maxime Coquelin
This patch announces the renaming of struct
vhost_device_ops to rte_vhost_device_ops in DPDK v21.11.

Acked-by: Chenbo Xia 
Signed-off-by: Maxime Coquelin 
---
 doc/guides/rel_notes/deprecation.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index b34bed61a6..76ebf162bd 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -151,3 +151,6 @@ Deprecation Notices
 * vhost: ``rte_vdpa_register_device``, ``rte_vdpa_unregister_device``,
   ``rte_vhost_host_notifier_ctrl`` and ``rte_vdpa_relay_vring_used`` vDPA
   driver API will be marked as internal in DPDK v21.11.
+
+* vhost: rename ``struct vhost_device_ops`` to ``struct rte_vhost_device_ops``
+  in DPDK v21.11.
-- 
2.31.1



[dpdk-dev] [PATCH] vhost: announce experimental tag removal of vhost APIs

2021-07-30 Thread Chenbo Xia
This patch announces the experimental tag removal of 10 vhost APIs,
which have been experimental for more than 2 years. All APIs could
be made stable in DPDK 21.11.

Signed-off-by: Chenbo Xia 
Acked-by: Maxime Coquelin 
---
 doc/guides/rel_notes/deprecation.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..f97a9d0058 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,11 @@ Deprecation Notices
 * cmdline: ``cmdline`` structure will be made opaque to hide platform-specific
   content. On Linux and FreeBSD, supported prior to DPDK 20.11,
   original structure will be kept until DPDK 21.11.
+
+* vhost: The experimental tags of ``rte_vhost_driver_get_protocol_features``,
+  ``rte_vhost_driver_get_queue_num``, ``rte_vhost_crypto_create``,
+  ``rte_vhost_crypto_free``, ``rte_vhost_crypto_fetch_requests``,
+  ``rte_vhost_crypto_finalize_requests``, ``rte_vhost_crypto_set_zero_copy``,
+  ``rte_vhost_va_from_guest_pa``, ``rte_vhost_extern_callback_register``,
+  and ``rte_vhost_driver_set_protocol_features`` APIs will be removed and the
+  APIs will be made stable in DPDK 21.11.
\ No newline at end of file
-- 
2.17.1



[dpdk-dev] [PATCH v2] net/ena: enable multi segment in Tx offload flags

2021-07-30 Thread Olivier Matz
From: Ghalem Boudour 

The DPDK ENA driver does not advertise the multi-segment Tx offload
capability. Add DEV_TX_OFFLOAD_MULTI_SEGS to the port's offload
capabilities by default, and always set it in
dev->data->dev_conf.txmode.offloads.

This flag is not listed in doc/guides/nics/features/default.ini, so
ena.ini does not need to be updated.

Fixes: 1173fca25af9 ("ena: add polling-mode driver")
Cc: sta...@dpdk.org

Signed-off-by: Ghalem Boudour 
Signed-off-by: Olivier Matz 
---

v2
* set DEV_TX_OFFLOAD_MULTI_SEGS in dev->data->dev_conf.txmode.offloads
* add Fixes and Cc stable

 drivers/net/ena/ena_ethdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index dfe68279fa..b59451034c 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1981,6 +1981,7 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
 
if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
+   dev->data->dev_conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
 
adapter->tx_selected_offloads = dev->data->dev_conf.txmode.offloads;
adapter->rx_selected_offloads = dev->data->dev_conf.rxmode.offloads;
@@ -2055,6 +2056,7 @@ static int ena_infos_get(struct rte_eth_dev *dev,
DEV_RX_OFFLOAD_TCP_CKSUM;
 
rx_feat |= DEV_RX_OFFLOAD_JUMBO_FRAME;
+   tx_feat |= DEV_TX_OFFLOAD_MULTI_SEGS;
 
/* Inform framework about available features */
dev_info->rx_offload_capa = rx_feat;
-- 
2.29.2
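
A note on why the capability bit in the patch above matters to an application: as far as I understand the ethdev layer, offloads requested at configure time are compared against what the driver advertises, so a missing capability bit makes a request for that offload fail. The check boils down to a bitmask test. The sketch below is self-contained and does not include any DPDK header; SKETCH_TX_OFFLOAD_MULTI_SEGS and check_tx_offloads() are hypothetical names, and the bit position is a placeholder rather than the value from rte_ethdev.h.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit, for illustration only; not taken from the DPDK headers. */
#define SKETCH_TX_OFFLOAD_MULTI_SEGS (UINT64_C(1) << 15)

/* Return the requested offload bits that the driver does not advertise. */
static uint64_t missing_tx_offloads(uint64_t requested, uint64_t advertised)
{
    return requested & ~advertised;
}

/* Return 0 when every requested offload is advertised, -1 otherwise. */
static int check_tx_offloads(uint64_t requested, uint64_t advertised)
{
    /* An empty request can never be rejected. */
    assert(missing_tx_offloads(0, advertised) == 0);
    return missing_tx_offloads(requested, advertised) == 0 ? 0 : -1;
}
```

With the bit absent from the advertised mask, a request that includes it is rejected; once the driver reports it, the same request passes.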



[dpdk-dev] [PATCH v2] vhost: announce experimental tag removal of vhost APIs

2021-07-30 Thread Chenbo Xia
This patch announces the experimental tag removal of 10 vhost APIs,
which have been experimental for more than 2 years. All APIs could
be made stable in DPDK 21.11.

Signed-off-by: Chenbo Xia 
Acked-by: Maxime Coquelin 
---
 doc/guides/rel_notes/deprecation.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..5d5b7884d7 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,11 @@ Deprecation Notices
 * cmdline: ``cmdline`` structure will be made opaque to hide platform-specific
   content. On Linux and FreeBSD, supported prior to DPDK 20.11,
   original structure will be kept until DPDK 21.11.
+
+* vhost: The experimental tags of ``rte_vhost_driver_get_protocol_features``,
+  ``rte_vhost_driver_get_queue_num``, ``rte_vhost_crypto_create``,
+  ``rte_vhost_crypto_free``, ``rte_vhost_crypto_fetch_requests``,
+  ``rte_vhost_crypto_finalize_requests``, ``rte_vhost_crypto_set_zero_copy``,
+  ``rte_vhost_va_from_guest_pa``, ``rte_vhost_extern_callback_register``,
+  and ``rte_vhost_driver_set_protocol_features`` APIs will be removed and the
+  APIs will be made stable in DPDK 21.11.
-- 
2.17.1



Re: [dpdk-dev] [PATCH v4] app/testpmd: fix TX checksum calculation for tunnel

2021-07-30 Thread Olivier Matz
On Thu, Jul 29, 2021 at 08:01:41PM +0300, Gregory Etelson wrote:
> csumonly engine calculates TX checksum of a tunnelled packet for outer
> headers only or separately for outer and inner headers. The
> calculation method is determined by checksum configuration options.
> If TX checksum calculation is separated, the inner headers are
> processed before outer headers.
> 
> Inner headers processing sets checksum values to 0 unconditionally.
> If TX configuration offloads inner checksums only, outer checksum
> calculation in software will read 0 instead of real values and
> produce wrong result.
> 
> The patch zeroes inner checksums only before software calculation.
> 
> Fixes: 6b520d54ebfe ("app/testpmd: use Tx preparation in checksum engine")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Gregory Etelson 

Acked-by: Olivier Matz 
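
For readers following the reasoning in the quoted commit log: a software Internet checksum covers every byte in its range, so the value sitting in a checksum field at the time of the calculation changes the result. The sketch below shows this for a plain IPv4 header with the RFC 1071 algorithm. It is a generic illustration, not testpmd code; inet_cksum() and ipv4_hdr_cksum_sw() are hypothetical helper names, and the field is zeroed only immediately before the software calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, folded and inverted. */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1) /* an odd trailing byte is padded with zero */
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16) /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Compute an IPv4 header checksum in software and store it in the header. */
static uint16_t ipv4_hdr_cksum_sw(uint8_t *hdr, size_t hdr_len)
{
    uint16_t ck;

    hdr[10] = 0; /* the checksum field must be zero while summing */
    hdr[11] = 0;
    ck = inet_cksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(ck >> 8); /* store in network byte order */
    hdr[11] = (uint8_t)(ck & 0xff);
    return ck;
}
```

Summing a header that already carries its correct checksum yields 0, which is the usual receive-side verification. Summing a range that contains a zeroed or stale field yields a different value, which is the kind of mismatch the commit log describes for the outer checksum.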


[dpdk-dev] 0/6] support oops handling

2021-07-30 Thread jerinj
From: Jerin Jacob 

It is handy to get detailed oops information, like the Linux kernel
provides, when a DPDK application crashes, without losing any of the
features provided by the coredump infrastructure of the OS.

This patch series introduces the APIs to handle oops in DPDK.

The following section details the implementation and the API exposed to the
application.

On rte_eal_init() invocation, the EAL library installs the oops handler for
the essential signals. The rte_oops_signals_enabled() API provides the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode()
and then calls the signal handler that the application installed
before invoking rte_eal_init(). This scheme also enables the use of
the default coredump handler (for gdb etc.) provided by the OS
if the application does not install any specific signal handler.

In the second case, where the application installs its signal handler after
the rte_eal_init() invocation, rte_oops_decode() provides the means of
decoding the oops message in the application's fault handler.
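
The chaining scheme described above can be sketched with plain POSIX calls. This is a minimal illustration under my own assumptions, not the code from the series: it uses SIGUSR1 so that it is safe to run, decode_fault() is a hypothetical stand-in for the decode step, and all function names are invented for the example. The idea is to save the previous disposition when installing the handler, then restore it and re-raise the signal once decoding is done.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static struct sigaction saved_action;       /* disposition present before ours */
static volatile sig_atomic_t decoded_count; /* times the decode step ran */
static volatile sig_atomic_t app_seen;      /* times the earlier handler ran */

/* Handler the application installed first. */
static void app_handler(int sig)
{
    (void)sig;
    app_seen++;
}

/* Hypothetical stand-in for the decode step; only counts invocations. */
static void decode_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    decoded_count++;
}

/* Decode, restore the earlier disposition, then re-deliver the signal. */
static void chained_handler(int sig, siginfo_t *info, void *ctx)
{
    decode_fault(sig, info, ctx);
    sigaction(sig, &saved_action, NULL);
    raise(sig); /* stays pending until this handler returns */
}

static int install_app_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = app_handler;
    return sigaction(sig, &sa, NULL);
}

static int install_chained_handler(int sig)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = chained_handler;
    sa.sa_flags = SA_SIGINFO;
    /* The third argument receives the disposition being replaced. */
    return sigaction(sig, &sa, &saved_action);
}
```

If the saved disposition is SIG_DFL for a fatal signal, the re-raised signal terminates the process in the default way, which is what keeps the OS coredump path available.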


Patch split:

Patch 1/6: defines the API and stub implementation for Unix systems
Patch 2/6: The API implementation
Patch 3/6: add an optional libunwind dependency to DPDK for better backtrace in oops.
Patch 4/6: x86 specific archinfo like x86 register dump on oops
Patch 5/6: arm64 specific archinfo like arm64 register dump on oops
Patch 6/6: UT for the new APIs


Example commands to build and run on an x86-64 Linux machine, followed by the output log:
  

meson --buildtype debug build
ninja -C build

echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge  -c 0x2

Signal info:

PID:   2439496
Signal number: 11
Fault address: 0x5

Backtrace:
--
[  0x55e8b56d5cee]: test_oops_generate()+0x75
[  0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[  0x55e8b56d605c]: test_oops()+0x13
[  0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[  0x55e8b6063a0d]: cmdline_parse()+0x319
[  0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[  0x55e8b6066bd8]: rdline_char_in()+0xc48
[  0x55e8b606221c]: cmdline_in()+0x62
[  0x55e8b6062495]: cmdline_interact()+0x56
[  0x55e8b5459314]: main()+0x65e
[  0x7f54b25d2b25]: __libc_start_main()+0xd5
[  0x55e8b544bc9e]: _start()+0x2e

Arch info:
--
R8 : 0x  R9 : 0x
R10: 0x7f54b25b8b48  R11: 0x7f54b25e7930
R12: 0x7fffc695e610  R13: 0x
R14: 0x  R15: 0x
RAX: 0x0005  RBX: 0x0001
RCX: 0x7f54b278a943  RDX: 0x3769043bf13a2594
RBP: 0x7fffc6958340  RSP: 0x7fffc6958330
RSI: 0x  RDI: 0x55e8c4c1e380
RIP: 0x55e8b56d5cee  EFL: 0x00010246

Stack dump:
--
0x7fffc6958330: 0x600
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe855
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb00
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe855
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
--
0x55e8b56d5cee: 0xc700
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0x4889
0x55e8b56d5d22: 0xb540

Jerin Jacob (6):
  eal: introduce oops handling API
  eal: oops handling API implementation
  eal: support libunwind based backtrace
  eal/x86: support register dump for oops
  eal/arm64: support register dump for oops
  test/oops: support unit test case for oops handling APIs

 .github/workflows/build.yml  |   2 +-
 .travis.yml  |   2 +-
 app/test/meson.build |   2 +
 app/test/test_oops.c | 121 ++
 config/meson.build   |   8 +
 doc/api/doxy-api-index.md|   3 +-
 lib/eal/common/eal_private.h |   3 +
 lib/eal/freebsd/eal.c|   6 +
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 
 lib/eal/linux/eal.c  |   6 +
 lib/eal/unix/eal_oops.c  | 297 +++
 lib/eal/unix/meson.build |   1 +
 lib/eal/version.map  |   4 +
 14 files changed, 553 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_oops.c
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

-- 
2.32.0



[dpdk-dev] 1/6] eal: introduce oops handling API

2021-07-30 Thread jerinj
From: Jerin Jacob 

Introduce the oops handling API with the following specification
and enable a stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, the EAL library installs the
oops handler for the essential signals.
The rte_oops_signals_enabled() API provides the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using
rte_oops_decode() and then calls the signal handler that the
application installed before invoking rte_eal_init().
This scheme also enables the use of the default coredump
handler (for gdb etc.) provided by the OS if the application does
not install any specific signal handler.

In the second case, where the application installs its signal
handler after the rte_eal_init() invocation, rte_oops_decode()
provides the means of decoding the oops message in
the application's fault handler.

Signed-off-by: Jerin Jacob 
---
 doc/api/doxy-api-index.md|   3 +-
 lib/eal/common/eal_private.h |   3 ++
 lib/eal/freebsd/eal.c|   6 +++
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 +++
 lib/eal/linux/eal.c  |   6 +++
 lib/eal/unix/eal_oops.c  |  36 +
 lib/eal/unix/meson.build |   1 +
 lib/eal/version.map  |   4 ++
 9 files changed, 159 insertions(+), 1 deletion(-)
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..0d0da35205 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -215,7 +215,8 @@ The public API headers are grouped by topics:
   [log](@ref rte_log.h),
   [errno]  (@ref rte_errno.h),
   [trace]  (@ref rte_trace.h),
-  [trace_point](@ref rte_trace_point.h)
+  [trace_point](@ref rte_trace_point.h),
+  [oops]   (@ref rte_oops.h)
 
 - **misc**:
   [EAL config] (@ref rte_eal.h),
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 64cf4e81c8..c3a490d803 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -716,6 +716,9 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t 
*cpuset);
  */
 void __rte_thread_uninit(void);
 
+int eal_oops_init(void);
+void eal_oops_fini(void);
+
 /**
  * asprintf(3) replacement for Windows.
  */
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6cee5ae369..3c098708c6 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -692,6 +692,11 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
+   if (eal_oops_init()) {
+   rte_eal_init_alert("oops init failed.");
+   rte_errno = ENOENT;
+   }
+
thread_id = pthread_self();
 
eal_reset_internal_config(internal_conf);
@@ -974,6 +979,7 @@ rte_eal_cleanup(void)
rte_trace_save();
eal_trace_fini();
eal_cleanup_config(internal_conf);
+   eal_oops_fini();
return 0;
 }
 
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 88a9eba12f..6c74bdb7b5 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -30,6 +30,7 @@ headers += files(
 'rte_malloc.h',
 'rte_memory.h',
 'rte_memzone.h',
+'rte_oops.h',
 'rte_pci_dev_feature_defs.h',
 'rte_pci_dev_features.h',
 'rte_per_lcore.h',
diff --git a/lib/eal/include/rte_oops.h b/lib/eal/include/rte_oops.h
new file mode 100644
index 00..ff82c409ec
--- /dev/null
+++ b/lib/eal/include/rte_oops.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell.
+ */
+
+#ifndef _RTE_OOPS_H_
+#define _RTE_OOPS_H_
+
+#include 
+#include 
+#include 
+
+/**
+ * @file
+ *
+ * RTE oops API
+ *
+ * This file provides the oops handling APIs to RTE applications.
+ *
+ * On rte_eal_init() invocation, the EAL library installs the oops handler for
+ * the essential signals. The rte_oops_signals_enabled() API provides the list
+ * of signals the library installed by the EAL.
+ *
+ * The default EAL oops handler decodes the oops message using 
rte_oops_decode()
+ * and then calls the signal handler installed by the application before
+ * invoking the rte_eal_init(). This scheme will also enable the use of
+ * the default coredump handler(for gdb etc.) provided by OS if the application
+ * does not install any specific signal handler.
+ *
+ * The second case where the application installs the signal handler after
+ * the rte_eal_init() invocation, rte_oops_decode() provides the means of
+ * decoding the oops message in the application's fault handler.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Maximum number of oops signals enabled in EAL.
+ * @see rte_oops_signals_enabled()
+ */
+#define RTE_OOPS_SIGNALS_MA

[dpdk-dev] 2/6] eal: oops handling API implementation

2021-07-30 Thread jerinj
From: Jerin Jacob 

Implement the base oops handling APIs.

Signed-off-by: Jerin Jacob 
---
 lib/eal/unix/eal_oops.c | 175 ++--
 1 file changed, 168 insertions(+), 7 deletions(-)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 53b580f733..1120c8ad8c 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -2,35 +2,196 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include 
+#include 
+#include 
+#include 
 
+#include 
+#include 
 #include 
 
 #include "eal_private.h"
 
-void
-rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+#define oops_print(...) rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__)
+
+static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS};
+
+struct oops_signal {
+   int sig;
+   bool enabled;
+   struct sigaction sa;
+};
+
+static struct oops_signal signals_db[RTE_DIM(oops_signals)];
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+   RTE_SET_USED(context);
+
+   rte_dump_stack();
+}
+static void
+siginfo_dump(int sig, siginfo_t *info)
+{
+   oops_print("PID:   %" PRIdMAX "\n", (intmax_t)getpid());
+
+   if (info == NULL)
+   return;
+   if (sig != info->si_signo)
+   oops_print("Invalid signal info\n");
+
+   oops_print("Signal number: %d\n", info->si_signo);
+   oops_print("Fault address: %p\n", info->si_addr);
+}
+
+static void
+mem32_dump(void *ptr)
+{
+   uint32_t *p = ptr;
+   int i;
+
+   for (i = 0; i < 16; i++)
+   oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i]));
+}
+
+static void
+stack_dump_header(void)
+{
+   oops_print("Stack dump:\n");
+   oops_print("--\n");
+}
+
+static void
+code_dump_header(void)
+{
+   oops_print("Code dump:\n");
+   oops_print("--\n");
+}
+
+static void
+stack_code_dump(void *stack, void *code)
+{
+   if (stack == NULL || code == NULL)
+   return;
+
+   oops_print("\n");
+   stack_dump_header();
+   mem32_dump(stack);
+   oops_print("\n");
+
+   code_dump_header();
+   mem32_dump(code);
+   oops_print("\n");
+}
+static void
+archinfo_dump(ucontext_t *uc)
 {
-   RTE_SET_USED(sig);
-   RTE_SET_USED(info);
RTE_SET_USED(uc);
 
+   stack_code_dump(NULL, NULL);
+}
+
+static void
+default_signal_handler_invoke(int sig)
+{
+   unsigned int idx;
+
+   for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+   /* Skip disabled signals */
+   if (signals_db[idx].sig != sig)
+   continue;
+   if (!signals_db[idx].enabled)
+   continue;
+   /* Replace with stored handler */
+   sigaction(sig, &signals_db[idx].sa, NULL);
+   kill(getpid(), sig);
+   }
+}
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+   oops_print("Signal info:\n");
+   oops_print("\n");
+   siginfo_dump(sig, info);
+   oops_print("\n");
+
+   oops_print("Backtrace:\n");
+   oops_print("--\n");
+   back_trace_dump(uc);
+   oops_print("\n");
+
+   oops_print("Arch info:\n");
+   oops_print("--\n");
+   if (uc)
+   archinfo_dump(uc);
+}
+
+static void
+eal_oops_handler(int sig, siginfo_t *info, void *ctx)
+{
+   ucontext_t *uc = ctx;
+
+   rte_oops_decode(sig, info, uc);
+   default_signal_handler_invoke(sig);
 }
 
 int
 rte_oops_signals_enabled(int *signals)
 {
-   RTE_SET_USED(signals);
+   int count = 0, sig[RTE_OOPS_SIGNALS_MAX];
+   unsigned int idx = 0;
 
-   return 0;
+   for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+   if (signals_db[idx].enabled) {
+   sig[count] = signals_db[idx].sig;
+   count++;
+   }
+   }
+   if (signals)
+   memcpy(signals, sig, sizeof(*signals) * count);
+
+   return count;
 }
 
 int
 eal_oops_init(void)
 {
-   return 0;
+   unsigned int idx, rc = 0;
+   struct sigaction sa;
+
+   RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX);
+
+   sigemptyset(&sa.sa_mask);
+   sa.sa_sigaction = &eal_oops_handler;
+   sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
+
+   for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+   signals_db[idx].sig = oops_signals[idx];
+   /* Get existing sigaction */
+   rc = sigaction(signals_db[idx].sig, NULL, &signals_db[idx].sa);
+   if (rc)
+   continue;
+   /* Replace with oops handler */
+   rc = sigaction(signals_db[idx].sig, &sa, NULL);
+   if (rc)
+   continue;
+   signals_db[idx].enabled = true;
+   }
+   return rc;
 }
 
 void
 eal_oops_fini(void)
 {
+   unsigned int idx;
+
+   for (idx = 0;

[dpdk-dev] 3/6] eal: support libunwind based backtrace

2021-07-30 Thread jerinj
From: Jerin Jacob 

Add an optional libunwind library dependency to DPDK for
enhanced backtrace based on ucontext.

Signed-off-by: Jerin Jacob 
---
 .github/workflows/build.yml |  2 +-
 .travis.yml |  2 +-
 config/meson.build  |  8 +++
 lib/eal/unix/eal_oops.c | 47 +
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 7dac20ddeb..caaca207a6 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -93,7 +93,7 @@ jobs:
   run: sudo apt install -y ccache libnuma-dev python3-setuptools
 python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev
 libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev libjansson-dev
-libarchive-dev
+libarchive-dev libunwind-dev
 - name: Install libabigail build dependencies if no cache is available
   if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-hit 
!= 'true'
   run: sudo apt install -y autoconf automake libtool pkg-config libxml2-dev
diff --git a/.travis.yml b/.travis.yml
index 23067d9e3c..e72b156014 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -16,7 +16,7 @@ addons:
 packages: &required_packages
   - [libnuma-dev, python3-setuptools, python3-wheel, python3-pip, 
python3-pyelftools, ninja-build]
   - [libbsd-dev, libpcap-dev, libibverbs-dev, libcrypto++-dev, libfdt-dev, 
libjansson-dev]
-  - [libarchive-dev]
+  - [libarchive-dev, libunwind-dev]
 
 _aarch64_packages: &aarch64_packages
   - *required_packages
diff --git a/config/meson.build b/config/meson.build
index e80421003b..26a85dab6b 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -236,6 +236,14 @@ if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') 
== false
 dpdk_extra_ldflags += '-latomic'
 endif
 
+# check for libunwind
+unwind_dep = dependency('libunwind', required: false, method: 'pkg-config')
+if unwind_dep.found() and cc.has_header('libunwind.h', dependencies: 
unwind_dep)
+dpdk_conf.set('RTE_USE_LIBUNWIND', 1)
+add_project_link_arguments('-lunwind', language: 'c')
+dpdk_extra_ldflags += '-lunwind'
+endif
+
 # add -include rte_config to cflags
 add_project_arguments('-include', 'rte_config.h', language: 'c')
 
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 1120c8ad8c..118b236f35 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -25,6 +25,50 @@ struct oops_signal {
 
 static struct oops_signal signals_db[RTE_DIM(oops_signals)];
 
+#if defined(RTE_USE_LIBUNWIND)
+
+#define BACKTRACE_DEPTH 256
+#define UNW_LOCAL_ONLY
+#include 
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+   unw_cursor_t cursor;
+   unw_word_t ip, off;
+   int rc, level = 0;
+   char name[256];
+
+   if (context == NULL) {
+   rte_dump_stack();
+   return;
+   }
+
+   rc = unw_init_local(&cursor, (unw_context_t *)context);
+   if (rc < 0)
+   goto fail;
+
+   for (;;) {
+   rc = unw_get_reg(&cursor, UNW_REG_IP, &ip);
+   if (rc < 0)
+   goto fail;
+   rc = unw_get_proc_name(&cursor, name, sizeof(name), &off);
+   if (rc == 0)
+   oops_print("[%16p]: %s()+0x%" PRIx64 "\n", (void *)ip,
+  name, (uint64_t)off);
+   else
+   oops_print("[%16p]: \n", (void *)ip);
+   rc = unw_step(&cursor);
+   if (rc <= 0 || ++level >= BACKTRACE_DEPTH)
+   break;
+   }
+   return;
+fail:
+   oops_print("libunwind call failed %s\n", unw_strerror(rc));
+}
+
+#else
+
 static void
 back_trace_dump(ucontext_t *context)
 {
@@ -32,6 +76,9 @@ back_trace_dump(ucontext_t *context)
 
rte_dump_stack();
 }
+
+#endif
+
 static void
 siginfo_dump(int sig, siginfo_t *info)
 {
-- 
2.32.0



[dpdk-dev] 4/6] eal/x86: support register dump for oops

2021-07-30 Thread jerinj
From: Jerin Jacob 

Dump the x86 architectural register state in the oops
handling routine.

Signed-off-by: Jerin Jacob 
---
 lib/eal/unix/eal_oops.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 118b236f35..da71481ade 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -132,6 +132,38 @@ stack_code_dump(void *stack, void *code)
mem32_dump(code);
oops_print("\n");
 }
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_EXEC_ENV_LINUX)
+static void
+archinfo_dump(ucontext_t *uc)
+{
+
+   mcontext_t *mc = &uc->uc_mcontext;
+
+   oops_print("R8 : 0x%.16llx  ", mc->gregs[REG_R8]);
+   oops_print("R9 : 0x%.16llx\n", mc->gregs[REG_R9]);
+   oops_print("R10: 0x%.16llx  ", mc->gregs[REG_R10]);
+   oops_print("R11: 0x%.16llx\n", mc->gregs[REG_R11]);
+   oops_print("R12: 0x%.16llx  ", mc->gregs[REG_R12]);
+   oops_print("R13: 0x%.16llx\n", mc->gregs[REG_R13]);
+   oops_print("R14: 0x%.16llx  ", mc->gregs[REG_R14]);
+   oops_print("R15: 0x%.16llx\n", mc->gregs[REG_R15]);
+   oops_print("RAX: 0x%.16llx  ", mc->gregs[REG_RAX]);
+   oops_print("RBX: 0x%.16llx\n", mc->gregs[REG_RBX]);
+   oops_print("RCX: 0x%.16llx  ", mc->gregs[REG_RCX]);
+   oops_print("RDX: 0x%.16llx\n", mc->gregs[REG_RDX]);
+   oops_print("RBP: 0x%.16llx  ", mc->gregs[REG_RBP]);
+   oops_print("RSP: 0x%.16llx\n", mc->gregs[REG_RSP]);
+   oops_print("RSI: 0x%.16llx  ", mc->gregs[REG_RSI]);
+   oops_print("RDI: 0x%.16llx\n", mc->gregs[REG_RDI]);
+   oops_print("RIP: 0x%.16llx  ", mc->gregs[REG_RIP]);
+   oops_print("EFL: 0x%.16llx\n", mc->gregs[REG_EFL]);
+
+   stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
+}
+
+#else
+
 static void
 archinfo_dump(ucontext_t *uc)
 {
@@ -140,6 +172,8 @@ archinfo_dump(ucontext_t *uc)
stack_code_dump(NULL, NULL);
 }
 
+#endif
+
 static void
 default_signal_handler_invoke(int sig)
 {
-- 
2.32.0



[dpdk-dev] 5/6] eal/arm64: support register dump for oops

2021-07-30 Thread jerinj
From: Jerin Jacob 

Dump the arm64 architectural state registers in the oops
handling routine.

Signed-off-by: Jerin Jacob 
---
 lib/eal/unix/eal_oops.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index da71481ade..7469610d96 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -162,6 +162,25 @@ archinfo_dump(ucontext_t *uc)
stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
 }
 
+#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX)
+
+static void
+archinfo_dump(ucontext_t *uc)
+{
+   mcontext_t *mc = &uc->uc_mcontext;
+   int i;
+
+   oops_print("PC : 0x%.16llx", mc->pc);
+   oops_print("SP : 0x%.16llx\n", mc->sp);
+   for (i = 0; i < 31; i++)
+   oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i],
+  i & 0x1 ? "\n" : " ");
+
+   oops_print("PSTATE: 0x%.16llx\n", mc->pstate);
+
+   stack_code_dump((void *)mc->sp, (void *)mc->pc);
+}
+
 #else
 
 static void
-- 
2.32.0
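
Illustrative note (not part of the series): the x86 and arm64 patches
above both read the faulting program counter and stack pointer out of
the ucontext_t passed to an SA_SIGINFO handler. A minimal standalone
sketch of that idea follows. The struct and function names are
hypothetical, SIGUSR1 raised on purpose stands in for a real fault, and
only Linux on x86_64 or aarch64 is assumed to be supported.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes REG_RIP/REG_RSP only with this set */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two registers of interest. */
struct fault_regs {
    uint64_t pc;
    uint64_t sp;
    int valid; /* 1 when pc/sp were filled in for this platform */
};

/* Copy PC and SP out of a signal context; returns 0 on success, -1 otherwise. */
static int
fault_regs_from_ucontext(const ucontext_t *uc, struct fault_regs *out)
{
    assert(out != NULL);
    out->pc = 0;
    out->sp = 0;
    out->valid = 0;
    if (uc == NULL)
        return -1;
#if defined(__linux__) && defined(__x86_64__)
    out->pc = (uint64_t)uc->uc_mcontext.gregs[REG_RIP];
    out->sp = (uint64_t)uc->uc_mcontext.gregs[REG_RSP];
    out->valid = 1;
#elif defined(__linux__) && defined(__aarch64__)
    out->pc = (uint64_t)uc->uc_mcontext.pc;
    out->sp = (uint64_t)uc->uc_mcontext.sp;
    out->valid = 1;
#endif
    return out->valid ? 0 : -1;
}

static struct fault_regs captured;
static volatile sig_atomic_t handler_ran;

static void
capture_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    fault_regs_from_ucontext((const ucontext_t *)ctx, &captured);
    handler_ran = 1;
}

/* Install the handler, raise SIGUSR1 once, restore the old disposition. */
static int
capture_regs_via_signal(void)
{
    struct sigaction sa = {0};
    struct sigaction old;

    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = capture_handler;
    sa.sa_flags = SA_SIGINFO;
    handler_ran = 0;
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;
    raise(SIGUSR1);
    sigaction(SIGUSR1, &old, NULL);
    return handler_ran ? 0 : -1;
}
```

On other architectures the helper simply reports failure, which mirrors
the generic archinfo_dump() fallback kept in the #else branch of the
patches.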



[dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs

2021-07-30 Thread jerinj
From: Jerin Jacob 

Added unit test cases for all the oops handling APIs.

Signed-off-by: Jerin Jacob 
---
 app/test/meson.build |   2 +
 app/test/test_oops.c | 121 +++
 2 files changed, 123 insertions(+)
 create mode 100644 app/test/test_oops.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686ad..1e471ab351 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -97,6 +97,7 @@ test_sources = files(
 'test_metrics.c',
 'test_mcslock.c',
 'test_mp_secondary.c',
+'test_oops.c',
 'test_per_lcore.c',
 'test_pflock.c',
 'test_pmd_perf.c',
@@ -236,6 +237,7 @@ fast_tests = [
 ['memzone_autotest', false],
 ['meter_autotest', true],
 ['multiprocess_autotest', false],
+['oops_autotest', true],
 ['per_lcore_autotest', true],
 ['pflock_autotest', true],
 ['prefetch_autotest', true],
diff --git a/app/test/test_oops.c b/app/test/test_oops.c
new file mode 100644
index 00..60a7f259c7
--- /dev/null
+++ b/app/test/test_oops.c
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include "test.h"
+
+static jmp_buf pc;
+static bool detected_segfault;
+
+static void
+segv_handler(int sig, siginfo_t *info, void *ctx)
+{
+   detected_segfault = true;
+   rte_oops_decode(sig, info, (ucontext_t *)ctx);
+   longjmp(pc, 1);
+}
+
+/* OS-specific way to install the segfault signal handler */
+static int
+segv_handler_install(void)
+{
+   struct sigaction sa;
+
+   sigemptyset(&sa.sa_mask);
+   sa.sa_sigaction = &segv_handler;
+   sa.sa_flags = SA_SIGINFO;
+
+   return sigaction(SIGSEGV, &sa, NULL);
+}
+
+static int
+test_oops_generate(void)
+{
+   int rc;
+
+   rc = segv_handler_install();
+   TEST_ASSERT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+   detected_segfault = false;
+   rc = setjmp(pc); /* Save the execution state */
+   if (rc == 0) {
+   /* Generate a segfault */
+   *(volatile int *)0x05 = 0;
+   } else { /* longjmp from segv_handler */
+   if (detected_segfault)
+   return TEST_SUCCESS;
+
+   }
+   return TEST_FAILED;
+}
+
+static int
+test_signal_handler_installed(int count, int *signals)
+{
+   int i, rc, verified = 0;
+   struct sigaction sa;
+
+   for (i = 0; i < count; i++) {
+   rc = sigaction(signals[i], NULL, &sa);
+   if (rc) {
+   printf("Failed to get sigaction for %d", signals[i]);
+   continue;
+   }
+   if (sa.sa_handler != SIG_DFL)
+   verified++;
+   }
+   TEST_ASSERT_EQUAL(count, verified, "count=%d verified=%d\n", count,
+ verified);
+   return TEST_SUCCESS;
+}
+
+static int
+test_oops_signals_enabled(void)
+{
+   int *signals = NULL;
+   int i, rc;
+
+   rc = rte_oops_signals_enabled(signals);
+   TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+   signals = malloc(sizeof(int) * rc);
+   rc = rte_oops_signals_enabled(signals);
+   TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+   free(signals);
+
+   signals = malloc(sizeof(int) * RTE_OOPS_SIGNALS_MAX);
+   rc = rte_oops_signals_enabled(signals);
+   TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+   for (i = 0; i < rc; i++)
+   TEST_ASSERT_NOT_EQUAL(signals[i], 0, "idx=%d val=%d\n", i,
+ signals[i]);
+
+   rc = test_signal_handler_installed(rc, signals);
+   free(signals);
+
+   return rc;
+}
+
+static struct unit_test_suite oops_tests = {
+   .suite_name = "oops autotest",
+   .setup = NULL,
+   .teardown = NULL,
+   .unit_test_cases = {
+   TEST_CASE(test_oops_signals_enabled),
+   TEST_CASE(test_oops_generate),
+   TEST_CASES_END()}};
+
+static int
+test_oops(void)
+{
+   return unit_test_suite_runner(&oops_tests);
+}
+
+REGISTER_TEST_COMMAND(oops_autotest, test_oops);
-- 
2.32.0
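
Illustrative note (not part of the patch): the test above leaves the
SIGSEGV handler with longjmp(). POSIX leaves it unspecified whether
setjmp()/longjmp() save and restore the signal mask, so on some systems
the signal may stay blocked after the jump. A minimal sketch of the same
recover-from-handler pattern using sigsetjmp()/siglongjmp() is shown
below. All names are hypothetical, and raise(SIGSEGV) is used in place
of a real invalid store so the sketch does not depend on undefined
behaviour.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for sigaction, siginfo_t, sigsetjmp */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t fault_seen;

static void
on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    (void)ctx;
    fault_seen = 1;
    /* Jump back and restore the signal mask saved by sigsetjmp(). */
    siglongjmp(recover_point, 1);
}

/* Returns 1 if the handler ran and control came back, 0 if not, -1 on error. */
static int
fault_and_recover(void)
{
    struct sigaction sa = {0};
    struct sigaction old;

    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    fault_seen = 0;
    /* Second argument 1: save the current signal mask as well. */
    if (sigsetjmp(recover_point, 1) == 0)
        raise(SIGSEGV);

    /* Reached directly (handler never ran) or via siglongjmp(). */
    assert(sigaction(SIGSEGV, &old, NULL) == 0);
    return fault_seen ? 1 : 0;
}
```

Calling fault_and_recover() twice in a row is a quick way to check that
the mask was restored: if SIGSEGV were still blocked after the first
jump, the second raise() would not reach the handler.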



Re: [dpdk-dev] [dpdk-announce] release candidate 21.08-rc2

2021-07-30 Thread Jiang, YuX
Hi All,
Here is the updated test status for the Intel part. Testing of dpdk21.08-rc2 is now finished.
No critical issue is found.
# Basic Intel(R) NIC testing
* Build or compile:
*Build: cover the build test combination with latest GCC/Clang/ICC 
version and the popular OS revision such as Ubuntu20.04, Fedora34, etc.
- All tests are done. All passed.
*Compile: cover the CFLAGS (O0/O1/O2/O3) with popular OS such as 
Ubuntu20.04 and Fedora34.
- All tests are done. All passed.
*PF(i40e, ixgbe): test scenarios including 
RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc.
- All tests are done. No new issue is found.
*VF(i40e, ixgbe): test scenarios including 
VF-RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc.
- All tests are done. One new issue is found.
- One new issue about vm_hotplug (vf port can be found after 
executing "device_del dev1") is found in rc2; Intel dev has provided a patch to 
fix it.
Patch link: 
https://patches.dpdk.org/project/dpdk/patch/20210728134848.353258-1-paulis.grib...@intel.com/
*PF/VF(ice): test scenarios including Switch features/Package 
Management/Flow Director/Advanced Tx/Advanced RSS/ACL/DCF/Share code 
update/Flexible Descriptor, etc.
- All tests are done.
- Three new issues about cvl_iavf_rss_configure/CVL_Qos are 
found. Others are known issues. Intel Dev. are working on them.
*Intel NIC single core/NIC performance: test scenarios including PF/VF 
single core performance test, RFC2544 Zero packet loss performance test, etc.
- All tests are done. No big performance drop.
*Power and IPsec:
* Power: test scenarios including bi-direction/Telemetry/Empty 
Poll Lib/Priority Base Frequency, etc.
- All tests are done. All passed.
* IPsec: test scenarios including ipsec/ipsec-gw/ipsec library 
basic test - QAT&SW/FIB library, etc.
- All tests are done. All passed.
# Basic cryptodev and virtio testing
*Virtio: both function and performance tests are covered, such as 
PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMware 
ESXi 7.0u2, etc.
- All tests are done. Three new issues are found besides the known 
issues.
1),Split ring pvp can't receive imix pkts when using vswitch app
2),virtio memory access out of bounds in 
virtio_check_scatter_on_all_rx_queues
3),vswitch_sample_cbdma: forward 8k packet failed when relaunch 
dpdk-vhost
*Cryptodev:
*Function test: test scenarios including Cryptodev API 
testing/CompressDev ISA-L/QAT/ZLIB PMD Testing/FIPS, etc.
- All tests are done. No new issue is found except 
known issue.
*Performance test: test scenarios including Throughput 
Performance /Cryptodev Latency, etc.
- All tests are done. No big performance drop.
Best regards,
Yu Jiang

> -Original Message-
> From: Jiang, YuX
> Sent: Wednesday, July 28, 2021 7:06 PM
> To: Thomas Monjalon ; dev (dev@dpdk.org)
> 
> Cc: Devlin, Michelle ; Mcnamara, John
> ; Yigit, Ferruh ; Yu,
> PingX 
> Subject: RE: [dpdk-dev] [dpdk-announce] release candidate 21.08-rc2
>
> Hi All,
> Update the test status for Intel part. Till now dpdk21.08-rc2 test execution
> rate is 90%. No critical issue is found.
> # Basic Intel(R) NIC testing
> * Build or compile:
>   *Build: cover the build test combination with latest GCC/Clang/ICC
> version and the popular OS revision such as Ubuntu20.04, Fedora34, etc.
>   - All tests are done. All passed.
>   *Compile: cover the CFLAGES(O0/O1/O2/O3) with popular OS such
> as Ubuntu20.04 and Fedora34.
>   - All tests are done. All passed.
>   * PF(i40e, ixgbe): test scenarios including
> RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc.
>   - All tests are done. No new issue is found.
>   - Fixed two issues in rc2: 1), distributor: Core dumped occurs
> when execute distributor_autotest & 2),
> https://bugs.dpdk.org/show_bug.cgi?id=687
>   * VF(i40e, ixgbe): test scenarios including VF-
> RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc.
>   - All tests are done. No new issue is found.
>   - One new issue about vm_hotplug(vf port can be found
> after executing "device_del dev1") is found in rc2, Intel dev is working on 
> it.
>   * PF/VF(ice): test scenarios including Switch features/Package
> Management/Flow Director/Advanced Tx/Advanced RSS/ACL/DCF/Share
> code update/Flexible Descriptor, etc.
>   - Execution rate is 80%. More than 10 bugs about
> cvl_fdir/iavf_fdir/cvl_advanced_rss/CVL_Qos are fixed in rc2.
>   - One new issue about cvl_iavf_rss_configure is found.
> Others are known issues. Intel Dev

[dpdk-dev] [PATCH] net/i40e: fix clang warning on non-x86

2021-07-30 Thread Ruifeng Wang
Building on aarch64 with clang-10 produces a warning:
i40e_rxtx.c:3228:1: warning: unused function 'get_avx_supported' 
[-Wunused-function]

The function is only used in the x86-specific path. Move it inside the
#ifdef to fix the build on non-x86.

Fixes: c30751afc360 ("net/i40e: fix data path selection in secondary process")
Cc: dapengx...@intel.com

Signed-off-by: Ruifeng Wang 
---
 drivers/net/i40e/i40e_rxtx.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 026cda948c..8329cbdd4e 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -3224,10 +3224,10 @@ i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t 
queue_id,
qinfo->conf.offloads = txq->offloads;
 }
 
+#ifdef RTE_ARCH_X86
 static inline bool
 get_avx_supported(bool request_avx512)
 {
-#ifdef RTE_ARCH_X86
if (request_avx512) {
if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_512 &&
rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) == 1 &&
@@ -3251,12 +3251,10 @@ get_avx_supported(bool request_avx512)
return false;
 #endif
}
-#else
-   RTE_SET_USED(request_avx512);
-#endif /* RTE_ARCH_X86 */
 
return false;
 }
+#endif /* RTE_ARCH_X86 */
 
 
 void __rte_cold
-- 
2.25.1
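
Illustrative note (not part of the patch): the change above avoids
-Wunused-function by compiling the helper only on the architecture that
calls it, instead of keeping an empty body elsewhere. A minimal
standalone sketch of the same layout follows. The cpu_caps struct, the
rx_path enum and both function names are hypothetical stand-ins, not
i40e code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU feature summary, filled in by the caller. */
struct cpu_caps {
    bool avx2;
    bool avx512;
};

enum rx_path { RX_PATH_SCALAR = 0, RX_PATH_AVX2, RX_PATH_AVX512 };

#if defined(__x86_64__) || defined(__i386__)
/* Compiled only where it is called, so there is no unused definition. */
static inline bool
wide_simd_supported(const struct cpu_caps *caps, bool request_avx512)
{
    return request_avx512 ? caps->avx512 : caps->avx2;
}
#endif /* x86 */

static enum rx_path
select_rx_path(const struct cpu_caps *caps)
{
    assert(caps != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (wide_simd_supported(caps, true))
        return RX_PATH_AVX512;
    if (wide_simd_supported(caps, false))
        return RX_PATH_AVX2;
#endif /* x86 */
    /* Non-x86 builds, or x86 without wide SIMD, use the scalar path. */
    return RX_PATH_SCALAR;
}
```

The call sites need the same guard as the definition, otherwise the
non-x86 build fails with an undeclared function instead of a warning.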



[dpdk-dev] [PATCH 0/2] fixes to bnxt PMD

2021-07-30 Thread Ajit Khaparde
Fixes to bnxt PMD to address compatibility issues with different FW versions.

Jay Ding (1):
  net/bnxt: fix resource qcap list handling

Kishore Padmanabha (1):
  net/bnxt: fix stats counter resource

 drivers/net/bnxt/tf_core/tf_msg.c| 12 ++--
 .../tf_ulp/generic_templates/ulp_template_db_tbl.c   |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH 2/2] net/bnxt: fix stats counter resource

2021-07-30 Thread Ajit Khaparde
From: Kishore Padmanabha 

The number of flow counters is reduced from 8192 to 6912 for Whitney
for compatibility with different versions of FW.

Fixes: 6fad9115101c ("net/bnxt: reorganize ULP template directory structure")
Cc: sta...@dpdk.org
Signed-off-by: Kishore Padmanabha 
Reviewed-by: Randy Schacher 
Acked-by: Ajit Khaparde 
---
 .../net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c   | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c 
b/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c
index 6664353764..7951de8a4e 100644
--- a/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c
+++ b/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c
@@ -1154,7 +1154,7 @@ struct bnxt_ulp_resource_resv_info 
ulp_resource_resv_list[] = {
.direction   = TF_DIR_RX,
.resource_func   = BNXT_ULP_RESOURCE_FUNC_INDEX_TABLE,
.resource_type   = TF_TBL_TYPE_ACT_STATS_64,
-   .count   = 8192
+   .count   = 6912
},
{
.app_id  = 0,
@@ -1298,7 +1298,7 @@ struct bnxt_ulp_resource_resv_info 
ulp_resource_resv_list[] = {
.direction   = TF_DIR_TX,
.resource_func   = BNXT_ULP_RESOURCE_FUNC_INDEX_TABLE,
.resource_type   = TF_TBL_TYPE_ACT_STATS_64,
-   .count   = 8192
+   .count   = 6912
},
{
.app_id  = 0,
-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH 1/2] net/bnxt: fix resource qcap list handling

2021-07-30 Thread Ajit Khaparde
From: Jay Ding 

The size of the resource qcap list can differ when the FW and the
application do not match. The application should be able to handle
the case where the FW is older and the qcap list is smaller.

This patch is needed for backward compatibility on older
firmware versions.

Fixes: 873661aa641a1 ("net/bnxt: support shared session")
Cc: sta...@dpdk.org
Signed-off-by: Jay Ding 
Reviewed-by: Randy Schacher 
Acked-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/tf_msg.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bnxt/tf_core/tf_msg.c 
b/drivers/net/bnxt/tf_core/tf_msg.c
index 6717710dbd..e07d9168be 100644
--- a/drivers/net/bnxt/tf_core/tf_msg.c
+++ b/drivers/net/bnxt/tf_core/tf_msg.c
@@ -440,18 +440,18 @@ tf_msg_session_resc_qcaps(struct tf *tfp,
 * Should always get expected number of entries
 */
if (tfp_le_to_cpu_32(resp.size) != size) {
-   TFP_DRV_LOG(ERR,
-   "%s: QCAPS message size error, rc:%s\n",
+   TFP_DRV_LOG(WARNING,
+   "%s: QCAPS message size error, rc:%s, request %d vs 
response %d\n",
tf_dir_2_str(dir),
-   strerror(EINVAL));
-   rc = -EINVAL;
-   goto cleanup;
+   strerror(EINVAL),
+   size,
+   resp.size);
}
 
/* Post process the response */
data = (struct tf_rm_resc_req_entry *)qcaps_buf.va_addr;
 
-   for (i = 0; i < size; i++) {
+   for (i = 0; i < resp.size; i++) {
query[i].type = tfp_le_to_cpu_32(data[i].type);
query[i].min = tfp_le_to_cpu_16(data[i].min);
query[i].max = tfp_le_to_cpu_16(data[i].max);
-- 
2.21.1 (Apple Git-122.3)
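
Illustrative note (not part of the patch): the hunk above compares the
byte-swapped response size against the requested size, then loops on the
raw resp.size field. One defensive way to consume a response that may be
shorter than requested is to convert the reported count once, clamp it
to the caller's buffer, and clear the entries that were not reported. The
sketch below shows that idea with hypothetical types and function names;
the wire layout is an assumption, not the bnxt message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire entry: every field is little-endian. */
struct wire_entry {
    uint8_t type[4];
    uint8_t min[2];
    uint8_t max[2];
};

/* Host-order copy handed back to the caller. */
struct host_entry {
    uint32_t type;
    uint16_t min;
    uint16_t max;
};

static uint32_t
le32_to_host(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static uint16_t
le16_to_host(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/*
 * Decode at most 'requested' entries. 'resp_size_le' is the entry count
 * reported by the firmware, still in little-endian byte order.
 * Returns the number of entries actually decoded.
 */
static size_t
decode_qcaps(const struct wire_entry *wire, const uint8_t resp_size_le[4],
             struct host_entry *query, size_t requested)
{
    uint32_t reported = le32_to_host(resp_size_le);
    size_t n = reported < requested ? reported : requested;
    size_t i;

    assert(wire != NULL && query != NULL);
    for (i = 0; i < n; i++) {
        query[i].type = le32_to_host(wire[i].type);
        query[i].min = le16_to_host(wire[i].min);
        query[i].max = le16_to_host(wire[i].max);
    }
    /* Entries an older firmware did not report are left zeroed. */
    memset(&query[n], 0, (requested - n) * sizeof(query[0]));
    return n;
}
```

Clamping to the smaller of the two counts also covers the opposite
mismatch, where a newer firmware reports more entries than the
application allocated.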



Re: [dpdk-dev] [PATCH v2] net/ena: enable multi segment in Tx offload flags

2021-07-30 Thread Michał Krawczyk
pt., 30 lip 2021 o 10:35 Olivier Matz  napisał(a):
>
> From: Ghalem Boudour 
>
> The DPDK ENA driver does not provide multi-segment tx offload capability.
> Let's add DEV_TX_OFFLOAD_MULTI_SEGS to ports offload capability by
> default, and always set it in dev->data->dev_conf.txmode.offload.
>
> This flag is not listed in doc/guides/nics/features/default.ini, so
> ena.ini does not need to be updated.
>
> Fixes: 1173fca25af9 ("ena: add polling-mode driver")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Ghalem Boudour 
> Signed-off-by: Olivier Matz 
Acked-by: Michal Krawczyk 
> ---
>
> v2
> * set DEV_TX_OFFLOAD_MULTI_SEGS in dev->data->dev_conf.txmode.offload
> * add Fixes and Cc stable
>
>  drivers/net/ena/ena_ethdev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> index dfe68279fa..b59451034c 100644
> --- a/drivers/net/ena/ena_ethdev.c
> +++ b/drivers/net/ena/ena_ethdev.c
> @@ -1981,6 +1981,7 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
>
> if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
> dev->data->dev_conf.rxmode.offloads |= 
> DEV_RX_OFFLOAD_RSS_HASH;
> +   dev->data->dev_conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
>
> adapter->tx_selected_offloads = dev->data->dev_conf.txmode.offloads;
> adapter->rx_selected_offloads = dev->data->dev_conf.rxmode.offloads;
> @@ -2055,6 +2056,7 @@ static int ena_infos_get(struct rte_eth_dev *dev,
> DEV_RX_OFFLOAD_TCP_CKSUM;
>
> rx_feat |= DEV_RX_OFFLOAD_JUMBO_FRAME;
> +   tx_feat |= DEV_TX_OFFLOAD_MULTI_SEGS;
>
> /* Inform framework about available features */
> dev_info->rx_offload_capa = rx_feat;
> --
> 2.29.2
>


Re: [dpdk-dev] [PATCH v2] net/ena: enable multi segment in Tx offload flags

2021-07-30 Thread Thomas Monjalon
30/07/2021 11:37, Michał Krawczyk:
> pt., 30 lip 2021 o 10:35 Olivier Matz  napisał(a):
> >
> > From: Ghalem Boudour 
> >
> > The DPDK ENA driver does not provide multi-segment tx offload capability.
> > Let's add DEV_TX_OFFLOAD_MULTI_SEGS to ports offload capability by
> > default, and always set it in dev->data->dev_conf.txmode.offload.
> >
> > This flag is not listed in doc/guides/nics/features/default.ini, so
> > ena.ini does not need to be updated.
> >
> > Fixes: 1173fca25af9 ("ena: add polling-mode driver")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Ghalem Boudour 
> > Signed-off-by: Olivier Matz 
> Acked-by: Michal Krawczyk 

Applied, thanks.





Re: [dpdk-dev] [PATCH v2] eventdev: fix event port setup in tx adapter

2021-07-30 Thread Jerin Jacob
On Wed, Jul 28, 2021 at 4:27 PM Jayatheerthan, Jay
 wrote:
>
> > -Original Message-
> > From: Jerin Jacob 
> > Sent: Wednesday, July 28, 2021 3:01 PM
> > To: Jayatheerthan, Jay 
> > Cc: Naga Harish K, S V ; dev@dpdk.org
> > Subject: Re: [PATCH v2] eventdev: fix event port setup in tx adapter

> >
> > >
> > > >
> > > > Signed-off-by: Naga Harish K, S V 
> >
> >
> > @Naga Harish K, S V Can we remove "," from the Signoff name.
> >
> > Some suggestions for consideration in preferred order. Let me know
> > your preferred Signoff name, I will change it accordingly.
> >
> > Naga Harish 
> >
> > or
> >
> > Naga Harish K 
> >
> > or
> >
> > Naga Harish K S V 
>
> This option is good. (Harish is out and hence I am responding)

1)  Option 'Signed-off-by: Naga Harish K S V
 is warned by checkpatch

[for-main]dell[dpdk-next-eventdev] $ ./devtools/checkpatches.sh -n 1

### eventdev: fix event port setup in Tx adapter

WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email name
mismatch: 'From: "Naga Harish K, S V" '
!= 'Signed-off-by: Naga Harish K S V '

2)  Option 'Signed-off-by: Naga Harish K, S V
' is warning by check-git-log.sh

[for-main]dell[dpdk-next-eventdev] $ ./devtools/check-git-log.sh -n 1
Wrong tag:
Signed-off-by: Naga Harish K, S V 


@thomas @David Marchand

Could you suggest what Sign-off to take?


>
> >
> > > > ---
> > > >  lib/eventdev/rte_event_eth_tx_adapter.c | 1 -
> > > >  1 file changed, 1 deletion(-)
> > > >
> > > > diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c 
> > > > b/lib/eventdev/rte_event_eth_tx_adapter.c
> > > > index db260bfb68..18c0359db7 100644
> > > > --- a/lib/eventdev/rte_event_eth_tx_adapter.c
> > > > +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
> > > > @@ -286,7 +286,6 @@ txa_service_conf_cb(uint8_t __rte_unused id, 
> > > > uint8_t dev_id,
> > > >   return ret;
> > > >   }
> > > >
> > > > - pc->event_port_cfg = 0;
> > > >   ret = rte_event_port_setup(dev_id, port_id, pc);
> > > >   if (ret) {
> > > >   RTE_EDEV_LOG_ERR("failed to setup event port %u\n",
> > > > --
> > > > 2.25.1


Re: [dpdk-dev] [PATCH v2] eventdev: fix event port setup in tx adapter

2021-07-30 Thread Thomas Monjalon
30/07/2021 12:24, Jerin Jacob:
> On Wed, Jul 28, 2021 at 4:27 PM Jayatheerthan, Jay
>  wrote:
> >
> > > -Original Message-
> > > From: Jerin Jacob 
> > > Sent: Wednesday, July 28, 2021 3:01 PM
> > > To: Jayatheerthan, Jay 
> > > Cc: Naga Harish K, S V ; dev@dpdk.org
> > > Subject: Re: [PATCH v2] eventdev: fix event port setup in tx adapter
> 
> > >
> > > >
> > > > >
> > > > > Signed-off-by: Naga Harish K, S V 
> > >
> > >
> > > @Naga Harish K, S V Can we remove "," from the Signoff name.
> > >
> > > Some suggestions for consideration in preferred order. Let me know
> > > your preferred Signoff name, I will change it accordingly.
> > >
> > > Naga Harish 
> > >
> > > or
> > >
> > > Naga Harish K 
> > >
> > > or
> > >
> > > Naga Harish K S V 
> >
> > This option is good. (Harish is out and hence I am responding)
> 
> 1)  Option 'Signed-off-by: Naga Harish K S V
>  is warned by checkpatch
> 
> [for-main]dell[dpdk-next-eventdev] $ ./devtools/checkpatches.sh -n 1
> 
> ### eventdev: fix event port setup in Tx adapter
> 
> WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email name
> mismatch: 'From: "Naga Harish K, S V" '
> != 'Signed-off-by: Naga Harish K S V '
> 
> 2)  Option 'Signed-off-by: Naga Harish K, S V
> ' is warning by check-git-log.sh
> 
> [for-main]dell[dpdk-next-eventdev] $ ./devtools/check-git-log.sh -n 1
> Wrong tag:
> Signed-off-by: Naga Harish K, S V 
> 
> 
> @thomas @David Marchand
> 
> Could you suggest what Sign-off to take?

The first option is probably OK but the problem is a mismatch with the author 
name.
Make sure to change the authorship with --amend --author=




Re: [dpdk-dev] [PATCH v2] eventdev: fix event port setup in tx adapter

2021-07-30 Thread Jerin Jacob
On Fri, Jul 30, 2021 at 4:00 PM Thomas Monjalon  wrote:
>
> 30/07/2021 12:24, Jerin Jacob:
> > On Wed, Jul 28, 2021 at 4:27 PM Jayatheerthan, Jay
> >  wrote:
> > >
> > > > -Original Message-
> > > > From: Jerin Jacob 
> > > > Sent: Wednesday, July 28, 2021 3:01 PM
> > > > To: Jayatheerthan, Jay 
> > > > Cc: Naga Harish K, S V ; dev@dpdk.org
> > > > Subject: Re: [PATCH v2] eventdev: fix event port setup in tx adapter
> >
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Naga Harish K, S V 
> > > >
> > > >
> > > > @Naga Harish K, S V Can we remove "," from the Signoff name.
> > > >
> > > > Some suggestions for consideration in preferred order. Let me know
> > > > your preferred Signoff name, I will change it accordingly.
> > > >
> > > > Naga Harish 
> > > >
> > > > or
> > > >
> > > > Naga Harish K 
> > > >
> > > > or
> > > >
> > > > Naga Harish K S V 
> > >
> > > This option is good. (Harish is out and hence I am responding)
> >
> > 1)  Option 'Signed-off-by: Naga Harish K S V
> >  is warned by checkpatch
> >
> > [for-main]dell[dpdk-next-eventdev] $ ./devtools/checkpatches.sh -n 1
> >
> > ### eventdev: fix event port setup in Tx adapter
> >
> > WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email name
> > mismatch: 'From: "Naga Harish K, S V" '
> > != 'Signed-off-by: Naga Harish K S V '
> >
> > 2)  Option 'Signed-off-by: Naga Harish K, S V
> > ' is warning by check-git-log.sh
> >
> > [for-main]dell[dpdk-next-eventdev] $ ./devtools/check-git-log.sh -n 1
> > Wrong tag:
> > Signed-off-by: Naga Harish K, S V 
> >
> >
> > @thomas @David Marchand
> >
> > Could you suggest what Sign-off to take?
>
> The first option is probably OK but the problem is a mismatch with the author 
> name.
> Make sure to change the authorship with --amend --author=

Thanks for the input.


>
>


Re: [dpdk-dev] [PATCH 1/3] net: avoid cast-align warning in VLAN insert function

2021-07-30 Thread Olivier Matz
On Tue, Jul 13, 2021 at 09:49:08AM +0300, Eli Britstein wrote:
> In rte_vlan_insert there is a casting of rte_pktmbuf_prepend returned
> value to (struct rte_ether_hdr *), which causes cast-align warning when
> using gcc flags '-Werror -Wcast-align':
> 
> In file included from .../include/rte_ethdev.h:165,
>  from lib/netdev-dpdk.c:33:
> .../include/rte_ether.h: In function 'rte_vlan_insert':
> .../include/rte_ether.h:375:7: error: cast increases required alignment
> of target type [-Werror=cast-align]
>   375 |  nh = (struct rte_ether_hdr *)
>   |   ^
> 
> As the code assumes correct alignment, add first a (void *) casting, to
> avoid the warning.
> 
> Fixes: c974021a5949 ("ether: add soft vlan encap/decap")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Eli Britstein 

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH] event/cnxk: fix reading stale Tx queue depth

2021-07-30 Thread Jerin Jacob
On Tue, Jul 20, 2021 at 12:35 PM  wrote:
>
> From: Pavan Nikhilesh 
>
> Reads to Tx queue FC memory need to be atomic to avoid cores using
> same Tx queue spinning on stale values.
>
> Fixes: 313e884a22fd ("event/cnxk: support Tx adapter fast path")
>
> Signed-off-by: Pavan Nikhilesh 

Applied to dpdk-next-net-eventdev/for-main. Thanks
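
Illustrative note (not part of the patch, which is not quoted in this
message): with a plain load, the compiler is free to read a shared
counter once and keep spinning on the value it already holds. An
explicit atomic load forces a fresh read on every iteration. The sketch
below shows this with C11 atomics; the function name, the meaning of the
counter ("free slots") and the spin limit are assumptions made for the
example, not the cnxk implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical flow-control word: number of free Tx slots, updated by
 * hardware or by another core. Returns true once at least 'needed'
 * slots are visible, false if 'max_spins' reads did not show enough.
 */
static bool
txq_wait_for_space(_Atomic uint64_t *fc_mem, uint64_t needed,
                   unsigned int max_spins)
{
    unsigned int i;

    assert(fc_mem != NULL);
    for (i = 0; i < max_spins; i++) {
        /* Relaxed ordering: only a fresh value is needed, not a fence. */
        if (atomic_load_explicit(fc_mem, memory_order_relaxed) >= needed)
            return true;
    }
    return false;
}
```

A bounded spin is used here only so the sketch terminates when called
from a single thread; a real fast path would pick its own wait policy.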


Re: [dpdk-dev] [PATCH v2] eventdev: fix event port setup in tx adapter

2021-07-30 Thread Jerin Jacob
On Wed, Jul 28, 2021 at 4:27 PM Jayatheerthan, Jay
 wrote:
>
> > -Original Message-
> > From: Jerin Jacob 
> > Sent: Wednesday, July 28, 2021 3:01 PM
> > To: Jayatheerthan, Jay 
> > Cc: Naga Harish K, S V ; dev@dpdk.org
> > Subject: Re: [PATCH v2] eventdev: fix event port setup in tx adapter
> >
> > On Wed, Jul 28, 2021 at 12:14 PM Jayatheerthan, Jay
> >  wrote:
> > >
> > > > -Original Message-
> > > > From: Naga Harish K, S V 
> > > > Sent: Saturday, July 24, 2021 7:41 PM
> > > > To: Jayatheerthan, Jay ; 
> > > > jerinjac...@gmail.com
> > > > Cc: dev@dpdk.org
> > > > Subject: [PATCH v2] eventdev: fix event port setup in tx adapter
> > > >
> > > > The event port config set by application in
> > > > rte_event_eth_tx_adapter_create API is modified in
> > > > default configuration callback function. This patch removes
> > > > this hardcode to use application provided event port
> > > > config value.
> > > >
> > > > Fixes: ("eventdev: fix event port config override in tx adapter")
> > >
> > > @Jerin, does this look good to you ?
> >
> > Yes. I will merge this. See below.


Applied to dpdk-next-net-eventdev/for-main. Thanks


>
> Thanks!
>
> >
> > >
> > > >
> > > > Signed-off-by: Naga Harish K, S V 
> >
> >
> > @Naga Harish K, S V Can we remove "," from the Signoff name.
> >
> > Some suggestions for consideration in preferred order. Let me know
> > your preferred Signoff name, I will change it accordingly.
> >
> > Naga Harish 
> >
> > or
> >
> > Naga Harish K 
> >
> > or
> >
> > Naga Harish K S V 
>
> This option is good. (Harish is out and hence I am responding)
>
> >
> > > > ---
> > > >  lib/eventdev/rte_event_eth_tx_adapter.c | 1 -
> > > >  1 file changed, 1 deletion(-)
> > > >
> > > > diff --git a/lib/eventdev/rte_event_eth_tx_adapter.c 
> > > > b/lib/eventdev/rte_event_eth_tx_adapter.c
> > > > index db260bfb68..18c0359db7 100644
> > > > --- a/lib/eventdev/rte_event_eth_tx_adapter.c
> > > > +++ b/lib/eventdev/rte_event_eth_tx_adapter.c
> > > > @@ -286,7 +286,6 @@ txa_service_conf_cb(uint8_t __rte_unused id, 
> > > > uint8_t dev_id,
> > > >   return ret;
> > > >   }
> > > >
> > > > - pc->event_port_cfg = 0;
> > > >   ret = rte_event_port_setup(dev_id, port_id, pc);
> > > >   if (ret) {
> > > >   RTE_EDEV_LOG_ERR("failed to setup event port %u\n",
> > > > --
> > > > 2.25.1


[dpdk-dev] [pull-request] dpdk-next-eventdev - v21.08-rc3

2021-07-30 Thread Jerin Jacob Kollanukkaran
The following changes since commit f8c42c53ce744180e1fb4203b2cf976054726360:

  net/mlx5: fix meter hierarchy validation with yellow (2021-07-29 22:06:43 
+0200)

are available in the Git repository at:

  http://dpdk.org/git/next/dpdk-next-eventdev

for you to fetch changes up to 9661f6ac433d3dd84a9fe001a45f3cb0ae612d78:

  eventdev: fix event port setup in Tx adapter (2021-07-30 16:25:19 +0530)


Naga Harish K S V (1):
  eventdev: fix event port setup in Tx adapter

Pavan Nikhilesh (1):
  event/cnxk: fix reading stale Tx queue depth

 drivers/event/cnxk/cn9k_worker.h| 3 ++-
 lib/eventdev/rte_event_eth_tx_adapter.c | 1 -
 2 files changed, 2 insertions(+), 2 deletions(-)


Re: [dpdk-dev] [PATCH 2/3] mbuf: avoid cast-align warning in pktmbuf mtod offset macro

2021-07-30 Thread Olivier Matz
Hi Eli,

On Thu, Jul 29, 2021 at 10:13:45AM +0300, Eli Britstein wrote:
> 
> On 7/28/2021 6:28 PM, Olivier Matz wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Tue, Jul 13, 2021 at 09:49:09AM +0300, Eli Britstein wrote:
> > > In rte_pktmbuf_mtod_offset macro, there is a casting from char * to type
> > > 't', which may cause cast-align warning when using gcc flags
> > > '-Werror -Wcast-align':
> > > 
> > > .../include/rte_mbuf_core.h:723:3: error: cast increases required 
> > > alignment
> > >  of target type [-Werror=cast-align]
> > >723 |  ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
> > >|   ^
> > > 
> > > As the code assumes correct alignment, add first a (void *) casting, to
> > > avoid the warning.
> > > 
> > > Fixes: af75078fece3 ("first public release")
> > > Cc: sta...@dpdk.org
> > > 
> > > Signed-off-by: Eli Britstein 
> > My initial thinking was that it's the problem of the application: if
> > -Werror=cast-align is used, it is up to the application to cast the
> > return value of rte_pktmbuf_mtod_offset() to (void *) before casting it
> > to the network type.
> > 
> > But, if I understand correctly, the problem is not about the application
> > code itself, but about inlined code in the header files of dpdk
> > (i.e. compiling an empty C file that just includes the dpdk headers with
> > -Werror=cast-align). Is it correct? If yes I think it should be
> > highlighted in the commit log.
> 
> I think yes, though in this specific patch it is not even an inline
> function, but a macro.
> 
> However, I don't have a synthetic application example to show those
> warnings, thus didn't put such in the commit msg.

For this patch, I think it would be useful to have a way to reproduce
the issue first, so we can check whether it is the proper place to fix
the problem.

To me, it is assumed in the DPDK project that we can mmap a network
structure on mbuf data (maybe I'm wrong?). If an external application
like OVS wants to use -Werror=cast-align, it has to cast the result of
calls to rte_pktmbuf_mtod() family.

The only corner cases are DPDK header files which have static inline
functions or macros that force the use of the rte_pktmbuf_mtod() family
without a cast (like for your patch 1/3), because it cannot be fixed in
the external project.

I think we have to make our header files compliant to projects that want
to use -Werror=cast-align, like we do to make our header files compliant
to C++.

What you suggest in this patch forces the cast to (void *) for all users
of rte_pktmbuf_mtod() family. This could be a problem for projects that
want to see these warnings.

Would it be possible instead to add a cast in DPDK headers, in inline
functions that make use of these mtod functions?

Regards,
Olivier



> > 
> > Out of curiosity, how did you find the errors? I mean, is it possible
> > that some casts are missing some other headers, or is this patchset
> > exhaustive?
> Currently OVS-DPDK is compiled only with -Wno-cast-align.
> 
> Following complaint that a recent commit introduced a degradation in OVS
> [1], I compiled OVS without this warning deprecation.
> The fixes in OVS are [2] and [3] (already merged). The fixes in DPDK are in
> this patch-set.
> 
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385084.html
> [2] https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386278.html
>     e8cccd3a3589 ("netdev-offload-dpdk: Fix IPv6 rewrite cast-align
> warning.")
> [3] https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386279.html
>     1f7f557603a5 ("netdev-offload-dpdk: Fix vxlan vni cast-align warnings.")
> > Thanks,
> > Olivier
> > 
> > 
> > > ---
> > >   lib/mbuf/rte_mbuf_core.h | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > index bb38d7f581..dabdeee604 100644
> > > --- a/lib/mbuf/rte_mbuf_core.h
> > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > @@ -720,7 +720,7 @@ struct rte_mbuf_ext_shared_info {
> > >*   The type to cast the result into.
> > >*/
> > >   #define rte_pktmbuf_mtod_offset(m, t, o) \
> > > - ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
> > > + ((t)(void *)((char *)(m)->buf_addr + (m)->data_off + (o)))
> > > 
> > >   /**
> > >* A macro that points to the start of the data in the mbuf.
> > > --
> > > 2.28.0.2311.g225365fb51
> > > 


Re: [dpdk-dev] [PATCH v3] net: fix Intel-specific Prepare the outer ipv4 hdr for checksum

2021-07-30 Thread Olivier Matz
On Wed, Jul 28, 2021 at 06:46:53PM +0300, Andrew Rybchenko wrote:
> On 7/7/21 12:40 PM, Mohsin Kazmi wrote:
> > Preparation of the headers for the hardware offload
> > misses the outer ipv4 checksum offload.
> > It results in a bad checksum computed by the hardware NIC.
> > 
> > This patch fixes the issue by setting the outer ipv4
> > checksum field to 0.
> > 
> > Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Mohsin Kazmi 
> > Acked-by: Qi Zhang 
> > ---
> > v3:
> > * Update the conditional test with PKT_TX_OUTER_IP_CKSUM.
> > * Update the commit title with "Intel-specific".
> > 
> > v2:
> > * Update the commit message with Fixes.
> > 
> >   lib/net/rte_net.h | 15 +--
> >   1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
> > index 434435ffa2..3f4c8c58b9 100644
> > --- a/lib/net/rte_net.h
> > +++ b/lib/net/rte_net.h
> > @@ -125,11 +125,22 @@ rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)
> >  * Mainly it is required to avoid fragmented headers check if
> >  * no offloads are requested.
> >  */
> > -   if (!(ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK | PKT_TX_TCP_SEG)))
> > +   if (!(ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK | PKT_TX_TCP_SEG |
> > + PKT_TX_OUTER_IP_CKSUM)))
> > return 0;
> > -   if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6))
> > +   if (ol_flags & (PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IPV6)) {
> > inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > +   /*
> > +* prepare outer ipv4 header checksum by setting it to 0,
> > +* in order to be computed by hardware NICs.
> > +*/
> > +   if (ol_flags & PKT_TX_OUTER_IP_CKSUM) {
> > +   ipv4_hdr = rte_pktmbuf_mtod_offset(m,
> > +   struct rte_ipv4_hdr *, m->outer_l2_len);
> > +   ipv4_hdr->hdr_checksum = 0;
> 
> Here we assume that the field is located in the first segment.
> Unlikely but it still could be false. We must handle it properly.

This is specified in the API comment, so I think it has to be checked
by the caller.
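
For illustration, a caller-side check could look roughly like the sketch
below. It is self-contained and uses a made-up struct fake_mbuf holding only
the fields that matter here (the names mirror the rte_mbuf ones, but this is
not DPDK code); outer_ipv4_in_first_seg() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf. */
struct fake_mbuf {
	uint16_t data_len;     /* bytes of data in the first segment */
	uint16_t outer_l2_len; /* outer L2 header length */
	uint16_t outer_l3_len; /* outer L3 header length */
};

/*
 * Return true if the whole outer L3 header lies in the first segment,
 * i.e. writing its checksum field through a pointer into the first
 * segment stays within that segment.
 */
static inline bool
outer_ipv4_in_first_seg(const struct fake_mbuf *m)
{
	uint32_t hdr_end;

	assert(m != NULL);
	hdr_end = (uint32_t)m->outer_l2_len + m->outer_l3_len;
	return hdr_end <= m->data_len;
}
```

A caller would run such a check before the Tx prepare step and linearize or
drop the packet when it fails.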

> > +   }
> > +   }
> > /*
> >  * Check if headers are fragmented.
> > 
> 


Re: [dpdk-dev] [PATCH] common/sfc_efx/base: do not validate MAE action COUNT order

2021-07-30 Thread Thomas Monjalon
29/07/2021 11:32, Ivan Malov:
> In DPDK + Open vSwitch use case, action COUNT is always the
> first one to be added. In particular, it goes before action
> DECAP in that use case. The current code enforces the right
> order (DECAP goes before COUNT), and this provokes failures.
> As an exception, do not validate the order for action COUNT.
> 
> Signed-off-by: Ivan Malov 
> Reviewed-by: Andrew Rybchenko 
> Reviewed-by: Andy Moreton 

Applied, thanks






Re: [dpdk-dev] RFC: Enhancements to Rx adapter for DPDK 21.11

2021-07-30 Thread Jerin Jacob
On Wed, Jul 28, 2021 at 11:53 AM Kundapura, Ganapati
 wrote:
>
> Comments inlined

Please fix your email client for adding proper >

>
> -Original Message-
> From: Jerin Jacob 
> Sent: 28 July 2021 11:38
> To: Kundapura, Ganapati 
> Cc: dpdk-dev ; Jayatheerthan, Jay 
> Subject: Re: RFC: Enhancements to Rx adapter for DPDK 21.11
>
> On Mon, Jul 26, 2021 at 6:37 PM Kundapura, Ganapati 
>  wrote:
> >
> > A gentle ping for comments.
> >
> > -Original Message-
> > From: dev  On Behalf Of Kundapura, Ganapati
> > Sent: 23 July 2021 12:33
> > To: dpdk-dev ; Jerin Jacob ;
> > Jayatheerthan, Jay 
> > Subject: [dpdk-dev] RFC: Enhancements to Rx adapter for DPDK 21.11
> >
> > Hi dpdk-dev,
> >
> > We would like to submit a series of patches to the Rx adapter that will
> > enhance its configuration and performance.
> > Please find the details below.
> >
> > (1) Configure Rx event buffer at run time
> > Add a new API to configure the size of the Rx event buffer at run time.
> > This API allows setting the size of the event buffer at adapter level.
>
> Since we can change the ABI for 21.11, I would prefer not to add a new API and
> instead add a param to the config structure.
> Please send the deprecation notice for the ABI change.
>
> The config structure passed to rte_event_eth_rx_adapter_create() is of type
> rte_event_port_conf, which comes from the event framework (rte_eventdev.h).
> Does it make sense to pass the adapter event buffer size in the
> rte_event_port_conf structure?

I see. Then a new API to set the buffer size is OK.


>
> >
> > (2) Change packet enqueue buffer in Rx adapter to circular buffer
> > Rx adapter uses memmove() to move unprocessed events to the beginning
> > of the packet enqueue buffer, which consumes a good amount of CPU cycles.
>
> Looks good.
>
>
> >
> > (3) Add API to retrieve the Rx queue info
> > Rx queue info contains flags for handling received packets,
> > event queue identifier, scheduler type, event priority,
> > polling frequency of the receive queue and flow identifier
>
> Looks good. Please implement it as adapter ops so that it can be adapter
> specific to support HW implementations.
>
>
>
> >
> > (4) Add adapter_stats cli to retrieve Rx/Tx adapter stats and rxq info
> > This cli displays Rx and Tx adapter stats containing received packet count,
> > eventdev enqueue count, enqueue retry count, event buffer size, queue poll
> > count, transmitted packet count, packet dropped count, transmit fail count
> > etc. and rx queue info.
>
> Generally, we don't entertain CLI in the library. You can add command-line 
> arguments to app/test-eventdev to test this.
>
> Adapter_stats is a standalone application, not part of the library, and it'll
> be in app/adapter_stats.

No need for a new app. Please add the stats as telemetry, then they can be
pulled through usertools/dpdk-telemetry.py



> >
> > (5) Update Rx timestamp in mbuf using mbuf dynamic field
> > Add support to register timestamp dynamic field in mbuf
> > Update the timestamp in mbuf for each packet before eventdev
> > enqueue
>
> Cool.
>
> >
> > We look forward to feedback on this proposal. Once we have initial 
> > feedback, patches will be submitted for review.
> >
> > Thanks,
> > Ganapati


Re: [dpdk-dev] [PATCH v2] net/memif: fix abstract socket addr_len

2021-07-30 Thread Thomas Monjalon
> > This fixes using abstract sockets with memifs.
> > We were not passing the exact addr_len, which requires zeroing the remaining
> > sun_path and doesn't appear well in other utilities (e.g. lsof -U).
> > 
> > Signed-off-by: Nathan Skrzypczak 
> 
> Looks ok to me.
> 
> Reviewed-by: Jakub Grajciar 

Applied, thanks.
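
For the archive, the addr_len point can be sketched as follows. This is a
generic Linux example, not the memif code; abstract_addr_fill() is a made-up
helper. An abstract socket name starts with a NUL byte in sun_path, and the
length passed to bind()/connect() should cover only the bytes in use; when
sizeof(struct sockaddr_un) is passed instead, the trailing zero bytes become
part of the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill *sun for an abstract socket called name and return the exact
 * address length to pass to bind()/connect(), or 0 if name is too long.
 */
static socklen_t
abstract_addr_fill(struct sockaddr_un *sun, const char *name)
{
	size_t len;

	assert(sun != NULL && name != NULL);
	len = strlen(name);

	/* One byte is taken by the leading NUL marking an abstract name. */
	if (len + 1 > sizeof(sun->sun_path))
		return 0;

	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	sun->sun_path[0] = '\0';
	memcpy(&sun->sun_path[1], name, len);

	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}
```

The returned value is what goes into the third argument of bind() or
connect(), in place of sizeof(struct sockaddr_un).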




Re: [dpdk-dev] [PATCH v1] test/func_reentrancy: free memzones after creating test case

2021-07-30 Thread Olivier Matz
Hi Joyce,

On Wed, Jul 28, 2021 at 02:33:22AM -0500, Joyce Kong wrote:
> Function reentrancy test limits maximum number of iterations
> simultaneously, however it doesn't free the 'fr_test_once'
> memzones after the fact, so introduce freeing 'fr_test_once'
> in ring/mempool/hash/fbk/lpm_clean.
> 
> Fixes: 104a92bd026f ("app: add reentrancy tests")
> Fixes: 995eec619024 ("test: clean up memory for function reentrancy test")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Joyce Kong 
> Reviewed-by: Ruifeng Wang 
> Reviewed-by: Feifei Wang 
> ---
>  app/test/test_func_reentrancy.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
> index 231c99a9eb..e4e9c2cc7c 100644
> --- a/app/test/test_func_reentrancy.c
> +++ b/app/test/test_func_reentrancy.c
> @@ -89,6 +89,10 @@ ring_clean(unsigned int lcore_id)
>   char ring_name[MAX_STRING_SIZE];
>   int i;
>  
> + rp = rte_ring_lookup("fr_test_once");
> + if (rp != NULL)
> + rte_ring_free(rp);
> +
>   for (i = 0; i < MAX_ITER_MULTI; i++) {
>   snprintf(ring_name, sizeof(ring_name),
>   "fr_test_%d_%d", lcore_id, i);
> @@ -148,7 +152,10 @@ mempool_clean(unsigned int lcore_id)
>   char mempool_name[MAX_STRING_SIZE];
>   int i;
>  
> - /* verify all ring created successful */
> + mp = rte_mempool_lookup("fr_test_once");
> + if (mp != NULL)
> + rte_mempool_free(mp);
> +
>   for (i = 0; i < MAX_ITER_MULTI; i++) {
>   snprintf(mempool_name, sizeof(mempool_name), "fr_test_%d_%d",
>lcore_id, i);
> @@ -208,6 +215,10 @@ hash_clean(unsigned lcore_id)
>   struct rte_hash *handle;
>   int i;
>  
> + handle = rte_hash_find_existing("fr_test_once");
> + if (handle != NULL)
> + rte_hash_free(handle);
> +
>   for (i = 0; i < MAX_ITER_MULTI; i++) {
>   snprintf(hash_name, sizeof(hash_name), "fr_test_%d_%d",  
> lcore_id, i);
>  
> @@ -272,6 +283,10 @@ fbk_clean(unsigned lcore_id)
>   struct rte_fbk_hash_table *handle;
>   int i;
>  
> + handle = rte_fbk_hash_find_existing("fr_test_once");
> + if (handle != NULL)
> + rte_fbk_hash_free(handle);
> +
>   for (i = 0; i < MAX_ITER_MULTI; i++) {
>   snprintf(fbk_name, sizeof(fbk_name), "fr_test_%d_%d",  
> lcore_id, i);
>  
> @@ -338,6 +353,10 @@ lpm_clean(unsigned int lcore_id)
>   struct rte_lpm *lpm;
>   int i;
>  
> + lpm = rte_lpm_find_existing("fr_test_once");
> + if (lpm != NULL)
> + rte_lpm_free(lpm);
> +
>   for (i = 0; i < MAX_LPM_ITER_TIMES; i++) {
>   snprintf(lpm_name, sizeof(lpm_name), "fr_test_%d_%d",  
> lcore_id, i);
>  
> @@ -454,6 +473,9 @@ launch_test(struct test_case *pt_case)
>   pt_case->clean(lcore_id);
>   }
>  
> + if (pt_case->clean != NULL)
> + pt_case->clean(rte_get_main_lcore());
> +

Is it the same issue? It looks like it adds the missing frees for the main
thread (not only "fr_test_once"). I don't think it requires another patch, but
a word could be added about it in the commit log.


>   count = rte_atomic32_read(&obj_count);
>   if (count != 1) {
>   printf("%s: common object allocated %d times (should be 1)\n",
> -- 
> 2.17.1
> 


Re: [dpdk-dev] [PATCH] net/softnic: fix null pointer dereference

2021-07-30 Thread Thomas Monjalon
> > From: Dapeng Yu 
> > 
> > When there is no "firmware" in arguments, the "firmware" pointer is null,
> > and will be dereferenced by rte_strscpy().
> > 
> > This patch moves the code block which copies character string from
> > "firmware" to "p->firmware" into the "if" statements where "firmware"
> > argument exists and it is duplicated successfully.
> > 
> > Coverity issue: 372136
> > Fixes: d8f852f5f369 ("net/softnic: fix memory leak in arguments parsing")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Dapeng Yu 
> 
> Acked-by: Jasvinder Singh 

Applied, thanks.
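
A reduced sketch of the pattern described in the commit message, for readers
of the archive. It is not the softnic code: struct fake_params,
parse_firmware() and the use of snprintf() in place of rte_strscpy() are
assumptions made to keep the example standalone.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the driver parameters. */
struct fake_params {
	char firmware[64];
};

/*
 * arg is the value of the optional "firmware" argument, or NULL when it
 * was not given. The copy into p->firmware happens only inside the
 * branch where the duplicated string is known to be non-NULL.
 */
static int
parse_firmware(struct fake_params *p, const char *arg)
{
	char *firmware = NULL;

	assert(p != NULL);
	if (arg != NULL) {
		firmware = strdup(arg);
		if (firmware == NULL)
			return -1;
		snprintf(p->firmware, sizeof(p->firmware), "%s", firmware);
		free(firmware);
	}
	return 0;
}
```

When the argument is absent, p->firmware is simply left untouched and no NULL
pointer is ever read.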





Re: [dpdk-dev] [PATCH v2] net/softnic: fix memory leak as profile is freed

2021-07-30 Thread Thomas Monjalon
28/07/2021 08:05, dapengx...@intel.com:
> From: Dapeng Yu 
> 
> In function softnic_table_action_profile_free(), the memory referenced
> by pointer "ap" in the instance of "struct softnic_table_action_profile"
> is not freed.
> 
> This patch fixes it.
> 
> Fixes: a737dd4e5863 ("net/softnic: add table action profile")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Dapeng Yu 
> Acked-by: Jasvinder Singh 

Applied, thanks.




Re: [dpdk-dev] [PATCH v4] app/testpmd: fix TX checksum calculation for tunnel

2021-07-30 Thread Thomas Monjalon
30/07/2021 10:39, Olivier Matz:
> On Thu, Jul 29, 2021 at 08:01:41PM +0300, Gregory Etelson wrote:
> > csumonly engine calculates TX checksum of a tunnelled packet for outer
> > headers only or separately for outer and inner headers. The
> > calculation method is determined by checksum configuration options.
> > If TX checksum calculation is separated, the inner headers are
> > processed before outer headers.
> > 
> > Inner headers processing sets checksum values to 0 unconditionally.
> > If TX configuration offloads inner checksums only, outer checksum
> > calculation in software will read 0 instead of real values and
> > produce wrong result.
> > 
> > The patch zeroes inner checksums only before software calculation.
> > 
> > Fixes: 6b520d54ebfe ("app/testpmd: use Tx preparation in checksum engine")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Gregory Etelson 
> 
> Acked-by: Olivier Matz 
> 

The previous acks were forgotten (it should be added manually in the patch):

Acked-by: Ori Kam 
Acked-by: Ajit Khaparde 
Acked-by: Xiaoyun Li 

Applied, thanks.
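
To illustrate the failure mode from the commit message for archive readers: a
software checksum over a region that embeds another checksum field gives a
different result if that inner field was zeroed first. The sketch below is
generic C, not testpmd code; csum16() is a plain RFC 1071 style ones'
complement sum written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement 16-bit checksum over len bytes (RFC 1071 style). */
static uint16_t
csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL);
	/* Sum big-endian 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	/* Pad a trailing odd byte with zero. */
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Calling csum16() on the same buffer before and after clearing two bytes that
stand for an inner checksum field returns two different values, which is the
situation the outer software checksum ran into.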




Re: [dpdk-dev] [EXT] Re: [v1, 1/3] telemetry: enable storing pointer value

2021-07-30 Thread Gowrishankar Muthukrishnan
Hi Bruce,

> I'm a little curious as to the usefulness of having a pointer value in
> telemetry output? How would a telemetry user be expected to use pointer
> information returned? Printing pointers seems something more useful for a
> debugging or tracing interface than a telemetry one.
> 

Thanks for the quick review. I enabled the _ptr API with a few things in mind:

1. The user needs to explicitly type cast a pointer value (i.e. an address) to
uint64_t, which otherwise can cause a compiler warning (-Wint-conversion).
Although u64 is large enough to hold an address as a value, type casting is
problematic on non-64-bit machines (e.g. 32-bit). One other option is to use
uintptr_t as a holder.

2. With this API, a code walk could be easier as the user can interpret the
accessed data better (i.e. ptr is an address value). The _ptr API is meant for
pointer variables, though it is up to the user to choose.

3. Also, while debugging telemetry data using a script like
usertools/dpdk-telemetry.py, perceiving an address as hex is quicker than
reading the same value as u64.

Regarding the returned data, the user needs to convert the stringified hex to
a pointer value.

Regards,
Gowrishankar
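
For reference, the uintptr_t option, the hex representation and the conversion
on the reader side can be sketched in plain C, with no telemetry calls
involved. ptr_to_hex() and hex_to_ptr() are made-up helper names; uintptr_t
and the PRIxPTR macro come from <stdint.h> and <inttypes.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a pointer as "0x..." hex text. uintptr_t has the pointer width
 * of the platform, so the cast is clean on 32-bit and 64-bit builds.
 * Returns the snprintf() result.
 */
static int
ptr_to_hex(const void *p, char *out, size_t out_len)
{
	assert(out != NULL);
	return snprintf(out, out_len, "0x%" PRIxPTR, (uintptr_t)p);
}

/* Convert the stringified hex value back to a pointer value. */
static void *
hex_to_ptr(const char *s)
{
	assert(s != NULL);
	return (void *)(uintptr_t)strtoull(s, NULL, 16);
}
```

The round trip only makes sense inside the process that owns the address; for
an external telemetry client the value is informational.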


Re: [dpdk-dev] [PATCH] app/testpmd: fix vm_hotplug after removal of rte_eth_devices

2021-07-30 Thread Thomas Monjalon
> > After removing rte_eth_devices from testpmd the vm_hotplug no longer
> > recovered after removal of a device, this patch fixes this issue.
> > 
> > Fixes: 0a0821bcf312 ("app/testpmd: remove most uses of internal ethdev array")
> > 
> > Signed-off-by: Paulis Gributs 
> > ---
> >  app/test-pmd/testpmd.c | 8 +---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> Acked-by: Xiaoyun Li 

Adding a bit of explanation:
"because the port was closed before querying it."
About the title, vm_hotplug is nothing in testpmd,
and the problem is not limited to VMs.

Applied with rewording.




Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free

2021-07-30 Thread Olivier Matz
Hi Thomas,

On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> What's the follow-up for this patch?

Unfortunately, I still don't have the time to work on this topic.

In my initial tests, in our lab, I didn't notice any performance
regression, but Ali has seen an impact (0.5M PPS, but I don't know how
much in percent).


> 19/01/2021 15:04, Slava Ovsiienko:
> > Hi, All
> > 
> > Could we postpone this patch at least to rc2? We would like to conduct more
> > investigations.
> > 
> > With best regards, Slava
> > 
> > From: Olivier Matz 
> > > On Mon, Jan 18, 2021 at 05:52:32PM +, Ali Alnubani wrote:
> > > > Hi,
> > > > (Sorry had to resend this to some recipients due to mail server
> > > > problems).
> > > >
> > > > Just confirming that I can still reproduce the regression with single
> > > > core and 64B frames on other servers.
> > > 
> > > Many thanks for the feedback. Can you please detail what is the amount of
> > > performance loss in percent, and confirm the test case? (I suppose it is
> > > testpmd io forward).
> > > 
> > > Unfortunately, I won't be able to spend a lot of time on this soon (sorry
> > > for that). So I see at least these 2 options:
> > > 
> > > - postpone the patch again, until I can find more time to analyze
> > >   and optimize
> > > - apply the patch if the performance loss is acceptable compared to
> > >   the added value of fixing a bug
> > > 
> [...]

Status quo...

Olivier

> > > > > Assuming that pw86457 doesn't have an effect on this test, it looks
> > > > > to me that this patch caused a regression in Intel hardware as well.
> > > > >
> > > > > Can someone update the baseline's expected values for the Intel NICs
> > > > > and rerun the test on this patch?
> > > > >
> > > > > Thanks,
> > > > > Ali
> 
> 
> 
> 


Re: [dpdk-dev] [PATCH] doc: note KNI alternatives and deprecation plan

2021-07-30 Thread Olivier Matz
Hi Ferruh,

Few minor comments below.

On Wed, Jun 23, 2021 at 06:31:42PM +0100, Ferruh Yigit wrote:
> Add a note that KNI kernel module will be moved to dpdk-kmods git repo
> and there is a long term plan to deprecate it.
> 
> Also add some more details on the alternatives to KNI and cons of the
> KNI against these alternatives.
> 
> Signed-off-by: Ferruh Yigit 
> ---
>  doc/guides/nics/tap.rst   |  2 +
>  .../prog_guide/kernel_nic_interface.rst   | 38 +--
>  2 files changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
> index 3ce696b605d1..07315fe32422 100644
> --- a/doc/guides/nics/tap.rst
> +++ b/doc/guides/nics/tap.rst
> @@ -1,6 +1,8 @@
>  ..  SPDX-License-Identifier: BSD-3-Clause
>  Copyright(c) 2016 Intel Corporation.
>  
> +.. _TunTap_PMD:
> +
>  Tun|Tap Poll Mode Driver
>  
>  
> diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst 
> b/doc/guides/prog_guide/kernel_nic_interface.rst
> index 1ce03ec1a374..29f8c92fd9d6 100644
> --- a/doc/guides/prog_guide/kernel_nic_interface.rst
> +++ b/doc/guides/prog_guide/kernel_nic_interface.rst
> @@ -6,16 +6,48 @@
>  Kernel NIC Interface
>  
>  
> +.. Note::
> +
> +   KNI kernel module will be removed from main git repository to `dpdk-kmods 
> `_

removed -> moved?

Or "removed from main and moved to"

> +   repository by the `DPDK technical board decision 
> `_.
> +   Also there is a `long term plan 
> `_ to deprecate the 
> KNI.
> +
> +   :ref:`virtio_user_as_exceptional_path` alternative is preferred way for
> +   interfacing with Linux network stack as it is being in-kernel solution and
> +   similar performance expectations.
> +
>  The DPDK Kernel NIC Interface (KNI) allows userspace applications access to 
> the Linux* control plane.
>  
> -The benefits of using the DPDK KNI are:
> +KNI allows an interface with the kernel network stack and allows management 
> of
> +DPDK ports using standard Linux net tools such as ``ethtool``, ``ifconfig`` 
> and
> +``tcpdump``.
> +
> +Main use case of KNI is get/receive exception packets from/to Linux network
> +stack while main datapath IO is done bypassing the networking stack.
> +
> +There are other alternatives to KNI, all are available in the upstream Linux:
> +
> +#. :ref:`TunTap_PMD` as wrapper to `Linux tun/tap
> +   `_
> +
> +#. :ref:`virtio_user_as_exceptional_path`

Shouldn't virtio_user be the first item?

> +
> +The benefits of using the DPDK KNI against alternatives are:
>  
>  *   Faster than existing Linux TUN/TAP interfaces
>  (by eliminating system calls and copy_to_user()/copy_from_user() 
> operations.
>  
> -*   Allows management of DPDK ports using standard Linux net tools such as 
> ethtool, ifconfig and tcpdump.
> +The cons of the DPDK KNI are:
> +
> +* It is out-of-tree Linux kernel module and it can't be distributed as 
> binary as
> +  part of OSV DPDK packages. This makes it harder to consume, although it is

OSV -> OVS

> +  always possible to compile it from the source code.
> +
> +* As it shares memory between userspace and kernelspace, and kernel part
> +  directly uses input provided by userspace, it is not safe. This makes hard 
> to
> +  upstream the module.
>  
> -*   Allows an interface with the kernel network stack.
> +* Only a subset of control commands are supported by KNI.
>  
>  The components of an application using the DPDK Kernel NIC Interface are 
> shown in :numref:`figure_kernel_nic_intf`.
>  
> -- 
> 2.31.1
> 


[dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library

2021-07-30 Thread Thomas Monjalon
From: Elena Agostini 

In a heterogeneous computing system, processing is not done only on the CPU.
Some tasks can be delegated to devices working in parallel.

The goal of this new library is to enhance the collaboration between
DPDK, which is primarily a CPU framework, and other types of devices like GPUs.

When mixing network activity with task processing on a non-CPU device,
the CPU may need to communicate with the device
in order to manage the memory, synchronize operations, exchange info, etc.

This library provides a number of new features:
- Interoperability with device-specific libraries through generic handlers
- Possibility to allocate and free memory on the device
- Possibility to allocate and free memory on the CPU but visible from the device
- Communication functions to enhance the dialog between the CPU and the device

The infrastructure is prepared to welcome drivers in drivers/hc/,
such as the upcoming NVIDIA one, implementing the hcdev API.

Some parts are not complete:
  - locks
  - memory allocation table
  - memory freeing
  - guide documentation
  - integration in devtools/check-doc-vs-code.sh
  - unit tests
  - integration in testpmd to enable Rx/Tx to/from GPU memory.

Below is some pseudo-code giving an example of how to use the functions
of this library in a CUDA application.


Elena Agostini (4):
  hcdev: introduce heterogeneous computing device library
  hcdev: add memory API
  hcdev: add communication flag
  hcdev: add communication list

Thomas Monjalon (3):
  hcdev: add event notification
  hcdev: add child device representing a device context
  hcdev: support multi-process

 .gitignore |   1 +
 MAINTAINERS|   6 +
 doc/api/doxy-api-index.md  |   1 +
 doc/api/doxy-api.conf.in   |   1 +
 doc/guides/conf.py |   8 +
 doc/guides/hcdevs/features/default.ini |  13 +
 doc/guides/hcdevs/index.rst|  11 +
 doc/guides/hcdevs/overview.rst |  11 +
 doc/guides/index.rst   |   1 +
 doc/guides/prog_guide/hcdev.rst|   5 +
 doc/guides/prog_guide/index.rst|   1 +
 doc/guides/rel_notes/release_21_08.rst |   5 +
 drivers/hc/meson.build |   4 +
 drivers/meson.build|   1 +
 lib/hcdev/hcdev.c  | 789 +
 lib/hcdev/hcdev_driver.h   |  96 +++
 lib/hcdev/meson.build  |  12 +
 lib/hcdev/rte_hcdev.h  | 592 +++
 lib/hcdev/version.map  |  35 ++
 lib/meson.build|   1 +
 20 files changed, 1594 insertions(+)
 create mode 100644 doc/guides/hcdevs/features/default.ini
 create mode 100644 doc/guides/hcdevs/index.rst
 create mode 100644 doc/guides/hcdevs/overview.rst
 create mode 100644 doc/guides/prog_guide/hcdev.rst
 create mode 100644 drivers/hc/meson.build
 create mode 100644 lib/hcdev/hcdev.c
 create mode 100644 lib/hcdev/hcdev_driver.h
 create mode 100644 lib/hcdev/meson.build
 create mode 100644 lib/hcdev/rte_hcdev.h
 create mode 100644 lib/hcdev/version.map




/* HCDEV library + CUDA functions */

#define GPU_PAGE_SHIFT 16
#define GPU_PAGE_SIZE (1UL << GPU_PAGE_SHIFT)

int main() {
    struct rte_hcdev_flag quit_flag;
    struct rte_hcdev_comm_list *comm_list;
    int nb_rx = 0;
    int comm_list_entry = 0;
    struct rte_mbuf *rx_mbufs[max_rx_mbufs];
    cudaStream_t cstream;
    struct rte_mempool *mpool_payload, *mpool_header;
    struct rte_pktmbuf_extmem ext_mem;
    int16_t dev_id;

    /* Initialize CUDA objects (cstream, context, etc.). */
    /* Use hcdev library to register a new CUDA context if any */
    /* Let's assume the application wants to use the default context of the GPU device 0 */
    dev_id = 0;

    /* Create an external memory mempool using memory allocated on the GPU. */
    ext_mem.elt_size = mbufs_headroom_size;
    ext_mem.buf_len = RTE_ALIGN_CEIL(mbufs_num * ext_mem.elt_size, GPU_PAGE_SIZE);
    ext_mem.buf_iova = RTE_BAD_IOVA;
    ext_mem.buf_ptr = rte_hcdev_malloc(dev_id, ext_mem.buf_len, 0);
    rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL,
                        ext_mem.buf_iova, GPU_PAGE_SIZE);
    rte_dev_dma_map(rte_eth_devices[l2fwd_port_id].device, ext_mem.buf_ptr,
                    ext_mem.buf_iova, ext_mem.buf_len);
    mpool_payload = rte_pktmbuf_pool_create_extbuf("gpu_mempool", mbufs_num,
                                                   0, 0, ext_mem.elt_size,
                                                   rte_socket_id(), &ext_mem, 1);

    /*
     * Create CPU - device communication flag. With this flag, the CPU can
     * tell the CUDA kernel to exit from the main loop.
     */
    rte_hcdev_comm_create_flag(dev_id, &quit_flag, RTE_HCDEV_COMM_FLAG_CPU);
    rte_hcdev_comm_set_flag(&q

[dpdk-dev] [RFC PATCH v2 1/7] hcdev: introduce heterogeneous computing device library

2021-07-30 Thread Thomas Monjalon
From: Elena Agostini 

In a heterogeneous computing system, processing is not done only on the CPU.
Some tasks can be delegated to devices working in parallel.

The new library hcdev is for dealing with computing devices
from a DPDK application running on the CPU.

The infrastructure is prepared to welcome drivers in drivers/hc/.

Signed-off-by: Elena Agostini 
Signed-off-by: Thomas Monjalon 
---
 .gitignore |   1 +
 MAINTAINERS|   6 +
 doc/api/doxy-api-index.md  |   1 +
 doc/api/doxy-api.conf.in   |   1 +
 doc/guides/conf.py |   8 +
 doc/guides/hcdevs/features/default.ini |  10 +
 doc/guides/hcdevs/index.rst|  11 ++
 doc/guides/hcdevs/overview.rst |  11 ++
 doc/guides/index.rst   |   1 +
 doc/guides/prog_guide/hcdev.rst|   5 +
 doc/guides/prog_guide/index.rst|   1 +
 doc/guides/rel_notes/release_21_08.rst |   4 +
 drivers/hc/meson.build |   4 +
 drivers/meson.build|   1 +
 lib/hcdev/hcdev.c  | 249 +
 lib/hcdev/hcdev_driver.h   |  67 +++
 lib/hcdev/meson.build  |  10 +
 lib/hcdev/rte_hcdev.h  | 169 +
 lib/hcdev/version.map  |  20 ++
 lib/meson.build|   1 +
 20 files changed, 581 insertions(+)
 create mode 100644 doc/guides/hcdevs/features/default.ini
 create mode 100644 doc/guides/hcdevs/index.rst
 create mode 100644 doc/guides/hcdevs/overview.rst
 create mode 100644 doc/guides/prog_guide/hcdev.rst
 create mode 100644 drivers/hc/meson.build
 create mode 100644 lib/hcdev/hcdev.c
 create mode 100644 lib/hcdev/hcdev_driver.h
 create mode 100644 lib/hcdev/meson.build
 create mode 100644 lib/hcdev/rte_hcdev.h
 create mode 100644 lib/hcdev/version.map

diff --git a/.gitignore b/.gitignore
index b19c0717e6..97e57e5897 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
 doc/guides/regexdevs/overview_feature_table.txt
 doc/guides/vdpadevs/overview_feature_table.txt
 doc/guides/bbdevs/overview_feature_table.txt
+doc/guides/hcdevs/overview_feature_table.txt
 
 # ignore generated ctags/cscope files
 cscope.out.po
diff --git a/MAINTAINERS b/MAINTAINERS
index 8013ba1f14..71e850ae44 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -452,6 +452,12 @@ F: app/test-regex/
 F: doc/guides/prog_guide/regexdev.rst
 F: doc/guides/regexdevs/features/default.ini
 
+Heterogeneous Computing API - EXPERIMENTAL
+M: Elena Agostini 
+F: lib/hcdev/
+F: doc/guides/prog_guide/hcdev.rst
+F: doc/guides/hcdevs/features/default.ini
+
 Eventdev API
 M: Jerin Jacob 
 T: git://dpdk.org/next/dpdk-next-eventdev
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..2e5256ccc1 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -21,6 +21,7 @@ The public API headers are grouped by topics:
   [compressdev](@ref rte_compressdev.h),
   [compress]   (@ref rte_comp.h),
   [regexdev]   (@ref rte_regexdev.h),
+  [hcdev]  (@ref rte_hcdev.h),
   [eventdev]   (@ref rte_eventdev.h),
   [event_eth_rx_adapter]   (@ref rte_event_eth_rx_adapter.h),
   [event_eth_tx_adapter]   (@ref rte_event_eth_tx_adapter.h),
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6..549f373b8a 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -44,6 +44,7 @@ INPUT   = @TOPDIR@/doc/api/doxy-api-index.md \
   @TOPDIR@/lib/gro \
   @TOPDIR@/lib/gso \
   @TOPDIR@/lib/hash \
+  @TOPDIR@/lib/hcdev \
   @TOPDIR@/lib/ip_frag \
   @TOPDIR@/lib/ipsec \
   @TOPDIR@/lib/jobstats \
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 67d2dd62c7..67ad2c8090 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -152,6 +152,9 @@ def generate_overview_table(output_filename, table_id, section, table_name, titl
 name = ini_filename[:-4]
 name = name.replace('_vf', 'vf')
 pmd_names.append(name)
+if not pmd_names:
+# Add an empty column if table is empty (required by RST syntax)
+pmd_names.append(' ')
 
 # Pad the table header names.
 max_header_len = len(max(pmd_names, key=len))
@@ -388,6 +391,11 @@ def setup(app):
 'Features',
 'Features availability in bbdev drivers',
 'Feature')
+table_file = dirname(__file__) + '/hcdevs/overview_feature_table.txt'
+generate_overview_table(table_file, 1,
+'Features',
+'Features availability in hcdev drivers',
+'Feature')
 
 if L

[dpdk-dev] [RFC PATCH v2 2/7] hcdev: add event notification

2021-07-30 Thread Thomas Monjalon
Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.

The events RTE_HCDEV_EVENT_NEW and RTE_HCDEV_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.

Signed-off-by: Thomas Monjalon 
---
 lib/hcdev/hcdev.c| 137 +++
 lib/hcdev/hcdev_driver.h |   7 ++
 lib/hcdev/rte_hcdev.h|  71 
 lib/hcdev/version.map|   3 +
 4 files changed, 218 insertions(+)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index ea587b3713..2a7ce1ccd8 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -3,6 +3,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -27,6 +28,15 @@ static int16_t hcdev_max;
 /* Number of currently valid devices */
 static int16_t hcdev_count;
 
+/* Event callback object */
+struct rte_hcdev_callback {
+   TAILQ_ENTRY(rte_hcdev_callback) next;
+   rte_hcdev_callback_t *function;
+   void *user_data;
+   enum rte_hcdev_event event;
+};
+static void hcdev_free_callbacks(struct rte_hcdev *dev);
+
 int
 rte_hcdev_init(size_t dev_max)
 {
@@ -166,6 +176,7 @@ rte_hcdev_allocate(const char *name)
dev->info.name = dev->name;
dev->info.dev_id = dev_id;
dev->info.numa_node = -1;
+   TAILQ_INIT(&dev->callbacks);
 
hcdev_count++;
HCDEV_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -180,6 +191,7 @@ rte_hcdev_complete_new(struct rte_hcdev *dev)
return;
 
dev->state = RTE_HCDEV_STATE_INITIALIZED;
+   rte_hcdev_notify(dev, RTE_HCDEV_EVENT_NEW);
 }
 
 int
@@ -192,6 +204,9 @@ rte_hcdev_release(struct rte_hcdev *dev)
 
HCDEV_LOG(DEBUG, "free device %s (id %d)",
dev->info.name, dev->info.dev_id);
+   rte_hcdev_notify(dev, RTE_HCDEV_EVENT_DEL);
+
+   hcdev_free_callbacks(dev);
dev->state = RTE_HCDEV_STATE_UNUSED;
hcdev_count--;
 
@@ -224,6 +239,128 @@ rte_hcdev_close(int16_t dev_id)
return firsterr;
 }
 
+int
+rte_hcdev_callback_register(int16_t dev_id, enum rte_hcdev_event event,
+   rte_hcdev_callback_t *function, void *user_data)
+{
+   int16_t next_dev, last_dev;
+   struct rte_hcdev_callback_list *callbacks;
+   struct rte_hcdev_callback *callback;
+
+   if (!rte_hcdev_is_valid(dev_id) && dev_id != RTE_HCDEV_ID_ANY) {
+   HCDEV_LOG(ERR, "register callback of invalid ID %d", dev_id);
+   rte_errno = ENODEV;
+   return -rte_errno;
+   }
+   if (function == NULL) {
+   HCDEV_LOG(ERR, "cannot register callback without function");
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   if (dev_id == RTE_HCDEV_ID_ANY) {
+   next_dev = 0;
+   last_dev = hcdev_max - 1;
+   } else {
+   next_dev = last_dev = dev_id;
+   }
+   do {
+   callbacks = &hcdevs[next_dev].callbacks;
+
+   /* check if not already registered */
+   TAILQ_FOREACH(callback, callbacks, next) {
+   if (callback->event == event &&
+   callback->function == function &&
+   callback->user_data == user_data) {
+   HCDEV_LOG(INFO, "callback already registered");
+   return 0;
+   }
+   }
+
+   callback = malloc(sizeof(*callback));
+   if (callback == NULL) {
+   HCDEV_LOG(ERR, "cannot allocate callback");
+   return -ENOMEM;
+   }
+   callback->function = function;
+   callback->user_data = user_data;
+   callback->event = event;
+   TAILQ_INSERT_TAIL(callbacks, callback, next);
+
+   } while (++next_dev <= last_dev);
+
+   return 0;
+}
+
+int
+rte_hcdev_callback_unregister(int16_t dev_id, enum rte_hcdev_event event,
+   rte_hcdev_callback_t *function, void *user_data)
+{
+   int16_t next_dev, last_dev;
+   struct rte_hcdev_callback_list *callbacks;
+   struct rte_hcdev_callback *callback, *next_callback;
+
+   if (!rte_hcdev_is_valid(dev_id) && dev_id != RTE_HCDEV_ID_ANY) {
+   HCDEV_LOG(ERR, "unregister callback of invalid ID %d", dev_id);
+   rte_errno = ENODEV;
+   return -rte_errno;
+   }
+   if (function == NULL) {
+   HCDEV_LOG(ERR, "cannot unregister callback without function");
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   if (dev_id == RTE_HCDEV_ID_ANY) {
+   next_dev = 0;
+   last_dev = hcdev_max - 1;
+   } else {
+   next_dev = last_dev = dev_id;
+   }
+
+   d

[dpdk-dev] [RFC PATCH v2 3/7] hcdev: add child device representing a device context

2021-07-30 Thread Thomas Monjalon
The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque value by the caller of
rte_hcdev_add_child().

Signed-off-by: Thomas Monjalon 
---
 lib/hcdev/hcdev.c| 45 --
 lib/hcdev/hcdev_driver.h |  2 +-
 lib/hcdev/rte_hcdev.h| 69 +---
 lib/hcdev/version.map|  1 +
 4 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index 2a7ce1ccd8..d40010749a 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -79,13 +79,22 @@ rte_hcdev_is_valid(int16_t dev_id)
return false;
 }
 
+static bool
+hcdev_match_parent(int16_t dev_id, int16_t parent)
+{
+   if (parent == RTE_HCDEV_ID_ANY)
+   return true;
+   return hcdevs[dev_id].info.parent == parent;
+}
+
 int16_t
-rte_hcdev_find_next(int16_t dev_id)
+rte_hcdev_find_next(int16_t dev_id, int16_t parent)
 {
if (dev_id < 0)
dev_id = 0;
while (dev_id < hcdev_max &&
-   hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED)
+   (hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED ||
+   !hcdev_match_parent(dev_id, parent)))
dev_id++;
 
if (dev_id >= hcdev_max)
@@ -176,6 +185,7 @@ rte_hcdev_allocate(const char *name)
dev->info.name = dev->name;
dev->info.dev_id = dev_id;
dev->info.numa_node = -1;
+   dev->info.parent = RTE_HCDEV_ID_NONE;
TAILQ_INIT(&dev->callbacks);
 
hcdev_count++;
@@ -184,6 +194,28 @@ rte_hcdev_allocate(const char *name)
return dev;
 }
 
+int16_t
+rte_hcdev_add_child(const char *name, int16_t parent, uint64_t child_context)
+{
+   struct rte_hcdev *dev;
+
+   if (!rte_hcdev_is_valid(parent)) {
+   HCDEV_LOG(ERR, "add child to invalid parent ID %d", parent);
+   rte_errno = ENODEV;
+   return -rte_errno;
+   }
+
+   dev = rte_hcdev_allocate(name);
+   if (dev == NULL)
+   return -rte_errno;
+
+   dev->info.parent = parent;
+   dev->info.context = child_context;
+
+   rte_hcdev_complete_new(dev);
+   return dev->info.dev_id;
+}
+
 void
 rte_hcdev_complete_new(struct rte_hcdev *dev)
 {
@@ -197,10 +229,19 @@ rte_hcdev_complete_new(struct rte_hcdev *dev)
 int
 rte_hcdev_release(struct rte_hcdev *dev)
 {
+   int16_t dev_id, child;
+
if (dev == NULL) {
rte_errno = ENODEV;
return -rte_errno;
}
+   dev_id = dev->info.dev_id;
+   RTE_HCDEV_FOREACH_CHILD(child, dev_id) {
+   HCDEV_LOG(ERR, "cannot release device %d with child %d",
+   dev_id, child);
+   rte_errno = EBUSY;
+   return -rte_errno;
+   }
 
HCDEV_LOG(DEBUG, "free device %s (id %d)",
dev->info.name, dev->info.dev_id);
diff --git a/lib/hcdev/hcdev_driver.h b/lib/hcdev/hcdev_driver.h
index 80d11bd612..39f6fc57ab 100644
--- a/lib/hcdev/hcdev_driver.h
+++ b/lib/hcdev/hcdev_driver.h
@@ -31,7 +31,7 @@ typedef int (rte_hcdev_info_get_t)(struct rte_hcdev *dev, struct rte_hcdev_info
 struct rte_hcdev_ops {
/* Get device info. If NULL, info is just copied. */
rte_hcdev_info_get_t *dev_info_get;
-   /* Close device. */
+   /* Close device or child context. */
rte_hcdev_close_t *dev_close;
 };
 
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index 8131e4045a..518020fd2f 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -42,8 +42,12 @@ extern "C" {
 struct rte_hcdev_info {
/** Unique identifier name. */
const char *name;
+   /** Opaque handler of the device context. */
+   uint64_t context;
/** Device ID. */
int16_t dev_id;
+   /** ID of the parent device, RTE_HCDEV_ID_NONE if no parent */
+   int16_t parent;
/** Total processors available on device. */
uint32_t processor_count;
/** Total memory available on device. */
@@ -112,6 +116,33 @@ uint16_t rte_hcdev_count_avail(void);
 __rte_experimental
 bool rte_hcdev_is_valid(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a virtual device representing a context in the parent device.
+ *
+ * @param name
+ *   Unique string to identify the device.
+ * @param parent
+ *   Device ID of the parent.
+ * @param child_context
+ *   Opaque context handler.
+ *
+ * @return
+ *   Device ID of the new created child, -rte_errno otherwise:
+ *   - EINVAL if empty name
+ *   - ENAMETOOLONG if long name
+ *   - EEXIST if existing device name
+ *   - ENODEV if invalid parent
+ *   - EPERM if secondary process
+ *   - ENOENT if too many devices
+ *   - ENOMEM if out of space
+ */
+__rte_experimental
+int16_t rte_hcdev_add_child(const char *name,

[dpdk-dev] [RFC PATCH v2 4/7] hcdev: support multi-process

2021-07-30 Thread Thomas Monjalon
The device data shared between processes are moved into a struct
allocated in shared memory (a new memzone for all hcdevs).
The main struct rte_hcdev references the shared memory
via the pointer mpshared.

The API function rte_hcdev_attach() is added to attach a device
from the secondary process.
The function rte_hcdev_allocate() can be used only by the primary process.

Signed-off-by: Thomas Monjalon 
---
 lib/hcdev/hcdev.c| 114 ---
 lib/hcdev/hcdev_driver.h |  23 ++--
 lib/hcdev/rte_hcdev.h|   3 +-
 lib/hcdev/version.map|   1 +
 4 files changed, 115 insertions(+), 26 deletions(-)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index d40010749a..a7badd122b 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -28,6 +29,12 @@ static int16_t hcdev_max;
 /* Number of currently valid devices */
 static int16_t hcdev_count;
 
+/* Shared memory between processes. */
+static const char *HCDEV_MEMZONE = "rte_hcdev_shared";
+static struct {
+   __extension__ struct rte_hcdev_mpshared hcdevs[0];
+} *hcdev_shared_mem;
+
 /* Event callback object */
 struct rte_hcdev_callback {
TAILQ_ENTRY(rte_hcdev_callback) next;
@@ -40,6 +47,8 @@ static void hcdev_free_callbacks(struct rte_hcdev *dev);
 int
 rte_hcdev_init(size_t dev_max)
 {
+   const struct rte_memzone *memzone;
+
if (dev_max == 0 || dev_max > INT16_MAX) {
HCDEV_LOG(ERR, "invalid array size");
rte_errno = EINVAL;
@@ -60,6 +69,23 @@ rte_hcdev_init(size_t dev_max)
return -rte_errno;
}
 
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   memzone = rte_memzone_reserve(HCDEV_MEMZONE,
+   sizeof(*hcdev_shared_mem) +
+   sizeof(*hcdev_shared_mem->hcdevs) * dev_max,
+   SOCKET_ID_ANY, 0);
+   } else {
+   memzone = rte_memzone_lookup(HCDEV_MEMZONE);
+   }
+   if (memzone == NULL) {
+   HCDEV_LOG(ERR, "cannot initialize shared memory");
+   free(hcdevs);
+   hcdevs = NULL;
+   rte_errno = ENOMEM;
+   return -rte_errno;
+   }
+   hcdev_shared_mem = memzone->addr;
+
hcdev_max = dev_max;
return 0;
 }
@@ -74,7 +100,7 @@ bool
 rte_hcdev_is_valid(int16_t dev_id)
 {
if (dev_id >= 0 && dev_id < hcdev_max &&
-   hcdevs[dev_id].state == RTE_HCDEV_STATE_INITIALIZED)
+   hcdevs[dev_id].process_state == RTE_HCDEV_STATE_INITIALIZED)
return true;
return false;
 }
@@ -84,7 +110,7 @@ hcdev_match_parent(int16_t dev_id, int16_t parent)
 {
if (parent == RTE_HCDEV_ID_ANY)
return true;
-   return hcdevs[dev_id].info.parent == parent;
+   return hcdevs[dev_id].mpshared->info.parent == parent;
 }
 
 int16_t
@@ -93,7 +119,7 @@ rte_hcdev_find_next(int16_t dev_id, int16_t parent)
if (dev_id < 0)
dev_id = 0;
while (dev_id < hcdev_max &&
-   (hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED ||
+   (hcdevs[dev_id].process_state == RTE_HCDEV_STATE_UNUSED ||
!hcdev_match_parent(dev_id, parent)))
dev_id++;
 
@@ -108,7 +134,7 @@ hcdev_find_free_id(void)
int16_t dev_id;
 
for (dev_id = 0; dev_id < hcdev_max; dev_id++) {
-   if (hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED)
+   if (hcdevs[dev_id].process_state == RTE_HCDEV_STATE_UNUSED)
return dev_id;
}
return RTE_HCDEV_ID_NONE;
@@ -135,7 +161,7 @@ rte_hcdev_get_by_name(const char *name)
 
RTE_HCDEV_FOREACH(dev_id) {
dev = &hcdevs[dev_id];
-   if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+   if (strncmp(name, dev->mpshared->name, RTE_DEV_NAME_MAX_LEN) == 0)
return dev;
}
return NULL;
@@ -177,16 +203,20 @@ rte_hcdev_allocate(const char *name)
dev = &hcdevs[dev_id];
memset(dev, 0, sizeof(*dev));
 
-   if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+   dev->mpshared = &hcdev_shared_mem->hcdevs[dev_id];
+   memset(dev->mpshared, 0, sizeof(*dev->mpshared));
+
+   if (rte_strscpy(dev->mpshared->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
HCDEV_LOG(ERR, "device name too long: %s", name);
rte_errno = ENAMETOOLONG;
return NULL;
}
-   dev->info.name = dev->name;
-   dev->info.dev_id = dev_id;
-   dev->info.numa_node = -1;
-   dev->info.parent = RTE_HCDEV_ID_NONE;
+   dev->mpshared->info.name = dev->mpshared->name;
+   dev->mpshared->info.dev_id = dev_id;
+   dev->mpshared->info.numa_node = -1;
+   dev->mpshared->info.parent = RTE_HC

[dpdk-dev] [RFC PATCH v2 5/7] hcdev: add memory API

2021-07-30 Thread Thomas Monjalon
From: Elena Agostini 

In a heterogeneous computing system, processing is not done only on the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.

As a first step, the features are focused on memory management.
A function allocates memory inside the device,
or in the main (CPU) memory while making it visible to the device.
This memory may be used to save packets or for synchronization data.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini 
Signed-off-by: Thomas Monjalon 
---
 doc/guides/hcdevs/features/default.ini |  3 +
 doc/guides/rel_notes/release_21_08.rst |  1 +
 lib/hcdev/hcdev.c  | 88 ++
 lib/hcdev/hcdev_driver.h   |  9 +++
 lib/hcdev/rte_hcdev.h  | 53 
 lib/hcdev/version.map  |  2 +
 6 files changed, 156 insertions(+)

diff --git a/doc/guides/hcdevs/features/default.ini b/doc/guides/hcdevs/features/default.ini
index f988ee73d4..ee32753d94 100644
--- a/doc/guides/hcdevs/features/default.ini
+++ b/doc/guides/hcdevs/features/default.ini
@@ -8,3 +8,6 @@
 ;
 [Features]
 Get device info=
+Share CPU memory with device   =
+Allocate device memory =
+Free memory=
diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index fb350b4706..e955a331a6 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -58,6 +58,7 @@ New Features
 * **Introduced Heterogeneous Computing Device library with first features:**
 
   * Device information
+  * Memory management
 
 * **Added auxiliary bus support.**
 
diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index a7badd122b..621e0b99bd 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -501,3 +502,90 @@ rte_hcdev_info_get(int16_t dev_id, struct rte_hcdev_info *info)
}
return HCDEV_DRV_RET(dev->ops.dev_info_get(dev, info));
 }
+
+#define RTE_HCDEV_MALLOC_FLAGS_ALL \
+   RTE_HCDEV_MALLOC_REGISTER_FROM_CPU
+#define RTE_HCDEV_MALLOC_FLAGS_RESERVED ~RTE_HCDEV_MALLOC_FLAGS_ALL
+
+void *
+rte_hcdev_malloc(int16_t dev_id, size_t size, uint32_t flags)
+{
+   struct rte_hcdev *dev;
+   void *ptr;
+   int ret;
+
+   dev = hcdev_get_by_id(dev_id);
+   if (dev == NULL) {
+   HCDEV_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+   rte_errno = ENODEV;
+   return NULL;
+   }
+   if (flags & RTE_HCDEV_MALLOC_FLAGS_RESERVED) {
+   HCDEV_LOG(ERR, "alloc mem with reserved flag 0x%x",
+   flags & RTE_HCDEV_MALLOC_FLAGS_RESERVED);
+   rte_errno = EINVAL;
+   return NULL;
+   }
+
+   if (flags & RTE_HCDEV_MALLOC_REGISTER_FROM_CPU) {
+   if (dev->ops.mem_register == NULL) {
+   HCDEV_LOG(ERR, "mem registration not supported");
+   rte_errno = ENOTSUP;
+   return NULL;
+   }
+   } else {
+   if (dev->ops.mem_alloc == NULL) {
+   HCDEV_LOG(ERR, "mem allocation not supported");
+   rte_errno = ENOTSUP;
+   return NULL;
+   }
+   }
+
+   if (size == 0) /* dry-run */
+   return NULL;
+
+   if (flags & RTE_HCDEV_MALLOC_REGISTER_FROM_CPU) {
+   ptr = rte_zmalloc(NULL, size, 0);
+   if (ptr == NULL) {
+   HCDEV_LOG(ERR, "cannot allocate CPU memory");
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+   ret = dev->ops.mem_register(dev, size, ptr);
+   } else {
+   ret = dev->ops.mem_alloc(dev, size, &ptr);
+   }
+   /* TODO maintain a table of chunks registered/allocated */
+   switch (ret) {
+   case 0:
+   return ptr;
+   case -ENOMEM:
+   case -E2BIG:
+   rte_errno = -ret;
+   return NULL;
+   default:
+   rte_errno = EPERM;
+   return NULL;
+   }
+}
+
+int
+rte_hcdev_free(int16_t dev_id, void *ptr)
+{
+   struct rte_hcdev *dev;
+
+   dev = hcdev_get_by_id(dev_id);
+   if (dev == NULL) {
+   HCDEV_LOG(ERR, "free mem for invalid device ID %d", dev_id);
+   rte_errno = ENODEV;
+   return -rte_errno;
+   }
+
+   if (dev->ops.mem_free == NULL) {
+   rte_errno = ENOTSUP;
+   return -rte_errno;
+   }
+   return HCDEV_DRV_RET(dev->ops.mem_free(dev, ptr));
+   /* TODO unregister callback */
+   /* TODO rte_free CPU memory */
+}
diff --git a/lib/hcdev/hcdev_driver.h b/lib/hcdev/hcdev_driver.h
index f33b56947b..

[dpdk-dev] [RFC PATCH v2 6/7] hcdev: add communication flag

2021-07-30 Thread Thomas Monjalon
From: Elena Agostini 

In a heterogeneous computing system, processing is not done only on the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing, the CPU and the device
may need to communicate in order to synchronize operations.

The purpose of this flag is to allow the CPU and the device to
exchange ACKs. A possible use-case is described below.

CPU:
- Trigger some task on the device
- Prepare some data
- Signal to the device that the data is ready by updating the communication flag

Device:
- Do some pre-processing
- Wait for more data from the CPU by polling on the communication flag
- Consume the data prepared by the CPU
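
As an illustration only (not part of this patch), the handshake above can
be sketched in plain C without any DPDK dependency. The names comm_flag,
cpu_signal and device_poll are hypothetical stand-ins; FLAG_VOLATILE
mirrors the RTE_HCDEV_VOLATILE macro added below, and the "device" is
assumed to be ordinary code reading the same word of memory.

```c
#include <stdint.h>

/* Same idea as RTE_HCDEV_VOLATILE in the patch: force a real memory access. */
#define FLAG_VOLATILE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct rte_hcdev_comm_flag. */
struct comm_flag {
	uint32_t *ptr; /* word visible to both the CPU and the device */
};

/* CPU side: publish a new value once the data is ready. */
static void cpu_signal(struct comm_flag *f, uint32_t val)
{
	FLAG_VOLATILE(*f->ptr) = val;
}

/*
 * Device side: poll the flag at most max_spins times.
 * Returns 1 if the expected value was observed, 0 otherwise.
 */
static int device_poll(struct comm_flag *f, uint32_t expected,
		unsigned int max_spins)
{
	while (max_spins-- > 0) {
		if (FLAG_VOLATILE(*f->ptr) == expected)
			return 1;
	}
	return 0;
}
```

The bounded spin count is a choice made for this sketch so that the poll
always terminates; a real device workload may poll differently.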

Signed-off-by: Elena Agostini 
---
 lib/hcdev/hcdev.c |  71 
 lib/hcdev/rte_hcdev.h | 107 ++
 lib/hcdev/version.map |   4 ++
 3 files changed, 182 insertions(+)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index 621e0b99bd..e391988e73 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -589,3 +589,74 @@ rte_hcdev_free(int16_t dev_id, void *ptr)
/* TODO unregister callback */
/* TODO rte_free CPU memory */
 }
+
+int
+rte_hcdev_comm_create_flag(uint16_t dev_id, struct rte_hcdev_comm_flag *hcflag,
+   enum rte_hcdev_comm_flag_type mtype)
+{
+   size_t flag_size;
+
+   if (hcflag == NULL) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (mtype != RTE_HCDEV_COMM_FLAG_CPU) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   flag_size = sizeof(uint32_t);
+
+   hcflag->ptr = rte_hcdev_malloc(dev_id, flag_size,
+   RTE_HCDEV_MALLOC_REGISTER_FROM_CPU);
+   if (hcflag->ptr == NULL)
+   return -rte_errno;
+
+   hcflag->mtype = mtype;
+   return 0;
+}
+
+int
+rte_hcdev_comm_destroy_flag(uint16_t dev_id, struct rte_hcdev_comm_flag *hcflag)
+{
+   if (hcflag == NULL) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   return rte_hcdev_free(dev_id, hcflag->ptr);
+}
+
+int
+rte_hcdev_comm_set_flag(struct rte_hcdev_comm_flag *hcflag, uint32_t val)
+{
+   if (hcflag == NULL) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (hcflag->mtype != RTE_HCDEV_COMM_FLAG_CPU) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   RTE_HCDEV_VOLATILE(*hcflag->ptr) = val;
+
+   return 0;
+}
+
+int
+rte_hcdev_comm_get_flag_value(struct rte_hcdev_comm_flag *hcflag, uint32_t *val)
+{
+   if (hcflag == NULL) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   if (hcflag->mtype != RTE_HCDEV_COMM_FLAG_CPU) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   *val = RTE_HCDEV_VOLATILE(*hcflag->ptr);
+
+   return 0;
+}
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index 11895d9486..7b58041b3c 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -38,6 +38,9 @@ extern "C" {
 /** Catch-all callback data. */
 #define RTE_HCDEV_CALLBACK_ANY_DATA ((void *)-1)
 
+/** Access variable as volatile. */
+#define RTE_HCDEV_VOLATILE(x) (*(volatile typeof(x)*)&(x))
+
 /** Store device info. */
 struct rte_hcdev_info {
/** Unique identifier name. */
@@ -68,6 +71,18 @@ enum rte_hcdev_event {
 typedef void (rte_hcdev_callback_t)(int16_t dev_id,
enum rte_hcdev_event event, void *user_data);
 
+/** Memory where communication flag is allocated. */
+enum rte_hcdev_comm_flag_type {
+   /** Allocate flag on CPU memory visible from device. */
+   RTE_HCDEV_COMM_FLAG_CPU = 0,
+};
+
+/** Communication flag to coordinate CPU with the device. */
+struct rte_hcdev_comm_flag {
+   uint32_t *ptr;
+   enum rte_hcdev_comm_flag_type mtype;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -346,6 +361,98 @@ __rte_alloc_size(2);
 __rte_experimental
 int rte_hcdev_free(int16_t dev_id, void *ptr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication flag that can be shared
+ * between CPU threads and device workload to exchange some status info
+ * (e.g. work is done, processing can start, etc..).
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param hcflag
+ *   Pointer to the memory area of the hcflag structure.
+ * @param mtype
+ *   Type of memory to allocate the communication flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if invalid inputs
+ *   - ENOTSUP if operation not supported by the driver
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_hcdev_comm_create_flag(uint16_t dev_id,
+   struct rte_hcdev_comm_flag *hcflag,
+  

[dpdk-dev] [RFC PATCH v2 7/7] hcdev: add communication list

2021-07-30 Thread Thomas Monjalon
From: Elena Agostini 

In a heterogeneous computing system, processing is not done only on the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing, the CPU and the device
may need to communicate in order to synchronize operations.

An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the device is responsible for processing the content of those packets.

The purpose of this list is to provide a buffer in CPU memory visible
from the device that can be treated as a circular buffer
to let the CPU provide fundamental info about received packets to the device.

A possible use-case is described below.

CPU:
- Trigger some task on the device
- In a loop:
  - receive a number of packets
  - provide packet info to the device

Device:
- Do some pre-processing
- Wait to receive a new set of packets to be processed

Layout of a communication list would be:

 ---
|   0| => pkt_list
| status |
| #pkts  |
 ---
|   1| => pkt_list
| status |
| #pkts  |
 ---
|   2| => pkt_list
| status |
| #pkts  |
 ---
|    | => pkt_list
 ---
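
As an illustration only (not part of this patch), one item of such a
list and its free/ready cycle can be sketched in plain C. All names here
(comm_item, item_populate, item_consume, LIST_PKTS_MAX) are hypothetical.
The sketch embeds pkt_list in the item instead of allocating it
separately, and it refuses to refill an item that is not free, which is
an assumption of this sketch rather than something the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical; the patch uses RTE_HCDEV_COMM_LIST_PKTS_MAX. */
#define LIST_PKTS_MAX 4

enum list_status { LIST_FREE = 0, LIST_READY };

/* Address and size of one received packet. */
struct comm_pkt {
	uintptr_t addr;
	size_t size;
};

/* One entry of the circular communication list. */
struct comm_item {
	enum list_status status;
	uint32_t num_pkts;
	struct comm_pkt pkt_list[LIST_PKTS_MAX];
};

/* CPU side: fill a free item, then mark it ready. Returns 0 or -1. */
static int item_populate(struct comm_item *it,
		const struct comm_pkt *pkts, uint32_t n)
{
	uint32_t i;

	if (it->status != LIST_FREE || n > LIST_PKTS_MAX)
		return -1;
	for (i = 0; i < n; i++)
		it->pkt_list[i] = pkts[i];
	it->num_pkts = n;
	/* A real CPU/device exchange needs a barrier here (the patch calls rte_mb()). */
	it->status = LIST_READY;
	return 0;
}

/* Device side: take a ready item and hand it back; returns packets taken. */
static uint32_t item_consume(struct comm_item *it)
{
	uint32_t n;

	if (it->status != LIST_READY)
		return 0;
	n = it->num_pkts;
	it->num_pkts = 0;
	it->status = LIST_FREE;
	return n;
}
```

Writing the status last is what lets the consumer trust num_pkts and
pkt_list once it sees the item marked ready.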

Signed-off-by: Elena Agostini 
---
 lib/hcdev/hcdev.c | 127 
 lib/hcdev/meson.build |   2 +
 lib/hcdev/rte_hcdev.h | 132 ++
 lib/hcdev/version.map |   4 ++
 4 files changed, 265 insertions(+)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index e391988e73..572f1713fc 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -660,3 +660,130 @@ rte_hcdev_comm_get_flag_value(struct rte_hcdev_comm_flag *hcflag, uint32_t *val)
 
return 0;
 }
+
+struct rte_hcdev_comm_list *
+rte_hcdev_comm_create_list(uint16_t dev_id,
+   uint32_t num_comm_items)
+{
+   struct rte_hcdev_comm_list *comm_list;
+   uint32_t idx_l;
+
+   if (num_comm_items == 0) {
+   rte_errno = EINVAL;
+   return NULL;
+   }
+
+   comm_list = rte_hcdev_malloc(dev_id,
+   sizeof(struct rte_hcdev_comm_list) * num_comm_items,
+   RTE_HCDEV_MALLOC_REGISTER_FROM_CPU);
+   if (comm_list == NULL) {
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
+   for (idx_l = 0; idx_l < num_comm_items; idx_l++) {
+   comm_list[idx_l].pkt_list =
+   rte_hcdev_malloc(dev_id,
+   sizeof(struct rte_hcdev_comm_pkt) *
+   RTE_HCDEV_COMM_LIST_PKTS_MAX,
+   RTE_HCDEV_MALLOC_REGISTER_FROM_CPU);
+   if (comm_list[idx_l].pkt_list == NULL) {
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
+   RTE_HCDEV_VOLATILE(comm_list[idx_l].status) =
+   RTE_HCDEV_COMM_LIST_FREE;
+   comm_list[idx_l].num_pkts = 0;
+   }
+
+   return comm_list;
+}
+
+int
+rte_hcdev_comm_destroy_list(uint16_t dev_id,
+   struct rte_hcdev_comm_list *comm_list,
+   uint32_t num_comm_items)
+{
+   uint32_t idx_l;
+
+   if (comm_list == NULL) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   for (idx_l = 0; idx_l < num_comm_items; idx_l++)
+   rte_hcdev_free(dev_id, comm_list[idx_l].pkt_list);
+   rte_hcdev_free(dev_id, comm_list);
+
+   return 0;
+}
+
+int
+rte_hcdev_comm_populate_list_pkts(struct rte_hcdev_comm_list *comm_list_item,
+   struct rte_mbuf **mbufs, uint32_t num_mbufs)
+{
+   uint32_t idx;
+
+   if (comm_list_item == NULL || comm_list_item->pkt_list == NULL ||
+   mbufs == NULL || num_mbufs > RTE_HCDEV_COMM_LIST_PKTS_MAX) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+
+   for (idx = 0; idx < num_mbufs; idx++) {
+   /* support only unchained mbufs */
+   if (unlikely((mbufs[idx]->nb_segs > 1) ||
+   (mbufs[idx]->next != NULL) ||
+   (mbufs[idx]->data_len != mbufs[idx]->pkt_len))) {
+   rte_errno = ENOTSUP;
+   return -rte_errno;
+   }
+   comm_list_item->pkt_list[idx].addr =
+   rte_pktmbuf_mtod_offset(mbufs[idx], uintptr_t, 0);
+   comm_list_item->pkt_list[idx].size = mbufs[idx]->pkt_len;
+   comm_list_item->pkt_list[idx].opaque = mbufs[idx];
+   }
+
+   RTE_HCDEV_VOLATILE(comm_list_item->num_pkts) = num_mbufs;
+   rte_mb();
+   RTE_HCDEV_VOLATILE(comm_list_item->status) = RTE_HCDEV_COMM_LIST_READY;
+
+   return 0;
+}
+
+int
+rte_hcdev_comm_cleanup_list(struct rte_hcdev_comm_list *comm_li

Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free

2021-07-30 Thread Morten Brørup
> From: Olivier Matz [mailto:olivier.m...@6wind.com]
> Sent: Friday, 30 July 2021 14.37
> 
> Hi Thomas,
> 
> On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > What's the follow-up for this patch?
> 
> Unfortunately, I still don't have the time to work on this topic yet.
> 
> In my initial tests, in our lab, I didn't notice any performance
> regression, but Ali has seen an impact (0.5M PPS, but I don't know how
> much in percent).
> 
> 
> > 19/01/2021 15:04, Slava Ovsiienko:
> > > Hi, All
> > >
> > > Could we postpone this patch at least to rc2? We would like to
> conduct more investigations?
> > >
> > > With best regards, Slava
> > >
> > > From: Olivier Matz 
> > > > On Mon, Jan 18, 2021 at 05:52:32PM +, Ali Alnubani wrote:
> > > > > Hi,
> > > > > (Sorry had to resend this to some recipients due to mail server
> problems).
> > > > >
> > > > > Just confirming that I can still reproduce the regression with
> single core and
> > > > 64B frames on other servers.
> > > >
> > > > Many thanks for the feedback. Can you please detail what is the
> amount of
> > > > performance loss in percent, and confirm the test case? (I
> suppose it is
> > > > testpmd io forward).
> > > >
> > > > Unfortunately, I won't be able to spend a lot of time on this soon
> (sorry for
> > > > that). So I see at least these 2 options:
> > > >
> > > > - postpone the patch again, until I can find more time to analyze
> > > >   and optimize
> > > > - apply the patch if the performance loss is acceptable compared
> to
> > > >   the added value of fixing a bug
> > > >
> > [...]
> 
> Status quo...
> 
> Olivier
> 

The decision should be simple:

Does the DPDK project support segmented packets?
If yes, then apply the patch to fix the bug!

If anyone seriously cares about the regression it introduces, optimization 
patches are welcome later. We shouldn't wait for it.

If the patch is not applied, the documentation must be updated to mention that 
we are releasing DPDK with a known bug: that segmented packets are handled 
incorrectly in the scenario described in this patch.


Generally, there could be some performance to gain by not supporting segmented 
packets at all, as a compile time option. But that is a different discussion.


-Morten



[dpdk-dev] [PATCH] net/virtio: fix repeated memory free of vq

2021-07-30 Thread Gaoxiang Liu
When virtio_init_queue returns error, the memory of vq is freed.
But the value of hw->vqs[queue_idx] is not reset.
If hw->vqs[queue_idx] != NULL, the memory of vq is freed again
in virtio_free_queues.
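
As an illustration only, the pattern behind this fix can be sketched in
plain C without the driver. The names hw, vq_free, init_queue_fail and
free_queues are hypothetical simplifications, not the virtio code; the
counter exists only to make the number of releases observable.

```c
#include <stdlib.h>

#define NB_QUEUES 2

/* Hypothetical stand-in for the per-device queue table. */
struct hw {
	void *vqs[NB_QUEUES];
};

/* Counts releases so that a second free of the same slot would show up. */
static int free_count;

static void vq_free(void *vq)
{
	if (vq != NULL) {
		free(vq);
		free_count++;
	}
}

/* Queue init that always takes the error path after allocating. */
static int init_queue_fail(struct hw *hw, int idx)
{
	hw->vqs[idx] = malloc(16);
	if (hw->vqs[idx] == NULL)
		return -1;
	/* ... a later setup step fails here ... */
	vq_free(hw->vqs[idx]);
	hw->vqs[idx] = NULL; /* the one-line fix: clear the stale pointer */
	return -1;
}

/* Teardown: releases every slot that still holds a pointer. */
static void free_queues(struct hw *hw)
{
	int i;

	for (i = 0; i < NB_QUEUES; i++) {
		if (hw->vqs[i] == NULL)
			continue;
		vq_free(hw->vqs[i]);
		hw->vqs[i] = NULL;
	}
}
```

Without the assignment to NULL in the error path, the teardown loop would
see a non-NULL slot and release the same memory a second time.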

Fixes: 69c80d4ef89b ("net/virtio: allocate queue at init stage")
Cc: sta...@dpdk.org

Signed-off-by: Gaoxiang Liu 
---
 drivers/net/virtio/virtio_ethdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 056830566..fc72d71cb 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -631,6 +631,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t queue_idx)
rte_memzone_free(mz);
 free_vq:
rte_free(vq);
+   hw->vqs[queue_idx] = NULL;
 
return ret;
 }
-- 
2.32.0




Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free

2021-07-30 Thread Thomas Monjalon
30/07/2021 16:35, Morten Brørup:
> > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > Sent: Friday, 30 July 2021 14.37
> > 
> > Hi Thomas,
> > 
> > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > What's the follow-up for this patch?
> > 
> > Unfortunately, I still don't have the time to work on this topic yet.
> > 
> > In my initial tests, in our lab, I didn't notice any performance
> > regression, but Ali has seen an impact (0.5M PPS, but I don't know how
> > much in percent).
> > 
> > 
> > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > Hi, All
> > > >
> > > > Could we postpone this patch at least to rc2? We would like to
> > conduct more investigations?
> > > >
> > > > With best regards, Slava
> > > >
> > > > From: Olivier Matz 
> > > > > On Mon, Jan 18, 2021 at 05:52:32PM +, Ali Alnubani wrote:
> > > > > > Hi,
> > > > > > (Sorry had to resend this to some recipients due to mail server
> > problems).
> > > > > >
> > > > > > Just confirming that I can still reproduce the regression with
> > single core and
> > > > > 64B frames on other servers.
> > > > >
> > > > > Many thanks for the feedback. Can you please detail what is the
> > amount of
> > > > > performance loss in percent, and confirm the test case? (I
> > suppose it is
> > > > > testpmd io forward).
> > > > >
> > > > > Unfortunately, I won't be able to spend a lot of time on this soon
> > (sorry for
> > > > > that). So I see at least these 2 options:
> > > > >
> > > > > - postpone the patch again, until I can find more time to analyze
> > > > >   and optimize
> > > > > - apply the patch if the performance loss is acceptable compared
> > to
> > > > >   the added value of fixing a bug
> > > > >
> > > [...]
> > 
> > Status quo...
> > 
> > Olivier
> > 
> 
> The decision should be simple:
> 
> Does the DPDK project support segmented packets?
> If yes, then apply the patch to fix the bug!
> 
> If anyone seriously cares about the regression it introduces, optimization 
> patches are welcome later. We shouldn't wait for it.

You're right, but the regression is flagged to a 4-year-old patch,
that's why I don't consider it as urgent.

> If the patch is not applied, the documentation must be updated to mention 
> that we are releasing DPDK with a known bug: that segmented packets are 
> handled incorrectly in the scenario described in this patch.

Yes, would be good to document the known issue,
no matter how old it is.

> Generally, there could be some performance to gain by not supporting 
> segmented packets at all, as a compile time option. But that is a different 
> discussion.
> 
> 
> -Morten





Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free

2021-07-30 Thread Olivier Matz
Hi,

On Fri, Jul 30, 2021 at 04:54:05PM +0200, Thomas Monjalon wrote:
> 30/07/2021 16:35, Morten Brørup:
> > > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > > Sent: Friday, 30 July 2021 14.37
> > > 
> > > Hi Thomas,
> > > 
> > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > > What's the follow-up for this patch?
> > > 
> > > Unfortunately, I still don't have the time to work on this topic yet.
> > > 
> > > In my initial tests, in our lab, I didn't notice any performance
> > > regression, but Ali has seen an impact (0.5M PPS, but I don't know how
> > > much in percent).
> > > 
> > > 
> > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > Hi, All
> > > > >
> > > > > Could we postpone this patch at least to rc2? We would like to
> > > conduct more investigations?
> > > > >
> > > > > With best regards, Slava
> > > > >
> > > > > From: Olivier Matz 
> > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +, Ali Alnubani wrote:
> > > > > > > Hi,
> > > > > > > (Sorry had to resend this to some recipients due to mail server
> > > problems).
> > > > > > >
> > > > > > > Just confirming that I can still reproduce the regression with
> > > single core and
> > > > > > 64B frames on other servers.
> > > > > >
> > > > > > Many thanks for the feedback. Can you please detail what is the
> > > amount of
> > > > > > performance loss in percent, and confirm the test case? (I
> > > suppose it is
> > > > > > testpmd io forward).
> > > > > >
> > > > > > Unfortunately, I won't be able to spend a lot of time on this soon
> > > (sorry for
> > > > > > that). So I see at least these 2 options:
> > > > > >
> > > > > > - postpone the patch again, until I can find more time to analyze
> > > > > >   and optimize
> > > > > > - apply the patch if the performance loss is acceptable compared
> > > to
> > > > > >   the added value of fixing a bug
> > > > > >
> > > > [...]
> > > 
> > > Status quo...
> > > 
> > > Olivier
> > > 
> > 
> > The decision should be simple:
> > 
> > Does the DPDK project support segmented packets?
> > If yes, then apply the patch to fix the bug!
> > 
> > If anyone seriously cares about the regression it introduces, optimization 
> > patches are welcome later. We shouldn't wait for it.
> 
> You're right, but the regression is flagged to a 4-year-old patch,
> that's why I don't consider it as urgent.
> 
> > If the patch is not applied, the documentation must be updated to mention 
> > that we are releasing DPDK with a known bug: that segmented packets are 
> > handled incorrectly in the scenario described in this patch.
> 
> Yes, would be good to document the known issue,
> no matter how old it is.

The problem description could be something like this:

  It is expected that free mbufs have their field m->nb_segs set to 1, so
  that when an mbuf is allocated, the user does not need to set its
  value. The mbuf free functions are responsible for resetting this field
  to 1 before returning the mbuf to the pool.

  When a multi-segment mbuf is freed, the m->nb_segs field is not reset
  to 1 for the last segment of the chain. On the next allocation of this
  segment, if the field is not explicitly reset by the user, an invalid
  mbuf can be created, which can cause undefined behavior.
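
To make the scenario concrete, here is a toy model of it. This is only a
sketch: struct toy_mbuf and the toy_* functions are invented for
illustration and are not the rte_mbuf code. It assumes the current free
path resets the chain fields only when a segment has a successor, which
is why the last segment keeps a stale value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct rte_mbuf: only the two fields discussed here. */
struct toy_mbuf {
	struct toy_mbuf *next;
	uint16_t nb_segs;
};

/* Assumed model of the current free path: the chain fields are reset
 * only when the segment has a successor, so the last segment keeps
 * whatever nb_segs value it had. */
static void toy_free_seg_current(struct toy_mbuf *m)
{
	if (m->next != NULL) {
		m->next = NULL;
		m->nb_segs = 1;
	}
}

/* Model of a fixed free path: both fields are reset unconditionally. */
static void toy_free_seg_fixed(struct toy_mbuf *m)
{
	m->next = NULL;
	m->nb_segs = 1;
}

/* Free every segment of a chain using the given per-segment function. */
static void toy_free_chain(struct toy_mbuf *m,
			   void (*free_seg)(struct toy_mbuf *))
{
	assert(free_seg != NULL);
	while (m != NULL) {
		struct toy_mbuf *next = m->next; /* read before it is cleared */

		free_seg(m);
		m = next;
	}
}
```

With a two-segment chain whose tail carries a stale nb_segs (for example
because it was once the head of another chain), the first variant leaves
the tail with nb_segs != 1 after the free, while the second does not.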


> > Generally, there could be some performance to gain by not supporting 
> > segmented packets at all, as a compile time option. But that is a different 
> > discussion.
> > 
> > 
> > -Morten
> 
> 
> 


Re: [dpdk-dev] [pull-request] dpdk-next-eventdev - v21.08-rc3

2021-07-30 Thread Thomas Monjalon
30/07/2021 13:02, Jerin Jacob Kollanukkaran:
>   http://dpdk.org/git/next/dpdk-next-eventdev

Pulled, thanks.




Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free

2021-07-30 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, 30 July 2021 17.15
> 
> Hi,
> 
> On Fri, Jul 30, 2021 at 04:54:05PM +0200, Thomas Monjalon wrote:
> > 30/07/2021 16:35, Morten Brørup:
> > > > From: Olivier Matz [mailto:olivier.m...@6wind.com]
> > > > Sent: Friday, 30 July 2021 14.37
> > > >
> > > > Hi Thomas,
> > > >
> > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > > > What's the follow-up for this patch?
> > > >
> > > > Unfortunately, I still don't have the time to work on this topic
> yet.
> > > >
> > > > In my initial tests, in our lab, I didn't notice any performance
> > > > regression, but Ali has seen an impact (0.5M PPS, but I don't
> know how
> > > > much in percent).
> > > >
> > > >
> > > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > > Hi, All
> > > > > >
> > > > > > Could we postpone this patch at least to rc2? We would like
> to
> > > > conduct more investigations?
> > > > > >
> > > > > > With best regards, Slava
> > > > > >
> > > > > > From: Olivier Matz 
> > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +, Ali Alnubani
> wrote:
> > > > > > > > Hi,
> > > > > > > > (Sorry had to resend this to some recipients due to mail
> server
> > > > problems).
> > > > > > > >
> > > > > > > > Just confirming that I can still reproduce the regression
> with
> > > > single core and
> > > > > > > 64B frames on other servers.
> > > > > > >
> > > > > > > Many thanks for the feedback. Can you please detail what is
> the
> > > > amount of
> > > > > > > performance loss in percent, and confirm the test case? (I
> > > > suppose it is
> > > > > > > testpmd io forward).
> > > > > > >
> > > > > > > Unfortunately, I won't be able to spend a lot of time on
> this soon
> > > > (sorry for
> > > > > > > that). So I see at least these 2 options:
> > > > > > >
> > > > > > > - postpone the patch again, until I can find more time to
> analyze
> > > > > > >   and optimize
> > > > > > > - apply the patch if the performance loss is acceptable
> compared
> > > > to
> > > > > > >   the added value of fixing a bug
> > > > > > >
> > > > > [...]
> > > >
> > > > Status quo...
> > > >
> > > > Olivier
> > > >
> > >
> > > The decision should be simple:
> > >
> > > Does the DPDK project support segmented packets?
> > > If yes, then apply the patch to fix the bug!
> > >
> > > If anyone seriously cares about the regression it introduces,
> optimization patches are welcome later. We shouldn't wait for it.
> >
> > You're right, but the regression is flagged to a 4-year-old patch,
> > that's why I don't consider it as urgent.
> >
> > > If the patch is not applied, the documentation must be updated to
> mention that we are releasing DPDK with a known bug: that segmented
> packets are handled incorrectly in the scenario described in this
> patch.
> >
> > Yes, would be good to document the known issue,
> > no matter how old it is.
> 
> The problem description could be something like this:
> 
>   It is expected that free mbufs have their field m->nb_segs set to 1, so
>   that when an mbuf is allocated, the user does not need to set its
>   value. The mbuf free functions are responsible for resetting this field
>   to 1 before returning the mbuf to the pool.
> 
>   When a multi-segment mbuf is freed, the m->nb_segs field is not reset
>   to 1 for the last segment of the chain. On the next allocation of this
>   segment, if the field is not explicitly reset by the user, an invalid
>   mbuf can be created, which can cause undefined behavior.
> 

And it needs to be put somewhere very prominent if we expect the users to read 
it.

Would adding an RTE_VERIFY() - instead of fixing the bug - cause a regression? 
If not, then any affected user will know what went wrong and where. This would 
still be an improvement, if the bugfix patch cannot be applied.

> 
> > > Generally, there could be some performance to gain by not
> supporting segmented packets at all, as a compile time option. But that
> is a different discussion.
> > >
> > >
> > > -Morten
> >
> >
> >



[dpdk-dev] [PATCH] doc: announce renaming of mbuf offload flags

2021-07-30 Thread Olivier Matz
The mbuf offload flags do not match the DPDK namespace (they are
not prefixed by RTE_). Announce their rename in 21.11, and the
removal of the old names in 22.11.

A draft coccinelle script is provided to anticipate what the
renaming will be.
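
For illustration only, a minimal sketch of how the old names could stay
usable between the rename and the removal. The alias macros are an
assumption and not something this patch implements; the two bit
positions are believed to match the current PKT_RX_VLAN and
PKT_RX_RSS_HASH definitions, and rx_has_vlan_and_rss() is a hypothetical
helper.

```c
#include <assert.h>
#include <stdint.h>

/* New, prefixed names (bit positions assumed to be unchanged by the rename). */
#define RTE_MBUF_F_RX_VLAN     (1ULL << 0)
#define RTE_MBUF_F_RX_RSS_HASH (1ULL << 1)

/* Hypothetical transition aliases: old names expand to the new ones. */
#define PKT_RX_VLAN     RTE_MBUF_F_RX_VLAN
#define PKT_RX_RSS_HASH RTE_MBUF_F_RX_RSS_HASH

/* Compile-time check that an alias and its new name are the same bit. */
static_assert(PKT_RX_VLAN == RTE_MBUF_F_RX_VLAN,
	      "alias must match the prefixed flag");

/* Return 1 when both flags are set in ol_flags, 0 otherwise. */
static int rx_has_vlan_and_rss(uint64_t ol_flags)
{
	const uint64_t want = RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_RSS_HASH;

	return (ol_flags & want) == want;
}
```

Code written against either spelling gives the same result, so
applications could migrate at their own pace until the old names are
dropped.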

Signed-off-by: Olivier Matz 
---
 .../cocci/prefix_mbuf_offload_flags.cocci | 436 ++
 doc/guides/rel_notes/deprecation.rst  |   5 +
 2 files changed, 441 insertions(+)
 create mode 100644 devtools/cocci/prefix_mbuf_offload_flags.cocci

diff --git a/devtools/cocci/prefix_mbuf_offload_flags.cocci 
b/devtools/cocci/prefix_mbuf_offload_flags.cocci
new file mode 100644
index 00..8bfc7c29d1
--- /dev/null
+++ b/devtools/cocci/prefix_mbuf_offload_flags.cocci
@@ -0,0 +1,436 @@
+//
+// Rename mbuf offload flags (and some other defines) to have
+// an RTE_ prefix.
+// This only replaces usages in C code, so it is advised to
+// check for missing occurrences with:
+//   for f in $(git ls-tree --full-tree -r --name-only HEAD); do
+// if [ "$(file -b --mime-encoding $f)" != binary ]; then
+//   sed -i -e 's,PKT_RX_,RTE_MBUF_F_RX_,g' \
+// -e 's,PKT_TX_,RTE_MBUF_F_TX_,g' \
+// -e 's,EXT_ATTACHED_MBUF,RTE_MBUF_F_EXTERNAL,g' \
+// -e 's,IND_ATTACHED_MBUF,RTE_MBUF_F_INDIRECT,g' \
+// -e 's,PKT_FIRST_FREE,RTE_MBUF_F_FIRST_FREE,g' \
+// -e 's,PKT_LAST_FREE,RTE_MBUF_F_LAST_FREE,g' $f
+// fi
+//   done
+//
+@@
+@@
+
+- PKT_RX_VLAN
++ RTE_MBUF_F_RX_VLAN
+
+@@
+@@
+
+- PKT_RX_RSS_HASH
++ RTE_MBUF_F_RX_RSS_HASH
+
+@@
+@@
+
+- PKT_RX_FDIR
++ RTE_MBUF_F_RX_FDIR
+
+@@
+@@
+
+- PKT_RX_L4_CKSUM_BAD
++ RTE_MBUF_F_RX_L4_CKSUM_BAD
+
+@@
+@@
+
+- PKT_RX_IP_CKSUM_BAD
++ RTE_MBUF_F_RX_IP_CKSUM_BAD
+
+@@
+@@
+
+- PKT_RX_OUTER_IP_CKSUM_BAD
++ RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD
+
+@@
+@@
+
+- PKT_RX_EIP_CKSUM_BAD
++ RTE_MBUF_F_RX_EIP_CKSUM_BAD
+
+@@
+@@
+
+- PKT_RX_VLAN_STRIPPED
++ RTE_MBUF_F_RX_VLAN_STRIPPED
+
+@@
+@@
+
+- PKT_RX_IP_CKSUM_MASK
++ RTE_MBUF_F_RX_IP_CKSUM_MASK
+
+@@
+@@
+
+- PKT_RX_IP_CKSUM_UNKNOWN
++ RTE_MBUF_F_RX_IP_CKSUM_UNKNOWN
+
+@@
+@@
+
+- PKT_RX_IP_CKSUM_BAD
++ RTE_MBUF_F_RX_IP_CKSUM_BAD
+
+@@
+@@
+
+- PKT_RX_IP_CKSUM_GOOD
++ RTE_MBUF_F_RX_IP_CKSUM_GOOD
+
+@@
+@@
+
+- PKT_RX_IP_CKSUM_NONE
++ RTE_MBUF_F_RX_IP_CKSUM_NONE
+
+@@
+@@
+
+- PKT_RX_L4_CKSUM_MASK
++ RTE_MBUF_F_RX_L4_CKSUM_MASK
+
+@@
+@@
+
+- PKT_RX_L4_CKSUM_UNKNOWN
++ RTE_MBUF_F_RX_L4_CKSUM_UNKNOWN
+
+@@
+@@
+
+- PKT_RX_L4_CKSUM_BAD
++ RTE_MBUF_F_RX_L4_CKSUM_BAD
+
+@@
+@@
+
+- PKT_RX_L4_CKSUM_GOOD
++ RTE_MBUF_F_RX_L4_CKSUM_GOOD
+
+@@
+@@
+
+- PKT_RX_L4_CKSUM_NONE
++ RTE_MBUF_F_RX_L4_CKSUM_NONE
+
+@@
+@@
+
+- PKT_RX_IEEE1588_PTP
++ RTE_MBUF_F_RX_IEEE1588_PTP
+
+@@
+@@
+
+- PKT_RX_IEEE1588_TMST
++ RTE_MBUF_F_RX_IEEE1588_TMST
+
+@@
+@@
+
+- PKT_RX_FDIR_ID
++ RTE_MBUF_F_RX_FDIR_ID
+
+@@
+@@
+
+- PKT_RX_FDIR_FLX
++ RTE_MBUF_F_RX_FDIR_FLX
+
+@@
+@@
+
+- PKT_RX_QINQ_STRIPPED
++ RTE_MBUF_F_RX_QINQ_STRIPPED
+
+@@
+@@
+
+- PKT_RX_LRO
++ RTE_MBUF_F_RX_LRO
+
+@@
+@@
+
+- PKT_RX_SEC_OFFLOAD
++ RTE_MBUF_F_RX_SEC_OFFLOAD
+
+@@
+@@
+
+- PKT_RX_SEC_OFFLOAD_FAILED
++ RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED
+
+@@
+@@
+
+- PKT_RX_QINQ
++ RTE_MBUF_F_RX_QINQ
+
+@@
+@@
+
+- PKT_RX_OUTER_L4_CKSUM_MASK
++ RTE_MBUF_F_RX_OUTER_L4_CKSUM_MASK
+
+@@
+@@
+
+- PKT_RX_OUTER_L4_CKSUM_UNKNOWN
++ RTE_MBUF_F_RX_OUTER_L4_CKSUM_UNKNOWN
+
+@@
+@@
+
+- PKT_RX_OUTER_L4_CKSUM_BAD
++ RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD
+
+@@
+@@
+
+- PKT_RX_OUTER_L4_CKSUM_GOOD
++ RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD
+
+@@
+@@
+
+- PKT_RX_OUTER_L4_CKSUM_INVALID
++ RTE_MBUF_F_RX_OUTER_L4_CKSUM_INVALID
+
+@@
+@@
+
+- PKT_FIRST_FREE
++ RTE_MBUF_F_FIRST_FREE
+
+@@
+@@
+
+- PKT_LAST_FREE
++ RTE_MBUF_F_LAST_FREE
+
+@@
+@@
+
+- PKT_TX_OUTER_UDP_CKSUM
++ RTE_MBUF_F_TX_OUTER_UDP_CKSUM
+
+@@
+@@
+
+- PKT_TX_UDP_SEG
++ RTE_MBUF_F_TX_UDP_SEG
+
+@@
+@@
+
+- PKT_TX_SEC_OFFLOAD
++ RTE_MBUF_F_TX_SEC_OFFLOAD
+
+@@
+@@
+
+- PKT_TX_MACSEC
++ RTE_MBUF_F_TX_MACSEC
+
+@@
+@@
+
+- PKT_TX_TUNNEL_VXLAN
++ RTE_MBUF_F_TX_TUNNEL_VXLAN
+
+@@
+@@
+
+- PKT_TX_TUNNEL_GRE
++ RTE_MBUF_F_TX_TUNNEL_GRE
+
+@@
+@@
+
+- PKT_TX_TUNNEL_IPIP
++ RTE_MBUF_F_TX_TUNNEL_IPIP
+
+@@
+@@
+
+- PKT_TX_TUNNEL_GENEVE
++ RTE_MBUF_F_TX_TUNNEL_GENEVE
+
+@@
+@@
+
+- PKT_TX_TUNNEL_MPLSINUDP
++ RTE_MBUF_F_TX_TUNNEL_MPLSINUDP
+
+@@
+@@
+
+- PKT_TX_TUNNEL_VXLAN_GPE
++ RTE_MBUF_F_TX_TUNNEL_VXLAN_GPE
+
+@@
+@@
+
+- PKT_TX_TUNNEL_GTP
++ RTE_MBUF_F_TX_TUNNEL_GTP
+
+@@
+@@
+
+- PKT_TX_TUNNEL_IP
++ RTE_MBUF_F_TX_TUNNEL_IP
+
+@@
+@@
+
+- PKT_TX_TUNNEL_UDP
++ RTE_MBUF_F_TX_TUNNEL_UDP
+
+@@
+@@
+
+- PKT_TX_TUNNEL_MASK
++ RTE_MBUF_F_TX_TUNNEL_MASK
+
+@@
+@@
+
+- PKT_TX_QINQ
++ RTE_MBUF_F_TX_QINQ
+
+@@
+@@
+
+- PKT_TX_QINQ_PKT
++ RTE_MBUF_F_TX_QINQ_PKT
+
+@@
+@@
+
+- PKT_TX_TCP_SEG
++ RTE_MBUF_F_TX_TCP_SEG
+
+@@
+@@
+
+- PKT_TX_IEEE1588_TMST
++ RTE_MBUF_F_TX_IEEE1588_TMST
+
+@@
+@@
+
+- PKT_TX_L4_NO_CKSUM
++ RTE_MBUF_F_TX_L4_NO_CKSUM
+
+@@
+@@
+
+- PKT_TX_TCP_CKSUM
++ RTE_MBUF_F_TX_TCP_CKSUM
+
+@@
+@@
+
+- PKT_TX_SCTP_CKSUM
++ RTE_MBUF_F_TX_SCTP_CKSUM
+
+@@
+@@
+
+- PKT_TX_UDP_CKSUM
++ 

[dpdk-dev] [PATCH] common/octeontx2: fix link event message size

2021-07-30 Thread Harman Kalra
The mbox message allocated for sending the link status to the VF had
the wrong size, so an incorrect link status was observed.
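
The fix derives the size from the message ID and rounds it up to the
mbox alignment. As a rough sketch of that rounding step only: align_up()
below is a hypothetical stand-in for RTE_ALIGN, and the alignment of 16
used in the note after it is an assumption, not a value taken from the
driver.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align.
 * align must be a non-zero power of two. */
static size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

With an assumed alignment of 16, a 40-byte message would be allocated as
48 bytes, and a size that is already a multiple of 16 is left unchanged.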

Fixes: cb8d769fb6fe ("common/octeontx2: send link event to VF")

Signed-off-by: Harman Kalra 
---
 drivers/common/octeontx2/otx2_dev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/common/octeontx2/otx2_dev.c 
b/drivers/common/octeontx2/otx2_dev.c
index 1485e2b357..ce4f0e7ca9 100644
--- a/drivers/common/octeontx2/otx2_dev.c
+++ b/drivers/common/octeontx2/otx2_dev.c
@@ -172,14 +172,17 @@ af_pf_wait_msg(struct otx2_dev *dev, uint16_t vf, int 
num_msg)
/* Send link status to VF */
struct cgx_link_user_info linfo;
struct mbox_msghdr *vf_msg;
+   size_t sz;
 
/* Get the link status */
if (dev->ops && dev->ops->link_status_get)
dev->ops->link_status_get(dev, &linfo);
 
+   sz = RTE_ALIGN(otx2_mbox_id2size(
+   MBOX_MSG_CGX_LINK_EVENT), MBOX_MSG_ALIGN);
/* Prepare the message to be sent */
vf_msg = otx2_mbox_alloc_msg(&dev->mbox_vfpf_up, vf,
-size);
+sz);
otx2_mbox_req_init(MBOX_MSG_CGX_LINK_EVENT, vf_msg);
memcpy((uint8_t *)vf_msg + sizeof(struct mbox_msghdr),
   &linfo, sizeof(struct cgx_link_user_info));
-- 
2.18.0



[dpdk-dev] [PATCH 1/2] common/cnxk: send link event to VF

2021-07-30 Thread Harman Kalra
Currently the AF sends a link event only to the PF, either as soon as the
PF comes up or on any physical change in the link. The PF broadcasts
these link events to all its VFs as soon as it receives them.
But no event is sent when a new VF comes up, so that VF will not have
the link status.
Add support for sending the link status to a VF once it comes up
successfully.

Signed-off-by: Harman Kalra 
---
 drivers/common/cnxk/roc_dev.c  | 33 ++
 drivers/common/cnxk/roc_dev_priv.h |  5 +
 drivers/common/cnxk/roc_nix.h  |  7 +++
 drivers/common/cnxk/roc_nix_mac.c  | 23 +
 drivers/common/cnxk/version.map|  2 ++
 5 files changed, 70 insertions(+)

diff --git a/drivers/common/cnxk/roc_dev.c b/drivers/common/cnxk/roc_dev.c
index c14f189f9b..4e204373dc 100644
--- a/drivers/common/cnxk/roc_dev.c
+++ b/drivers/common/cnxk/roc_dev.c
@@ -163,6 +163,39 @@ af_pf_wait_msg(struct dev *dev, uint16_t vf, int num_msg)
rsp->rc = msg->rc;
rsp->pcifunc = msg->pcifunc;
 
+   /* Whenever a PF comes up, AF sends the link status to it but
+* when VF comes up no such event is sent to respective VF.
+* Using MBOX_MSG_NIX_LF_START_RX response from AF for the
+* purpose and send the link status of PF to VF.
+*/
+   if (msg->id == MBOX_MSG_NIX_LF_START_RX) {
+   /* Send link status to VF */
+   struct cgx_link_user_info linfo;
+   struct mbox_msghdr *vf_msg;
+   size_t sz;
+
+   /* Get the link status */
+   memset(&linfo, 0, sizeof(struct cgx_link_user_info));
+   if (dev->ops && dev->ops->link_status_get)
+   dev->ops->link_status_get(dev->roc_nix, &linfo);
+
+   sz = PLT_ALIGN(mbox_id2size(MBOX_MSG_CGX_LINK_EVENT),
+  MBOX_MSG_ALIGN);
+   /* Prepare the message to be sent */
+   vf_msg = mbox_alloc_msg(&dev->mbox_vfpf_up, vf, sz);
+   if (vf_msg) {
+   mbox_req_init(MBOX_MSG_CGX_LINK_EVENT, vf_msg);
+   memcpy((uint8_t *)vf_msg +
+  sizeof(struct mbox_msghdr), &linfo,
+  sizeof(struct cgx_link_user_info));
+
+   vf_msg->rc = msg->rc;
+   vf_msg->pcifunc = msg->pcifunc;
+   /* Send to VF */
+   mbox_msg_send(&dev->mbox_vfpf_up, vf);
+   }
+   }
+
offset = mbox->rx_start + msg->next_msgoff;
}
plt_spinlock_unlock(&mdev->mbox_lock);
diff --git a/drivers/common/cnxk/roc_dev_priv.h 
b/drivers/common/cnxk/roc_dev_priv.h
index 9488db3c41..302dc0feb0 100644
--- a/drivers/common/cnxk/roc_dev_priv.h
+++ b/drivers/common/cnxk/roc_dev_priv.h
@@ -30,9 +30,14 @@ typedef void (*link_info_t)(void *roc_nix,
 /* PTP info callback */
 typedef int (*ptp_info_t)(void *roc_nix, bool enable);
 
+/* Link status get callback */
+typedef void (*link_status_get_t)(void *roc_nix,
+ struct cgx_link_user_info *link);
+
 struct dev_ops {
link_info_t link_status_update;
ptp_info_t ptp_info_update;
+   link_status_get_t link_status_get;
 };
 
 #define dev_is_vf(dev) ((dev)->hwcap & DEV_HWCAP_F_VF)
diff --git a/drivers/common/cnxk/roc_nix.h b/drivers/common/cnxk/roc_nix.h
index bb69027956..d7ab3c674e 100644
--- a/drivers/common/cnxk/roc_nix.h
+++ b/drivers/common/cnxk/roc_nix.h
@@ -243,6 +243,10 @@ typedef void (*link_status_t)(struct roc_nix *roc_nix,
 /* PTP info update callback */
 typedef int (*ptp_info_update_t)(struct roc_nix *roc_nix, bool enable);
 
+/* Link status get callback */
+typedef void (*link_info_get_t)(struct roc_nix *roc_nix,
+   struct roc_nix_link_info *link);
+
 struct roc_nix {
/* Input parameters */
struct plt_pci_device *pci_dev;
@@ -487,6 +491,9 @@ int __roc_api roc_nix_mac_max_rx_len_set(struct roc_nix 
*roc_nix,
 int __roc_api roc_nix_mac_link_cb_register(struct roc_nix *roc_nix,
   link_status_t link_update);
 void __roc_api roc_nix_mac_link_cb_unregister(struct roc_nix *roc_nix);
+int __roc_api roc_nix_mac_link_info_get_cb_register(
+   struct roc_nix *roc_nix, link_info_get_t link_info_get);
+void __roc_api roc_nix_mac_link_info_get_cb_unregister(struct roc_nix 
*roc_nix);
 
 /* Ops */
 int __roc_api roc_nix_switch_hdr_set(struct roc_nix *roc_nix,
diff --git a/drivers/common/cnxk/roc_nix_mac.c 
b/drivers/common/cnxk/roc_nix_mac.c
index 682d5a7295..36259941c9 100644
--- a/drivers/common/cnxk/roc_nix_mac.c
+++ b/drivers/common/cnxk/roc_nix_mac.c
@@ -296,

[dpdk-dev] [PATCH 2/2] net/cnxk: callback for getting link status

2021-07-30 Thread Harman Kalra
Add a new callback for reading the link status. The PF can read its
link status and forward it to a VF once the VF comes up.

Signed-off-by: Harman Kalra 
---
 drivers/net/cnxk/cnxk_ethdev.c |  9 +
 drivers/net/cnxk/cnxk_ethdev.h |  2 ++
 drivers/net/cnxk/cnxk_link.c   | 23 +++
 3 files changed, 34 insertions(+)

diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index 0e3652ed51..7152dcd002 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -1314,6 +1314,10 @@ cnxk_eth_dev_init(struct rte_eth_dev *eth_dev)
/* Register up msg callbacks */
roc_nix_mac_link_cb_register(nix, cnxk_eth_dev_link_status_cb);
 
+   /* Register up msg callbacks */
+   roc_nix_mac_link_info_get_cb_register(nix,
+ cnxk_eth_dev_link_status_get_cb);
+
dev->eth_dev = eth_dev;
dev->configured = 0;
dev->ptype_disable = 0;
@@ -1415,6 +1419,11 @@ cnxk_eth_dev_uninit(struct rte_eth_dev *eth_dev, bool 
reset)
/* Disable link status events */
roc_nix_mac_link_event_start_stop(nix, false);
 
+   /* Unregister the link update op, this is required to stop VFs from
+* receiving link status updates on exit path.
+*/
+   roc_nix_mac_link_cb_unregister(nix);
+
/* Free up SQs */
for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
dev_ops->tx_queue_release(eth_dev->data->tx_queues[i]);
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 4eead03905..4caf26303f 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -349,6 +349,8 @@ int cnxk_nix_rss_hash_conf_get(struct rte_eth_dev *eth_dev,
 void cnxk_nix_toggle_flag_link_cfg(struct cnxk_eth_dev *dev, bool set);
 void cnxk_eth_dev_link_status_cb(struct roc_nix *nix,
 struct roc_nix_link_info *link);
+void cnxk_eth_dev_link_status_get_cb(struct roc_nix *nix,
+struct roc_nix_link_info *link);
 int cnxk_nix_link_update(struct rte_eth_dev *eth_dev, int wait_to_complete);
 int cnxk_nix_queue_stats_mapping(struct rte_eth_dev *dev, uint16_t queue_id,
 uint8_t stat_idx, uint8_t is_rx);
diff --git a/drivers/net/cnxk/cnxk_link.c b/drivers/net/cnxk/cnxk_link.c
index 3fdbdba495..6a70801675 100644
--- a/drivers/net/cnxk/cnxk_link.c
+++ b/drivers/net/cnxk/cnxk_link.c
@@ -45,6 +45,29 @@ nix_link_status_print(struct rte_eth_dev *eth_dev, struct 
rte_eth_link *link)
plt_info("Port %d: Link Down", (int)(eth_dev->data->port_id));
 }
 
+void
+cnxk_eth_dev_link_status_get_cb(struct roc_nix *nix,
+   struct roc_nix_link_info *link)
+{
+   struct cnxk_eth_dev *dev = (struct cnxk_eth_dev *)nix;
+   struct rte_eth_link eth_link;
+   struct rte_eth_dev *eth_dev;
+
+   if (!link || !nix)
+   return;
+
+   eth_dev = dev->eth_dev;
+   if (!eth_dev)
+   return;
+
+   rte_eth_linkstatus_get(eth_dev, &eth_link);
+
+   link->status = eth_link.link_status;
+   link->speed = eth_link.link_speed;
+   link->autoneg = eth_link.link_autoneg;
+   link->full_duplex = eth_link.link_duplex;
+}
+
 void
 cnxk_eth_dev_link_status_cb(struct roc_nix *nix, struct roc_nix_link_info 
*link)
 {
-- 
2.18.0



[dpdk-dev] [PATCH 1/2] common/cnxk: setup nix and lbk in loop mode in 98xx

2021-07-30 Thread Harman Kalra
In case of 98xx, 2 NIX blocks and 4 LBK blocks are present. Moreover,
AF VFs are alternately attached to NIX0 and NIX1 to ensure load
balancing. To support loopback functionality between pairs, NIX0/NIX1
are attached to LBK1/LBK2 for transmission/reception respectively.
But in this default configuration a NIX block cannot receive the
packets it sent on the same LBK, which is an important requirement
as some ODP applications only use one AF VF for loopback functionality.
To support this scenario, NIX0 can use LBK0 (and NIX1 LBK3) by setting a
loop flag while making the LF alloc mailbox request.
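
Since the request can now carry two flags, the second write has to OR
into req->flags instead of assigning. A small standalone sketch of the
difference follows; the two flag values are the ones from roc_mbox.h in
this patch, while BIT_ULL is redefined locally, the helper names are
made up, and "loop" stands for the combined LBK/enable_loop/98xx
condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))
#define NIX_LF_RSS_TAG_LSB_AS_ADDER BIT_ULL(0)
#define NIX_LF_LBK_BLK_SEL          BIT_ULL(1)

static_assert(NIX_LF_RSS_TAG_LSB_AS_ADDER != NIX_LF_LBK_BLK_SEL,
	      "the two flags must use distinct bits");

/* What would happen if the second write stayed a plain assignment:
 * the loop bit set first is overwritten. */
static uint64_t flags_with_assign(bool loop, bool rss_tag_as_xor)
{
	uint64_t flags = 0;

	if (loop)
		flags = NIX_LF_LBK_BLK_SEL;
	if (!rss_tag_as_xor)
		flags = NIX_LF_RSS_TAG_LSB_AS_ADDER;
	return flags;
}

/* Same logic with |= for the second flag: both bits survive. */
static uint64_t flags_with_or(bool loop, bool rss_tag_as_xor)
{
	uint64_t flags = 0;

	if (loop)
		flags = NIX_LF_LBK_BLK_SEL;
	if (!rss_tag_as_xor)
		flags |= NIX_LF_RSS_TAG_LSB_AS_ADDER;
	return flags;
}
```

With loop enabled and rss_tag_as_xor unset, the assigning variant yields
only bit 0, while the OR variant yields bits 0 and 1.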

Signed-off-by: Harman Kalra 
---
 drivers/common/cnxk/roc_mbox.h | 1 +
 drivers/common/cnxk/roc_nix.c  | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index b5da931b81..75d1ff1ef3 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -723,6 +723,7 @@ struct nix_lf_alloc_req {
uint64_t __io rx_cfg; /* See NIX_AF_LF(0..127)_RX_CFG */
uint64_t __io way_mask;
 #define NIX_LF_RSS_TAG_LSB_AS_ADDER BIT_ULL(0)
+#define NIX_LF_LBK_BLK_SEL BIT_ULL(1)
uint64_t flags;
 };
 
diff --git a/drivers/common/cnxk/roc_nix.c b/drivers/common/cnxk/roc_nix.c
index 23d508b941..1621f77fb0 100644
--- a/drivers/common/cnxk/roc_nix.c
+++ b/drivers/common/cnxk/roc_nix.c
@@ -145,9 +145,12 @@ roc_nix_lf_alloc(struct roc_nix *roc_nix, uint32_t nb_rxq, 
uint32_t nb_txq,
req->npa_func = idev_npa_pffunc_get();
req->sso_func = idev_sso_pffunc_get();
req->rx_cfg = rx_cfg;
+   if (roc_nix_is_lbk(roc_nix) && roc_nix->enable_loop &&
+   roc_model_is_cn98xx())
+   req->flags = NIX_LF_LBK_BLK_SEL;
 
if (!roc_nix->rss_tag_as_xor)
-   req->flags = NIX_LF_RSS_TAG_LSB_AS_ADDER;
+   req->flags |= NIX_LF_RSS_TAG_LSB_AS_ADDER;
 
rc = mbox_process_msg(mbox, (void *)&rsp);
if (rc)
-- 
2.18.0



[dpdk-dev] [PATCH 2/2] common/cnxk: update npc mcam range for 98xx

2021-07-30 Thread Harman Kalra
NPC mcam entry distribution is based on the maximum number of PFs and LFs
available. Fix the maximum number of PFs and LFs available on 98xx to
correct the mcam alloc entry range.
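
As a worked example of the arithmetic, the reserved-entry counts can be
recomputed from the constants in roc_npc_priv.h as changed by this
patch. The helper names and the enum wrapper are only for illustration,
and the total of 4096 entries used in the note after the code is a
made-up number, not a hardware value.

```c
#include <assert.h>

/* Constants as they appear in roc_npc_priv.h with this patch applied. */
enum {
	NPC_RVUPF_MAX_9XXX = 0x10,
	NPC_RVUPF_MAX_98XX = 0x18,
	NPC_RVUPF_MAX_10XX = 0x20,
	NPC_NIXLF_MAX      = 0x80,
	NPC_NIXLF_MAX_98XX = 2 * NPC_NIXLF_MAX,
	NPC_MCAME_PER_PF   = 3,
	NPC_MCAME_PER_LF   = 1,
};

/* Reserved entries: one per NIX LF plus three per PF, excluding one PF. */
static unsigned int mcam_resvd(unsigned int nixlf_max, unsigned int rvupf_max)
{
	assert(rvupf_max >= 1);
	return nixlf_max * NPC_MCAME_PER_LF +
	       (rvupf_max - 1) * NPC_MCAME_PER_PF;
}

/* Index of the lowest-priority usable entry for a given table size. */
static unsigned int low_priority_mcam(unsigned int mcam_entries,
				      unsigned int resvd)
{
	assert(mcam_entries > resvd);
	return mcam_entries - resvd - 1;
}
```

This gives 128 + 15 * 3 = 173 reserved entries for 9XXX, 256 + 23 * 3 =
325 for 98XX and 128 + 31 * 3 = 221 for 10XX. With a hypothetical table
of 4096 entries, the 98XX low-priority index would be 4096 - 325 - 1 =
3770.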

Signed-off-by: Harman Kalra 
---
 drivers/common/cnxk/roc_npc.c  | 2 ++
 drivers/common/cnxk/roc_npc_priv.h | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/common/cnxk/roc_npc.c b/drivers/common/cnxk/roc_npc.c
index aff4eef554..27a7f20226 100644
--- a/drivers/common/cnxk/roc_npc.c
+++ b/drivers/common/cnxk/roc_npc.c
@@ -108,6 +108,8 @@ roc_npc_get_low_priority_mcam(struct roc_npc *roc_npc)
 
if (roc_model_is_cn10k())
return (npc->mcam_entries - NPC_MCAME_RESVD_10XX - 1);
+   else if (roc_model_is_cn98xx())
+   return (npc->mcam_entries - NPC_MCAME_RESVD_98XX - 1);
else
return (npc->mcam_entries - NPC_MCAME_RESVD_9XXX - 1);
 }
diff --git a/drivers/common/cnxk/roc_npc_priv.h 
b/drivers/common/cnxk/roc_npc_priv.h
index 5b884e3fd4..365701a545 100644
--- a/drivers/common/cnxk/roc_npc_priv.h
+++ b/drivers/common/cnxk/roc_npc_priv.h
@@ -46,10 +46,12 @@
 #define NPC_MCAM_KEY_X4_WORDS 7 /* Number of 64-bit words */
 
 #define NPC_RVUPF_MAX_9XXX 0x10 /* HRM: RVU_PRIV_CONST */
+#define NPC_RVUPF_MAX_98XX 0x18 /* HRM: RVU_PRIV_CONST */
 #define NPC_RVUPF_MAX_10XX 0x20 /* HRM: RVU_PRIV_CONST */
 #define NPC_NIXLF_MAX 0x80 /* HRM: NIX_AF_CONST2 */
 #define NPC_MCAME_PER_PF   3   /* DRV: RSVD_MCAM_ENTRIES_PER_PF */
 #define NPC_MCAME_PER_LF   1   /* DRV: RSVD_MCAM_ENTRIES_PER_NIXLF */
+#define NPC_NIXLF_MAX_98XX (2 * NPC_NIXLF_MAX) /*2 NIXLFs */
 #define NPC_MCAME_RESVD_9XXX   
\
(NPC_NIXLF_MAX * NPC_MCAME_PER_LF +\
 (NPC_RVUPF_MAX_9XXX - 1) * NPC_MCAME_PER_PF)
@@ -58,6 +60,10 @@
(NPC_NIXLF_MAX * NPC_MCAME_PER_LF +\
 (NPC_RVUPF_MAX_10XX - 1) * NPC_MCAME_PER_PF)
 
+#define NPC_MCAME_RESVD_98XX   
\
+   (NPC_NIXLF_MAX_98XX * NPC_MCAME_PER_LF +   \
+(NPC_RVUPF_MAX_98XX - 1) * NPC_MCAME_PER_PF)
+
 enum npc_err_status {
NPC_ERR_PARAM = -1024,
NPC_ERR_NO_MEM,
-- 
2.18.0



Re: [dpdk-dev] [dpdk-stable] [PATCH v4] bus: clarify log for non-NUMA-aware devices

2021-07-30 Thread Thomas Monjalon
29/07/2021 00:06, Dmitry Kozlyuk:
> EAL: PCI device :00:06.0 on NUMA socket -1
> -   EAL:   Invalid NUMA socket, default to 0
> +   EAL:   Device is not NUMA-aware, defaulting socket to 0
> EAL:   probe driver: 1d0f:ec20 net_ena

The indentation in the logs is wrong because they are not all
at the same log level.
If you run at a non-debug-level, you lose the first line,
so the indent becomes meaningless and confusing.

[...]
> - AUXILIARY_LOG(INFO, "Device is not NUMA-aware, defaulting NUMA 
> node to 0");
> + if (rte_socket_count() > 1)
> + AUXILIARY_LOG(INFO, "  Device is not NUMA-aware, 
> defaulting socket to
 0");

Instead of adding an indent, I would prefer we print the device name.
And we should remove log indents in other bus drivers.




Re: [dpdk-dev] [PATCH v2 0/2] app/acl: help to automate testing

2021-07-30 Thread Thomas Monjalon
26/07/2021 13:51, Konstantin Ananyev:
> The purpose of this series is to help automate ACL library functional
> testing using test-acl app.
> First patch adds into test-acl ability to skip comment/empty lines.
> Second patch adds script for automate testing.
> Sample input files are also provided.
> 
> v2:
>  - Added ability to skip comment/empty lines
>  - Fixed check-spdx-tag.sh complains (Thomas)
> 
> Konstantin Ananyev (2):
>   app/acl: allow comment and empty lines
>   app/acl: add script for automate testing

Applied, thanks.





Re: [dpdk-dev] [PATCH] eal/windows: cleanup virt2phys handle

2021-07-30 Thread Thomas Monjalon
27/07/2021 07:43, Menon, Ranjit:
> On 7/26/2021 2:36 PM, Dmitry Kozlyuk wrote:
> > eal_mem_virt2phys_init() opens a handle for use by rte_mem_virt2phy().
> > Close this handle on EAL cleanup.
> >
> > Fixes: 2a5d547a4a9b ("eal/windows: implement basic memory management")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Dmitry Kozlyuk 
> Acked-by: Ranjit Menon 

Applied, thanks.





Re: [dpdk-dev] [PATCH v3] app/procinfo: add device registers dump

2021-07-30 Thread Thomas Monjalon
29/07/2021 16:14, Pattan, Reshma:
> 
> > Signed-off-by: Chengchang Tang 
> > Signed-off-by: Min Hu (Connor) 
> 
> Acked-by: Reshma Pattan 

Applied, thanks.





Re: [dpdk-dev] [PATCH] examples/l3fwd: change mq-mode on single queue devices

2021-07-30 Thread Thomas Monjalon
13/05/2021 11:59, Bruce Richardson:
> On Wed, May 12, 2021 at 09:43:57PM +0300, Medvedkin, Vladimir wrote:
> > Hi Bruce,
> > 
> > On 12/05/2021 19:32, Bruce Richardson wrote:
> > > On Mon, May 10, 2021 at 06:53:19PM +0200, Heinrich Kuhn wrote:
> > > > From: "Chaoyong.He" 
> > > > 
> > > > Set the Rx multi-queue mode to NONE when configuring a port that is
> > > > associated with hardware that only supports a single Rx queue.
> > > > 
> > > > Signed-off-by: Chaoyong He 
> > > > Signed-off-by: Heinrich Kuhn 
> > > > Signed-off-by: Simon Horman 
> > > > ---
> > > > +   if (dev_info.max_rx_queues == 1)
> > > > +   local_port_conf.rxmode.mq_mode = ETH_MQ_RX_NONE;
> > > > +
> > > 
> > > While it makes sense to do this when the port only supports a single 
> > > queue,
> > > would it not also make sense to do this when the requested queues are 1
> > > too?
> > > 
> > > Adding some lookup library maintainers on CC - I assume that the RSS value
> > > is not actually used for lookup anywhere in l3fwd.
> > > 
> > 
> > As far as I can see the rss hash value is not used anywhere in l3fwd. In
> > LPM/FIB this is not required at all, in EM CRC or Jenkins hash is used.
> >
> That's what I thought from looking at the code too. Since this is not
> really a bug fix, I think it can be pushed till 21.08.
> 
> With or without the change I suggest above:
> Acked-by: Bruce Richardson 

Applied, sorry it has been waiting so long.




Re: [dpdk-dev] [PATCH 0/2] fixes to bnxt PMD

2021-07-30 Thread Ajit Khaparde
On Thu, Jul 29, 2021 at 10:36 PM Ajit Khaparde
 wrote:
>
> Fixes to bnxt PMD to address compatibility issues with different FW versions.
Patches applied to dpdk-next-net-brcm.

>
> Jay Ding (1):
>   net/bnxt: fix resource qcap list handling
>
> Kishore Padmanabha (1):
>   net/bnxt: fix stats counter resource
>
>  drivers/net/bnxt/tf_core/tf_msg.c| 12 ++--
>  .../tf_ulp/generic_templates/ulp_template_db_tbl.c   |  4 ++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> --
> 2.21.1 (Apple Git-122.3)
>


[dpdk-dev] [PATCH] crypto/octeontx: fix heap use after free

2021-07-30 Thread Akhil Goyal
When the PMD is removed, rte_cryptodev_pmd_release_device
is called which frees cryptodev->data, and then tries to free
cryptodev->data->dev_private, which causes the heap use
after free issue.

A temporary pointer is set before the free of cryptodev->data,
which can then be used afterwards to free dev_private.
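
The pattern, reduced to a standalone sketch. The toy_* types and
functions are invented stand-ins, not the cryptodev API, and
toy_release_device() clearing the data pointer is an assumption made so
the example is easy to check.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the cryptodev structures involved. */
struct toy_data {
	void *dev_private;
};

struct toy_dev {
	struct toy_data *data;
};

/* Stand-in for the release call: it frees dev->data, so
 * dev->data->dev_private must not be read afterwards. */
static void toy_release_device(struct toy_dev *dev)
{
	free(dev->data);
	dev->data = NULL;
}

/* Remove path following the fix: save dev_private before the release,
 * then free it through the saved pointer. */
static int toy_remove(struct toy_dev *dev)
{
	void *dev_priv;

	assert(dev != NULL && dev->data != NULL);

	dev_priv = dev->data->dev_private; /* read while data is still valid */
	toy_release_device(dev);
	free(dev_priv);

	return 0;
}
```

Reading dev->data->dev_private after toy_release_device() would touch
freed memory, which is the use after free reported here.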

Fixes: bfe2ae495ee2 ("crypto/octeontx: add PMD skeleton")
Cc: sta...@dpdk.org

Reported-by: ZhihongX Peng 
Signed-off-by: Akhil Goyal 
---
 drivers/crypto/octeontx/otx_cryptodev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/octeontx/otx_cryptodev.c 
b/drivers/crypto/octeontx/otx_cryptodev.c
index 7207909abb..3822c0d779 100644
--- a/drivers/crypto/octeontx/otx_cryptodev.c
+++ b/drivers/crypto/octeontx/otx_cryptodev.c
@@ -75,6 +75,7 @@ otx_cpt_pci_remove(struct rte_pci_device *pci_dev)
 {
struct rte_cryptodev *cryptodev;
char name[RTE_CRYPTODEV_NAME_MAX_LEN];
+   void *dev_priv;
 
if (pci_dev == NULL)
return -EINVAL;
@@ -88,11 +89,13 @@ otx_cpt_pci_remove(struct rte_pci_device *pci_dev)
if (pci_dev->driver == NULL)
return -ENODEV;
 
+   dev_priv = cryptodev->data->dev_private;
+
/* free crypto device */
rte_cryptodev_pmd_release_device(cryptodev);
 
if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-   rte_free(cryptodev->data->dev_private);
+   rte_free(dev_priv);
 
cryptodev->device->driver = NULL;
cryptodev->device = NULL;
-- 
2.25.1



Re: [dpdk-dev] [dpdk-announce] release candidate 21.08-rc2

2021-07-30 Thread Thinh Tran

Hi all,
IBM - DPDK on Power Systems

* Basic PF on Mellanox: No new issues or regressions were seen.
* Performance: not tested.

Systems tested:
 - IBM Power9 PowerNV 9006-22P
OS: RHEL 8.3
GCC:  version 8.3.1 20191121 (Red Hat 8.3.1-5)
NICs:
 - Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
 - firmware version: 16.29.1017
 - MLNX_OFED_LINUX-5.2-1.0.4.1 (OFED-5.2-1.0.4)

 - LPARs on IBM Power10 CHRP IBM,9105-42B
OS: RHEL 8.4
GCC: gcc version 8.4.1 20200928 (Red Hat 8.4.1-1)
NICs:
- Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
- firmware version: 16.30.1004
- MLNX_OFED_LINUX-5.3-1.0.0.2

Regards,
Thinh Tran

On 7/24/2021 8:38 AM, Thomas Monjalon wrote:

A new DPDK release candidate is ready for testing:
https://git.dpdk.org/dpdk/tag/?id=v21.08-rc2

There are 287 new patches in this snapshot,
most of them adding or fixing features in drivers.

Release notes:
https://doc.dpdk.org/guides/rel_notes/release_21_08.html

Highlights of 21.08-rc2:
- Wangxun ngbe ethernet driver
- Marvell CNXK event Rx/Tx adapters
- NVIDIA mlx5 crypto driver supporting AES-XTS
- ISAL compress support on Arm

Please test and report issues on bugs.dpdk.org.

DPDK 21.08-rc3 is expected in one week.

Thank you everyone




Re: [dpdk-dev] [PATCH] doc: announce security API changes for Inline IPsec

2021-07-30 Thread Akhil Goyal
> Announce changes to make rte_security_set_pkt_metadata() and
> rte_security_get_userdata() inline instead of C functions and
> also addition of another field in structure rte_security_ctx for
> holding flags.
> 
> Signed-off-by: Nithin Dabilpuram 
> Acked-by: Akhil Goyal 
> ---
Applied to dpdk-next-crypto

Thanks.


Re: [dpdk-dev] [EXT] [PATCH] crypto: fix heap use after free bug

2021-07-30 Thread Akhil Goyal
Fixed title
Cryptodev: fix heap use after free
> > The PMD destroy function was calling the release function, which frees
> > cryptodev->data, and then tries to free cryptodev->data->dev_private,
> > which causes the heap use after free issue.
> >
> > A temporary pointer is set before the free of cryptodev->data,
> > which can then be used afterwards to free dev_private.
> > The free cannot be moved to before the release function is called,
> > as dev_private is used in the QAT close function while being released.
I believe all PMDs use dev_private for close,
hence replaced QAT with PMD.
> >
> > Fixes: 9e6edea41805 ("cryptodev: add APIs to assist PMD initialisation")
> > Cc: declan.dohe...@intel.com
> > Cc: sta...@dpdk.org
> >
> > Reported-by: ZhihongX Peng 
> > Signed-off-by: Ciara Power 
> >
> > ---
> > The same issue is found in crypto/octeontx,
> > which may need to be addressed by maintainers.
> > Cc: Anoob Joseph 
> > ---
> >  lib/cryptodev/rte_cryptodev_pmd.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/cryptodev/rte_cryptodev_pmd.c
> > b/lib/cryptodev/rte_cryptodev_pmd.c
> > index 0912004127..900acd7ba4 100644
> > --- a/lib/cryptodev/rte_cryptodev_pmd.c
> > +++ b/lib/cryptodev/rte_cryptodev_pmd.c
> > @@ -140,6 +140,7 @@ int
> >  rte_cryptodev_pmd_destroy(struct rte_cryptodev *cryptodev)
> >  {
> > int retval;
> > +   void *tmp_dev_private = cryptodev->data->dev_private;
> 
> Can we rename this pointer as dev_private?

Renamed this while merging, as we have RC3 deadline today.
> 
> >
> > CDEV_LOG_INFO("Closing crypto device %s", cryptodev->device-
> > >name);
> >
> > @@ -149,7 +150,7 @@ rte_cryptodev_pmd_destroy(struct rte_cryptodev
> > *cryptodev)
> > return retval;
> >
> > if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > -   rte_free(cryptodev->data->dev_private);
> > +   rte_free(tmp_dev_private);
> >
> >
> > cryptodev->device = NULL;
> > --
> > 2.25.1



Re: [dpdk-dev] [EXT] [PATCH] crypto: fix heap use after free bug

2021-07-30 Thread Akhil Goyal
> Fixed title
> Cryptodev: fix heap use after free
> > > The PMD destroy function was calling the release function, which frees
> > > cryptodev->data, and then tries to free cryptodev->data->dev_private,
> > > which causes the heap use after free issue.
> > >
> > > A temporary pointer is set before the free of cryptodev->data,
> > > which can then be used afterwards to free dev_private.
> > > The free cannot be moved to before the release function is called,
> > > as dev_private is used in the QAT close function while being released.
> I believe all PMDs use dev_private for close.
> Hence replaced QAT with PMD.
> > >
> > > Fixes: 9e6edea41805 ("cryptodev: add APIs to assist PMD initialisation")
> > > Cc: declan.dohe...@intel.com
> > > Cc: sta...@dpdk.org
> > >
> > > Reported-by: ZhihongX Peng 
> > > Signed-off-by: Ciara Power 
> > >
> > > ---
> > > The same issue is found in crypto/octeontx,
> > > which may need to be addressed by maintainers.
> > > Cc: Anoob Joseph 
> > > ---
> > >  lib/cryptodev/rte_cryptodev_pmd.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/lib/cryptodev/rte_cryptodev_pmd.c
> > > b/lib/cryptodev/rte_cryptodev_pmd.c
> > > index 0912004127..900acd7ba4 100644
> > > --- a/lib/cryptodev/rte_cryptodev_pmd.c
> > > +++ b/lib/cryptodev/rte_cryptodev_pmd.c
> > > @@ -140,6 +140,7 @@ int
> > >  rte_cryptodev_pmd_destroy(struct rte_cryptodev *cryptodev)
> > >  {
> > >   int retval;
> > > + void *tmp_dev_private = cryptodev->data->dev_private;
> >
> > Can we rename this pointer as dev_private?
> 
> Renamed this while merging, as we have RC3 deadline today.
> >
> > >
> > >   CDEV_LOG_INFO("Closing crypto device %s", cryptodev->device-
> > > >name);
> > >
> > > @@ -149,7 +150,7 @@ rte_cryptodev_pmd_destroy(struct rte_cryptodev
> > > *cryptodev)
> > >   return retval;
> > >
> > >   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > > - rte_free(cryptodev->data->dev_private);
> > > + rte_free(tmp_dev_private);
> > >
> > >
> > >   cryptodev->device = NULL;

Acked-by: Akhil Goyal 
Applied to dpdk-next-crypto
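
The fix discussed above follows a general pattern: copy the inner pointer
before the call that frees its parent, then free through the copy. A minimal
sketch of that ordering is given below. The demo_* types and functions are
hypothetical stand-ins written for illustration; they are not the DPDK
cryptodev structures or API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the device structures; not the DPDK types. */
struct demo_dev_data {
	void *dev_private;
};

struct demo_dev {
	struct demo_dev_data *data;
};

/* Plays the role of the release step: it frees dev->data. */
static int
demo_release(struct demo_dev *dev)
{
	free(dev->data);
	dev->data = NULL;
	return 0;
}

/* Save dev_private before the release, then free it through the copy. */
static int
demo_destroy(struct demo_dev *dev)
{
	void *dev_private = dev->data->dev_private;
	int ret;

	ret = demo_release(dev);
	if (ret != 0)
		return ret;

	/* dev->data is gone here, so only the saved pointer is safe to use. */
	free(dev_private);
	return 0;
}

/* Builds a device, destroys it, and reports the result of the destroy. */
static int
demo_run(void)
{
	struct demo_dev dev;

	dev.data = malloc(sizeof(*dev.data));
	if (dev.data == NULL)
		return -1;
	dev.data->dev_private = malloc(64);
	if (dev.data->dev_private == NULL) {
		free(dev.data);
		return -1;
	}
	return demo_destroy(&dev);
}
```

Freeing dev_private before the release step is not an option when the release
path still needs it, which is the constraint the commit message describes.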


Re: [dpdk-dev] [EXT] [PATCH v2] examples/l2fwd-crypto: support cipher multiple data-unit

2021-07-30 Thread Akhil Goyal
> The support for multiple data-units includes the following:
> - Add a new command-line argument to provide the data-unit length.
> - Set the length in the cipher xform.
> - Validate device capabilities for this feature.
> - Pad the AES-XTS operation length to be aligned to the defined data-unit.
> 
> Signed-off-by: Matan Azrad 
> ---
Acked-by: Akhil Goyal 
Applied to dpdk-next-crypto

Thanks


Re: [dpdk-dev] [EXT] [PATCH 1/2] drivers/qat: fix wrong return value for invalid service

2021-07-30 Thread Akhil Goyal
> Subject: [EXT] [PATCH 1/2] drivers/qat: fix wrong return value for invalid
> service
> 
Title changed to "drivers: fix return value for QAT PMDs".
Please check ./devtools/check-git-log.sh before sending a patch.
 
> Fix invalid value that is returned when asymmetric crypto
> or compression service is selected.

Description is also updated. Please check.
> 
> Fixes: 8f393c4ffdc1 ("common/qat: support GEN4 devices")
> 
> Signed-off-by: Arek Kusztal 
> ---

Applied to dpdk-next-crypto

Parentheses around 'EFAULT' are not needed. I tried fixing it, but it is
used in many places. Please fix that in a separate patch.


Re: [dpdk-dev] [EXT] [PATCH 2/2] crypto/qat: fix asymmetric crypto pmd create on gen3

2021-07-30 Thread Akhil Goyal
> This patch disables asymmetric crypto pmd on gen3 devices.
> 
> Fixes: 1f5e4053f9b4 ("common/qat: support GEN3 devices")

Cc: sta...@dpdk.org

> 
> Signed-off-by: Arek Kusztal 
> ---
>  drivers/crypto/qat/qat_asym_pmd.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/crypto/qat/qat_asym_pmd.c
> b/drivers/crypto/qat/qat_asym_pmd.c
> index d4680c3847..4891766471 100644
> --- a/drivers/crypto/qat/qat_asym_pmd.c
> +++ b/drivers/crypto/qat/qat_asym_pmd.c
> @@ -255,6 +255,10 @@ qat_asym_dev_create(struct qat_pci_device
> *qat_pci_dev,
>   QAT_LOG(ERR, "Asymmetric crypto PMD not supported on
> QAT 4xxx");
>   return -(EFAULT);
>   }
> + if (qat_pci_dev->qat_dev_gen == QAT_GEN3) {
> + QAT_LOG(ERR, "Asymmetric crypto PMD not supported on
> QAT c4xxx");
> + return -(EFAULT);
> + }
>   snprintf(name, RTE_CRYPTODEV_NAME_MAX_LEN, "%s_%s",
>   qat_pci_dev->name, "asym");
>   QAT_LOG(DEBUG, "Creating QAT ASYM device %s\n", name);
> --
> 2.30.2
Applied to dpdk-next-crypto




Re: [dpdk-dev] [EXT] [PATCH] crypto/mlx5: fix driver probing error flow

2021-07-30 Thread Akhil Goyal
> In crypto driver probing, there are two validations after context
> allocation.
> 
> When one of them fails, the context structure was not freed, which caused
> a memory leak.
> 
> Free it.
> 
> Fixes: debb27ea3442 ("crypto/mlx5: create login object using DevX")
> Fixes: e8db4413cba5 ("crypto/mlx5: add keytag configuration")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Michael Baum 
> Acked-by: Matan Azrad 
> ---
Applied to dpdk-next-crypto
@Thomas Monjalon: I do not think the title and patch description are in line.
You may fix it while pulling into the main branch.


Re: [dpdk-dev] [PATCH] crypto/octeontx: fix heap use after free

2021-07-30 Thread Akhil Goyal
> Subject: [PATCH] crypto/octeontx: fix heap use after free
> 
> When the PMD is removed, rte_cryptodev_pmd_release_device
> is called which frees cryptodev->data, and then tries to free
> cryptodev->data->dev_private, which causes the heap use
> after free issue.
> 
> A temporary pointer is set before the free of cryptodev->data,
> which can then be used afterwards to free dev_private.
> 
> Fixes: bfe2ae495ee2 ("crypto/octeontx: add PMD skeleton")
> Cc: sta...@dpdk.org
> 
> Reported-by: ZhihongX Peng 
> Signed-off-by: Akhil Goyal 

Applied to dpdk-next-crypto



[dpdk-dev] [PATCH] doc: announce the deprecation of lcore state FINISHED

2021-07-30 Thread Honnappa Nagarahalli
Lcore state FINISHED is used by the worker thread to indicate that
it has completed the assigned task. The state is changed to
WAIT by another thread after it observes the updated state. This
additional step is redundant. After this deprecation, the worker
thread will update the state to WAIT.

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Ruifeng Wang 
---
More discussion at:
http://patches.dpdk.org/project/dpdk/patch/20210224212018.17576-4-honnappa.nagaraha...@arm.com/

 doc/guides/rel_notes/deprecation.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..3adbde9e94 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,6 +11,10 @@ here.
 Deprecation Notices
 ---
 
+* eal: The lcore state FINISHED will be removed from the enum
+  rte_lcore_state_t. The lcore state WAIT is enough to represent the same
+  state.
+
 * kvargs: The function ``rte_kvargs_process`` will get a new parameter
   for returning key match count. It will ease handling of no-match case.
 
-- 
2.17.1
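
The proposed state handling can be sketched with C11 atomics: the worker
stores WAIT itself once the task is done, and the launching thread only has to
observe WAIT. This is an illustrative sketch under assumptions, not the EAL
implementation; the demo_* names, the two-value enum and the use of
sched_yield() for polling are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical two-state model; FINISHED is intentionally absent. */
enum demo_lcore_state { DEMO_WAIT, DEMO_RUNNING };

struct demo_lcore {
	_Atomic int state;
	int (*task)(void *);
	void *arg;
	int ret;
};

/* Example task used for illustration: returns the argument plus one. */
static int
demo_task(void *arg)
{
	return *(int *)arg + 1;
}

/* Worker: run the task, then move straight back to WAIT. */
static void *
demo_worker(void *p)
{
	struct demo_lcore *lc = p;

	lc->ret = lc->task(lc->arg);
	/* Release ordering publishes ret before the state change is seen. */
	atomic_store_explicit(&lc->state, DEMO_WAIT, memory_order_release);
	return NULL;
}

/* Launcher: mark the lcore RUNNING and start the worker thread. */
static int
demo_launch(struct demo_lcore *lc, pthread_t *t, int (*task)(void *),
		void *arg)
{
	lc->task = task;
	lc->arg = arg;
	lc->ret = 0;
	atomic_store_explicit(&lc->state, DEMO_RUNNING, memory_order_relaxed);
	return pthread_create(t, NULL, demo_worker, lc);
}

/* Waiter: poll until the worker reports WAIT, then read its result. */
static int
demo_wait(struct demo_lcore *lc)
{
	while (atomic_load_explicit(&lc->state, memory_order_acquire) !=
			DEMO_WAIT)
		sched_yield();
	return lc->ret;
}
```

With a single terminal state, the waiting thread no longer needs a second
store to reset the state after it has observed completion.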



[dpdk-dev] [RFC] eal: simplify the implementation of rte_ctrl_thread_create

2021-07-30 Thread Honnappa Nagarahalli
The current described behaviour of rte_ctrl_thread_create is
rigid which makes the implementation of the function complex.
The behavior is abstracted to allow for simplified implementation.

Signed-off-by: Honnappa Nagarahalli 
---
 lib/eal/common/eal_common_thread.c | 65 +-
 lib/eal/include/rte_lcore.h|  8 ++--
 2 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/lib/eal/common/eal_common_thread.c 
b/lib/eal/common/eal_common_thread.c
index 1a52f42a2b..86cacd840c 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -169,35 +169,35 @@ __rte_thread_uninit(void)
 struct rte_thread_ctrl_params {
void *(*start_routine)(void *);
void *arg;
-   pthread_barrier_t configured;
-   unsigned int refcnt;
+   int ret;
+   /* Synchronization variable between the control thread
+* and the thread calling rte_ctrl_thread_create function.
+* 0 - Initialized
+* 1 - Control thread is running successfully
+* 2 - Control thread encountered an error. 'ret' has the
+* error code.
+*/
+   unsigned int sync;
 };
 
-static void ctrl_params_free(struct rte_thread_ctrl_params *params)
-{
-   if (__atomic_sub_fetch(&params->refcnt, 1, __ATOMIC_ACQ_REL) == 0) {
-   (void)pthread_barrier_destroy(&params->configured);
-   free(params);
-   }
-}
-
 static void *ctrl_thread_init(void *arg)
 {
struct internal_config *internal_conf =
eal_get_internal_configuration();
rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
struct rte_thread_ctrl_params *params = arg;
-   void *(*start_routine)(void *);
+   void *(*start_routine)(void *) = params->start_routine;
void *routine_arg = params->arg;
 
__rte_thread_init(rte_lcore_id(), cpuset);
-
-   pthread_barrier_wait(&params->configured);
-   start_routine = params->start_routine;
-   ctrl_params_free(params);
-
-   if (start_routine == NULL)
+   params->ret = pthread_setaffinity_np(pthread_self(),
+   sizeof(*cpuset), cpuset);
+   if (params->ret != 0) {
+   params->sync = 2;
return NULL;
+   }
+
+   params->sync = 1;
 
return start_routine(routine_arg);
 }
@@ -207,9 +207,6 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
const pthread_attr_t *attr,
void *(*start_routine)(void *), void *arg)
 {
-   struct internal_config *internal_conf =
-   eal_get_internal_configuration();
-   rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
struct rte_thread_ctrl_params *params;
int ret;
 
@@ -219,15 +216,12 @@ rte_ctrl_thread_create(pthread_t *thread, const char 
*name,
 
params->start_routine = start_routine;
params->arg = arg;
-   params->refcnt = 2;
-
-   ret = pthread_barrier_init(&params->configured, NULL, 2);
-   if (ret != 0)
-   goto fail_no_barrier;
+   params->ret = 0;
+   params->sync = 0;
 
ret = pthread_create(thread, attr, ctrl_thread_init, (void *)params);
if (ret != 0)
-   goto fail_with_barrier;
+   goto thread_create_failed;
 
if (name != NULL) {
ret = rte_thread_setname(*thread, name);
@@ -236,24 +230,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char 
*name,
"Cannot set name for ctrl thread\n");
}
 
-   ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
-   if (ret != 0)
-   params->start_routine = NULL;
+   /* Wait for the control thread to initialize successfully */
+   while (!params->sync)
+   rte_pause();
+   ret = params->ret;
 
-   pthread_barrier_wait(&params->configured);
-   ctrl_params_free(params);
+   free(params);
 
-   if (ret != 0)
-   /* start_routine has been set to NULL above; */
-   /* ctrl thread will exit immediately */
+   if (params->sync != 1)
+   /* ctrl thread is exiting */
pthread_join(*thread, NULL);
 
return -ret;
 
-fail_with_barrier:
-   (void)pthread_barrier_destroy(&params->configured);
+thread_create_failed:
 
-fail_no_barrier:
free(params);
 
return -ret;
diff --git a/lib/eal/include/rte_lcore.h b/lib/eal/include/rte_lcore.h
index 1550b75da0..f1cc5e38dc 100644
--- a/lib/eal/include/rte_lcore.h
+++ b/lib/eal/include/rte_lcore.h
@@ -420,10 +420,10 @@ rte_thread_unregister(void);
 /**
  * Create a control thread.
  *
- * Wrapper to pthread_create(), pthread_setname_np() and
- * pthread_setaffinity_np(). The affinity of the new thread is based
- * on the CPU affinity retrieved at the time rte_eal_init() was called,
- * the dataplane and service lcores are then excluded.
+ * Creates a control thread with the given name and attribu

[dpdk-dev] [PATCH] doc: abstract the behaviour of rte_ctrl_thread_create

2021-07-30 Thread Honnappa Nagarahalli
The current expected behaviour of the function rte_ctrl_thread_create
is rigid which makes the implementation of the function complex.
Make the expected behaviour abstract to allow for simplified
implementation.

With this change, the calls to pthread_setaffinity_np can be moved
to the control thread. This will avoid the use of
pthread_barrier_wait and simplify the synchronization mechanism
between rte_ctrl_thread_create and the calling thread.

Signed-off-by: Honnappa Nagarahalli 
---
Possible patch is at:
http://patches.dpdk.org/project/dpdk/patch/20210730213709.19400-1-honnappa.nagaraha...@arm.com/

 doc/guides/rel_notes/deprecation.rst | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..1960e3c8bf 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,6 +11,13 @@ here.
 Deprecation Notices
 ---
 
+* eal: The expected behaviour of the function ``rte_ctrl_thread_create``
+  will be abstracted to allow for simplified implementation. The new behaviour is
+  as follows:
+  Creates a control thread with the given name. The affinity of the new
+  thread is based on the CPU affinity retrieved at the time rte_eal_init()
+  was called, the dataplane and service lcores are then excluded.
+
 * kvargs: The function ``rte_kvargs_process`` will get a new parameter
   for returning key match count. It will ease handling of no-match case.
 
-- 
2.17.1
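
The synchronization described above (setup done by the new thread itself, with
a flag instead of a barrier) can be sketched as follows. This is a minimal
sketch under assumptions, not the proposed EAL code: the demo_* names are
hypothetical, demo_setup() stands in for the affinity call, and C11 atomics
with sched_yield() replace the DPDK primitives. The creator copies the flag
into a local variable before freeing the parameter block, so nothing is read
from freed memory.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Flag values: 0 initialized, 1 running, 2 setup failed ('ret' is valid). */
enum { DEMO_SYNC_INIT = 0, DEMO_SYNC_RUNNING = 1, DEMO_SYNC_FAILED = 2 };

struct demo_ctrl_params {
	void *(*start_routine)(void *);
	void *arg;
	int ret;
	atomic_int sync;
};

/* Hypothetical setup step standing in for the affinity call. */
static int
demo_setup(void)
{
	return 0;
}

/* Example start routine used for illustration: returns its argument. */
static void *
demo_echo(void *arg)
{
	return arg;
}

static void *
demo_ctrl_init(void *arg)
{
	struct demo_ctrl_params *params = arg;
	void *(*start_routine)(void *) = params->start_routine;
	void *routine_arg = params->arg;

	params->ret = demo_setup();
	if (params->ret != 0) {
		atomic_store_explicit(&params->sync, DEMO_SYNC_FAILED,
				memory_order_release);
		return NULL;
	}
	atomic_store_explicit(&params->sync, DEMO_SYNC_RUNNING,
			memory_order_release);
	/* params may be freed by the creator from here on; do not touch it. */
	return start_routine(routine_arg);
}

static int
demo_ctrl_create(pthread_t *thread, void *(*start_routine)(void *), void *arg)
{
	struct demo_ctrl_params *params;
	int ret, state;

	params = malloc(sizeof(*params));
	if (params == NULL)
		return ENOMEM;
	params->start_routine = start_routine;
	params->arg = arg;
	params->ret = 0;
	atomic_init(&params->sync, DEMO_SYNC_INIT);

	ret = pthread_create(thread, NULL, demo_ctrl_init, params);
	if (ret != 0) {
		free(params);
		return ret;
	}

	/* Wait until the new thread reports success or failure. */
	while ((state = atomic_load_explicit(&params->sync,
			memory_order_acquire)) == DEMO_SYNC_INIT)
		sched_yield();
	ret = params->ret;
	free(params);

	/* Use the local copy of the flag; params is no longer valid. */
	if (state != DEMO_SYNC_RUNNING)
		pthread_join(*thread, NULL);
	return ret;
}
```

The release store in the new thread pairs with the acquire load in the
creator, so 'ret' is visible once the flag changes.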



Re: [dpdk-dev] [PATCH v3 0/8] use compiler atomic builtins for test

2021-07-30 Thread Thomas Monjalon
> Joyce Kong (8):
>   test/ticketlock: use compiler atomics for lcores sync
>   test/spinlock: use compile atomics for lcores sync
>   test/rwlock: use compiler atomics for lcores sync
>   test/mcslock: use compiler atomics for lcores sync
>   test/mempool: remove unused variable for lcores sync
>   test/mempool_perf: use compiler atomics for lcores sync
>   test/service_cores: use compiler atomics for lock sync
>   test/rcu: use compiler atomics for data sync

Applied, thanks.





Re: [dpdk-dev] [PATCH] doc: announce security API changes for Inline IPsec

2021-07-30 Thread Thomas Monjalon
27/07/2021 19:36, Nithin Dabilpuram:
> Announce changes to make rte_security_set_pkt_metadata() and
> rte_security_get_userdata() inline instead of C functions and
> also addition of another field in structure rte_security_ctx for
> holding flags.

I guess there is a performance reason but the motivation
is not explained. Also it is going in the opposite direction
of what is discussed in the Technical Board meetings:
we should avoid and reduce the number of inline functions
to reduce the ABI surface.




[dpdk-dev] [PATCH v11 00/10] eal: Add EAL API for threading

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model, so it currently
relies on a header file that hides the Windows calls under pthread matched
interfaces. Given that EAL should isolate the environment specifics from the
applications and libraries and mediate all the communication with the
operating systems, a new EAL interface is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (librte_eal/include)
* rte_thread.c (librte_eal/windows)
* rte_thread.c (librte_eal/common)

**A schematic example of the design**
--
lib/librte_eal/include/rte_thread.h
int rte_thread_create();

lib/librte_eal/common/rte_thread.c
int rte_thread_create() 
{
return pthread_create();
}

lib/librte_eal/windows/rte_thread.c
int rte_thread_create() 
{
return CreateThread();
}
-

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
enum rte_thread_priority priority;
rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive an rte_thread_attr_t
object that will cause the thread to be created with the affinity and priority
described by the attributes object. If no rte_thread_attr_t is passed
(parameter is NULL), the default affinity and priority are used.
An rte_thread_attr_t object can also be set to the default values
by calling *rte_thread_attr_init()*.

*Priority* is represented through an enum that currently advertises
two values for priority:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority  - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
   with a new value for priority

The user can choose the thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, one of the following options can be set:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p 

*Affinity* is described by the already known "rte_cpuset_t" type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
   rte_thread_attr_t object
rte_thread_set/get_affinity  - sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 

**Future work**
The long term plan is for EAL to provide full threading support:
* Add support for condition variables
* Add support for pthread_mutex_trylock
* Additional functionality offered by pthread_*
  (such as pthread_setname_np, etc.)

v11:
 - Add unit tests for thread API
 - Rebase

v10:
 - Remove patch no. 10. It will be broken down in subpatches 
   and sent as a different patchset that depends on this one.
   This is done due to the ABI breaks that would be caused by patch 10.
 - Replace unix/rte_thread.c with common/rte_thread.c
 - Remove initializations that may prevent compiler from issuing useful
   warnings.
 - Remove rte_thread_types.h and rte_windows_thread_types.h
 - Remove unneeded priority macros (EAL_THREAD_PRIORITY*)
 - Remove functions that retrieve thread handle from process handle
 - Remove rte_thread_cancel() until same behavior is obtained on
   all platforms.
 - Fix rte_thread_detach() function description,
   return value and remove empty line.
 - Reimplement mutex functions. Add compatible representation for mutex
   identifier. Add macro to replace static mutex initialization instances.
 - Fix commit messages (lines too long, remove unicode symbols)

v9:
- Sign patches

v8:
- Rebase
- Add rte_thread_detach() API
- Set default priority, when user did not specify a value

v7:
Based on DmitryK's review:
- Change thread id representation
- Change mutex id representation
- Implement static mutex initializer for Windows
- Change barrier identifier representation
- Improve commit messages
- Add missing doxygen comments
- Split error translation function
- Improve name for affi

[dpdk-dev] [PATCH v11 01/10] eal: add basic threading functions

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Use a portable, type-safe representation for the thread identifier.
Add functions for comparing thread ids and obtaining the thread id
for the current thread.

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/meson.build|  1 +
 lib/eal/{unix => common}/rte_thread.c | 57 ---
 lib/eal/include/rte_thread.h  | 48 +-
 lib/eal/unix/meson.build  |  1 -
 lib/eal/version.map   |  3 ++
 lib/eal/windows/rte_thread.c  | 17 
 6 files changed, 95 insertions(+), 32 deletions(-)
 rename lib/eal/{unix => common}/rte_thread.c (66%)

diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index edfca9..eda250247b 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -80,6 +80,7 @@ sources += files(
 'rte_random.c',
 'rte_reciprocal.c',
 'rte_service.c',
+'rte_thread.c',
 'rte_version.c',
 )
 
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/common/rte_thread.c
similarity index 66%
rename from lib/eal/unix/rte_thread.c
rename to lib/eal/common/rte_thread.c
index c72d619ec1..92a7451b0a 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
  */
 
 #include 
@@ -16,25 +17,41 @@ struct eal_tls_key {
pthread_key_t thread_index;
 };
 
+rte_thread_t
+rte_thread_self(void)
+{
+   rte_thread_t thread_id;
+
+   thread_id.opaque_id = (uintptr_t)pthread_self();
+
+   return thread_id;
+}
+
+int
+rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
+{
+   return pthread_equal((pthread_t)t1.opaque_id, (pthread_t)t2.opaque_id);
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
int err;
+   rte_thread_key k;
 
-   *key = malloc(sizeof(**key));
-   if ((*key) == NULL) {
+   k = malloc(sizeof(*k));
+   if (k == NULL) {
RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
-   rte_errno = ENOMEM;
-   return -1;
+   return EINVAL;
}
-   err = pthread_key_create(&((*key)->thread_index), destructor);
-   if (err) {
+   err = pthread_key_create(&(k->thread_index), destructor);
+   if (err != 0) {
RTE_LOG(DEBUG, EAL, "pthread_key_create failed: %s\n",
 strerror(err));
-   free(*key);
-   rte_errno = ENOEXEC;
-   return -1;
+   free(k);
+   return err;
}
+   *key = k;
return 0;
 }
 
@@ -43,18 +60,16 @@ rte_thread_key_delete(rte_thread_key key)
 {
int err;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
err = pthread_key_delete(key->thread_index);
-   if (err) {
+   if (err != 0) {
RTE_LOG(DEBUG, EAL, "pthread_key_delete failed: %s\n",
 strerror(err));
free(key);
-   rte_errno = ENOEXEC;
-   return -1;
+   return err;
}
free(key);
return 0;
@@ -65,17 +80,15 @@ rte_thread_value_set(rte_thread_key key, const void *value)
 {
int err;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
err = pthread_setspecific(key->thread_index, value);
-   if (err) {
+   if (err != 0) {
RTE_LOG(DEBUG, EAL, "pthread_setspecific failed: %s\n",
strerror(err));
-   rte_errno = ENOEXEC;
-   return -1;
+   return err;
}
return 0;
 }
@@ -83,7 +96,7 @@ rte_thread_value_set(rte_thread_key key, const void *value)
 void *
 rte_thread_value_get(rte_thread_key key)
 {
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
rte_errno = EINVAL;
return NULL;
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8be8ed8f36..748f64d230 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
  */
+#include 
 
 #include 
 #include 
@@ -20,11 +22,45 @@
 extern "C" {
 #endif
 
+#include 
+
+/**
+ * Thread id descriptor.
+ */
+typedef struct rte_thread_tag {
+   uintptr_t opaque_id; /**< thread identifier */
+} rte_thread_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
 typedef struct eal_tls_key *

[dpdk-dev] [PATCH v11 02/10] eal: add thread attributes

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement thread attributes for:
* thread affinity
* thread priority
Implement functions for managing thread attributes.

Priority is represented through an enum that allows for two levels:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL

Affinity is described by the rte_cpuset_t type.

An rte_thread_attr_t object can be set to the default values
by calling rte_thread_attr_init().

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c  | 46 ++
 lib/eal/include/rte_thread.h | 93 
 lib/eal/version.map  |  4 ++
 lib/eal/windows/rte_thread.c | 44 +
 4 files changed, 187 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 92a7451b0a..e1a4d7eae4 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +34,51 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal((pthread_t)t1.opaque_id, (pthread_t)t2.opaque_id);
 }
 
+int
+rte_thread_attr_init(rte_thread_attr_t *attr)
+{
+   RTE_VERIFY(attr != NULL);
+
+   CPU_ZERO(&attr->cpuset);
+   attr->priority = RTE_THREAD_PRIORITY_NORMAL;
+
+   return 0;
+}
+
+int
+rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   RTE_VERIFY(thread_attr != NULL);
+   RTE_VERIFY(cpuset != NULL);
+
+   thread_attr->cpuset = *cpuset;
+
+   return 0;
+}
+
+int
+rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   RTE_VERIFY(thread_attr != NULL);
+   RTE_VERIFY(cpuset != NULL);
+
+   *cpuset = thread_attr->cpuset;
+
+   return 0;
+}
+
+int
+rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
+enum rte_thread_priority priority)
+{
+   RTE_VERIFY(thread_attr != NULL);
+
+   thread_attr->priority = priority;
+   return 0;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 748f64d230..032ff73b36 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -31,6 +31,30 @@ typedef struct rte_thread_tag {
uintptr_t opaque_id; /**< thread identifier */
 } rte_thread_t;
 
+/**
+ * Thread priority values.
+ */
+enum rte_thread_priority {
+   RTE_THREAD_PRIORITY_UNDEFINED = 0,
+   /**< priority hasn't been defined */
+   RTE_THREAD_PRIORITY_NORMAL= 1,
+   /**< normal thread priority, the default */
+   RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 2,
+   /**< highest thread priority allowed */
+};
+
+#ifdef RTE_HAS_CPUSET
+
+/**
+ * Representation for thread attributes.
+ */
+typedef struct {
+   enum rte_thread_priority priority; /**< thread priority */
+   rte_cpuset_t cpuset; /**< thread affinity */
+} rte_thread_attr_t;
+
+#endif /* RTE_HAS_CPUSET */
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -63,6 +87,75 @@ int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
 #ifdef RTE_HAS_CPUSET
 
+/**
+ * Initialize the attributes of a thread.
+ * These attributes can be passed to the rte_thread_create() function
+ * that will create a new thread and set its attributes according to attr.
+ *
+ * @param attr
+ *   Thread attributes to initialize.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_init(rte_thread_attr_t *attr);
+
+/**
+ * Set the CPU affinity value in the thread attributes pointed to
+ * by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes in which affinity will be updated.
+ *
+ * @param cpuset
+ *   Points to the value of the affinity to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset);
+
+/**
+ * Get the value of CPU affinity that is set in the thread attributes pointed
+ * to by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes from which affinity will be retrieved.
+ *
+ * @param cpuset
+ *   Pointer to the memory that will store the affinity.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset);
+
+/**
+ * Set the thread priority value in the thread attributes pointed to
+ * by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes in which priority will be updated.
+ *
+ * @param priority
+ *   Po

[dpdk-dev] [PATCH v11 03/10] eal/windows: translate Windows errors to errno-style errors

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add a function to translate Windows error codes to
errno-style error codes. The possible return values are chosen
so that we have as much semantic compatibility between platforms as
possible.

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c  |  6 +--
 lib/eal/include/rte_thread.h |  5 +-
 lib/eal/windows/rte_thread.c | 95 +++-
 3 files changed, 76 insertions(+), 30 deletions(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index e1a4d7eae4..27ad1c7eb0 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -47,7 +47,7 @@ rte_thread_attr_init(rte_thread_attr_t *attr)
 
 int
 rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
-rte_cpuset_t *cpuset)
+   rte_cpuset_t *cpuset)
 {
RTE_VERIFY(thread_attr != NULL);
RTE_VERIFY(cpuset != NULL);
@@ -59,7 +59,7 @@ rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
 
 int
 rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
-rte_cpuset_t *cpuset)
+   rte_cpuset_t *cpuset)
 {
RTE_VERIFY(thread_attr != NULL);
RTE_VERIFY(cpuset != NULL);
@@ -71,7 +71,7 @@ rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
 
 int
 rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
-enum rte_thread_priority priority)
+   enum rte_thread_priority priority)
 {
RTE_VERIFY(thread_attr != NULL);
 
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 032ff73b36..bf649c2fe6 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -235,9 +235,8 @@ int rte_thread_value_set(rte_thread_key key, const void 
*value);
  *
  * @return
  *   On success, value data pointer (can also be NULL).
- *   On failure, NULL and an error number is set in rte_errno.
- *   rte_errno can be: EINVAL  - Invalid parameter passed.
- * ENOEXEC - Specific OS error.
+ *   On failure, NULL and a positive error number is set in rte_errno.
+ *
  */
 __rte_experimental
 void *rte_thread_value_get(rte_thread_key key);
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index 01966e7745..c1ecfbd6ae 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -13,6 +13,54 @@ struct eal_tls_key {
DWORD thread_index;
 };
 
+/* Translates the most common error codes related to threads */
+static int
+thread_translate_win32_error(DWORD error)
+{
+   switch (error) {
+   case ERROR_SUCCESS:
+   return 0;
+
+   case ERROR_INVALID_PARAMETER:
+   return EINVAL;
+
+   case ERROR_INVALID_HANDLE:
+   return EFAULT;
+
+   case ERROR_NOT_ENOUGH_MEMORY:
+   /* FALLTHROUGH */
+   case ERROR_NO_SYSTEM_RESOURCES:
+   return ENOMEM;
+
+   case ERROR_PRIVILEGE_NOT_HELD:
+   /* FALLTHROUGH */
+   case ERROR_ACCESS_DENIED:
+   return EACCES;
+
+   case ERROR_ALREADY_EXISTS:
+   return EEXIST;
+
+   case ERROR_POSSIBLE_DEADLOCK:
+   return EDEADLK;
+
+   case ERROR_INVALID_FUNCTION:
+   /* FALLTHROUGH */
+   case ERROR_CALL_NOT_IMPLEMENTED:
+   return ENOSYS;
+   }
+
+   return EINVAL;
+}
+
+static int
+thread_log_last_error(const char *message)
+{
+   DWORD error = GetLastError();
+   RTE_LOG(DEBUG, EAL, "GetLastError()=%lu: %s\n", error, message);
+
+   return thread_translate_win32_error(error);
+}
+
 rte_thread_t
 rte_thread_self(void)
 {
@@ -42,7 +90,7 @@ rte_thread_attr_init(rte_thread_attr_t *attr)
 
 int
 rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
-rte_cpuset_t *cpuset)
+   rte_cpuset_t *cpuset)
 {
RTE_VERIFY(thread_attr != NULL);
thread_attr->cpuset = *cpuset;
@@ -52,7 +100,7 @@ rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
 
 int
 rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
-rte_cpuset_t *cpuset)
+   rte_cpuset_t *cpuset)
 {
RTE_VERIFY(thread_attr != NULL);
 
@@ -63,7 +111,7 @@ rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
 
 int
 rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
-enum rte_thread_priority priority)
+   enum rte_thread_priority priority)
 {
RTE_VERIFY(thread_attr != NULL);
 
@@ -76,18 +124,18 @@ int
 rte_thread_key_create(rte_thread_key *key,
__rte_unused void (*destructor)(void *))
 {
+   int ret;
+
*key = malloc(sizeof(**key));
if ((*key) == NULL) {
RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
-   rte_errno = ENOMEM;
-   return -1;
+   return ENOMEM;
}
(*key)->thread_index = TlsAlloc();

[dpdk-dev] [PATCH v11 05/10] eal: implement thread priority management functions

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add function for setting the priority for a thread.
Priorities on multiple platforms are similarly determined by
a priority value and a priority class/policy.

On Linux, the following mapping is created:
RTE_THREAD_PRIORITY_NORMAL corresponds to
* policy SCHED_OTHER
* priority value:   (sched_get_priority_min(SCHED_OTHER) +
 sched_get_priority_max(SCHED_OTHER))/2;
RTE_THREAD_PRIORITY_REALTIME_CRITICAL corresponds to
* policy SCHED_RR
* priority value: sched_get_priority_max(SCHED_RR);

On Windows, the following mapping is created:
RTE_THREAD_PRIORITY_NORMAL corresponds to
* class NORMAL_PRIORITY_CLASS
* priority THREAD_PRIORITY_NORMAL
RTE_THREAD_PRIORITY_REALTIME_CRITICAL corresponds to
* class REALTIME_PRIORITY_CLASS
* priority THREAD_PRIORITY_TIME_CRITICAL
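
For illustration only, the Linux half of the mapping above can be
sketched as a standalone helper. The names demo_priority and
demo_map_priority() are made up for this sketch and are not part of
the patch; only the sched_get_priority_min()/_max() calls and the
SCHED_OTHER/SCHED_RR policies come from the description.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for enum rte_thread_priority. */
enum demo_priority {
	DEMO_PRIORITY_NORMAL,
	DEMO_PRIORITY_REALTIME_CRITICAL,
};

/*
 * Map a portable priority to a POSIX (policy, priority value) pair.
 * Returns 0 on success or a positive errno-style value on failure.
 */
static int
demo_map_priority(enum demo_priority pri, int *os_pri, int *policy)
{
	assert(os_pri != NULL && policy != NULL);

	switch (pri) {
	case DEMO_PRIORITY_NORMAL:
		*policy = SCHED_OTHER;
		/* Middle of the range reported for SCHED_OTHER. */
		*os_pri = (sched_get_priority_min(SCHED_OTHER) +
			sched_get_priority_max(SCHED_OTHER)) / 2;
		return 0;
	case DEMO_PRIORITY_REALTIME_CRITICAL:
		*policy = SCHED_RR;
		/* Highest value the round-robin policy allows. */
		*os_pri = sched_get_priority_max(SCHED_RR);
		return 0;
	default:
		return EINVAL;
	}
}
```

The resulting pair is what would then be handed to
pthread_setschedparam(); switching a thread to SCHED_RR normally
requires elevated privileges, so that call may fail for ordinary users.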

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c  | 49 ++
 lib/eal/include/rte_thread.h | 17 ++
 lib/eal/version.map  |  1 +
 lib/eal/windows/rte_thread.c | 66 
 4 files changed, 133 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 73b7b3141c..fcebf7097c 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -50,6 +50,55 @@ rte_thread_get_affinity_by_id(rte_thread_t thread_id,
sizeof(*cpuset), cpuset);
 }
 
+static int
+thread_map_priority_to_os_value(enum rte_thread_priority eal_pri,
+   int *os_pri, int *pol)
+{
+   /* Clear the output parameters */
+   *os_pri = sched_get_priority_min(SCHED_OTHER) - 1;
+   *pol = -1;
+
+   switch (eal_pri) {
+   case RTE_THREAD_PRIORITY_NORMAL:
+   *pol = SCHED_OTHER;
+
+   /*
+* Choose the middle of the range to represent
+* the priority 'normal'.
+* On Linux, this should be 0, since both
+* sched_get_priority_min/_max return 0 for SCHED_OTHER.
+*/
+   *os_pri = (sched_get_priority_min(SCHED_OTHER) +
+   sched_get_priority_max(SCHED_OTHER))/2;
+   break;
+   case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+   *pol = SCHED_RR;
+   *os_pri = sched_get_priority_max(SCHED_RR);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL, "The requested priority value is invalid.\n");
+   return EINVAL;
+   }
+   return 0;
+}
+
+int
+rte_thread_set_priority(rte_thread_t thread_id,
+   enum rte_thread_priority priority)
+{
+   int ret;
+   int policy;
+   struct sched_param param;
+
+   ret = thread_map_priority_to_os_value(priority, &param.sched_priority,
+   &policy);
+   if (ret != 0)
+   return ret;
+
+   return pthread_setschedparam((pthread_t)thread_id.opaque_id,
+   policy, &param);
+}
+
 int
 rte_thread_attr_init(rte_thread_attr_t *attr)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index ca4ade60e2..5514b2f57f 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -215,6 +215,23 @@ void rte_thread_get_affinity(rte_cpuset_t *cpusetp);
 
 #endif /* RTE_HAS_CPUSET */
 
+/**
+ * Set the priority of a thread.
+ *
+ * @param thread_id
+ *Id of the thread for which to set priority.
+ *
+ * @param priority
+ *   Priority value to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_priority(rte_thread_t thread_id,
+   enum rte_thread_priority priority);
+
 /**
  * Create a TLS data key visible to all threads in the process.
  * the created key is later used to get/set a value.
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 7ed4cd779e..df01e4 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -435,6 +435,7 @@ EXPERIMENTAL {
rte_thread_attr_set_priority;
rte_thread_get_affinity_by_id;
rte_thread_set_affinity_by_id;
+   rte_thread_set_priority;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index 0127119f49..fb04718f58 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -200,6 +200,72 @@ rte_thread_get_affinity_by_id(rte_thread_t thread_id,
return ret;
 }
 
+static int
+thread_map_priority_to_os_value(enum rte_thread_priority eal_pri,
+   int *os_pri, int *pri_class)
+{
+   /* Clear the output parameters */
+   *os_pri = -1;
+   *pri_class = -1;
+
+   switch (eal_pri) {
+   case RTE_THREAD_PRIORITY_NORMAL:
+   *pri_class = NORMAL_PRIORITY_CLASS;
+   *os_pri = THREAD_PRIORITY_NORMAL;
+   break;
+   case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+   *pri_class = REALTIME_PRIORITY_CLASS;
+   *os_pri = THREAD_PRIORITY

[dpdk-dev] [PATCH v11 04/10] eal: implement functions for thread affinity management

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement functions for getting/setting thread affinity.
Threads can be pinned to specific cores by setting their
affinity attribute.
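
As a rough sketch of what pinning looks like on Linux, the snippet
below restricts the calling thread to a single CPU using the same
pthread_setaffinity_np()/pthread_getaffinity_np() calls the common
implementation wraps. The demo_* helpers are invented for this
example and are not part of the patch.

```c
#define _GNU_SOURCE /* cpu_set_t macros and pthread_*affinity_np() */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Return the lowest CPU the calling thread may run on, or -1 on error. */
static int
demo_first_allowed_cpu(void)
{
	cpu_set_t set;
	int cpu;

	CPU_ZERO(&set);
	if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;

	return -1;
}

/*
 * Pin the calling thread to one CPU.
 * Returns 0 on success or a positive errno-style value on failure.
 */
static int
demo_pin_current_thread(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Picking the CPU from the thread's current mask, rather than assuming
CPU 0, keeps the sketch usable inside containers or cpusets that do
not include CPU 0.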

Signed-off-by: Narcisa Vasile 
Signed-off-by: Dmitry Malloy 
---
 lib/eal/common/rte_thread.c   |  16 
 lib/eal/include/rte_thread.h  |  36 +++
 lib/eal/version.map   |   2 +
 lib/eal/windows/eal_lcore.c   | 176 +-
 lib/eal/windows/eal_windows.h |  10 ++
 lib/eal/windows/rte_thread.c  | 125 +++-
 6 files changed, 319 insertions(+), 46 deletions(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 27ad1c7eb0..73b7b3141c 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -34,6 +34,22 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal((pthread_t)t1.opaque_id, (pthread_t)t2.opaque_id);
 }
 
+int
+rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset)
+{
+   return pthread_setaffinity_np((pthread_t)thread_id.opaque_id,
+   sizeof(*cpuset), cpuset);
+}
+
+int
+rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset)
+{
+   return pthread_getaffinity_np((pthread_t)thread_id.opaque_id,
+   sizeof(*cpuset), cpuset);
+}
+
 int
 rte_thread_attr_init(rte_thread_attr_t *attr)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index bf649c2fe6..ca4ade60e2 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -87,6 +87,42 @@ int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
 #ifdef RTE_HAS_CPUSET
 
+/**
+ * Set the affinity of thread 'thread_id' to the cpu set
+ * specified by 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to set the affinity.
+ *
+ * @param cpuset
+ *   Pointer to CPU affinity to set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset);
+
+/**
+ * Get the affinity of thread 'thread_id' and store it
+ * in 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to get the affinity.
+ *
+ * @param cpuset
+ *   Pointer for storing the affinity value.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset);
+
 /**
  * Initialize the attributes of a thread.
  * These attributes can be passed to the rte_thread_create() function
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 9ffa5eb15e..7ed4cd779e 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -433,6 +433,8 @@ EXPERIMENTAL {
rte_thread_attr_get_affinity;
rte_thread_attr_set_affinity;
rte_thread_attr_set_priority;
+   rte_thread_get_affinity_by_id;
+   rte_thread_set_affinity_by_id;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/eal_lcore.c b/lib/eal/windows/eal_lcore.c
index 476c2d2bdf..295af50698 100644
--- a/lib/eal/windows/eal_lcore.c
+++ b/lib/eal/windows/eal_lcore.c
@@ -2,7 +2,6 @@
  * Copyright(c) 2019 Intel Corporation
  */
 
-#include 
 #include 
 #include 
 
@@ -27,13 +26,15 @@ struct socket_map {
 };
 
 struct cpu_map {
-   unsigned int socket_count;
unsigned int lcore_count;
+   unsigned int socket_count;
+   unsigned int cpu_count;
struct lcore_map lcores[RTE_MAX_LCORE];
struct socket_map sockets[RTE_MAX_NUMA_NODES];
+   GROUP_AFFINITY cpus[CPU_SETSIZE];
 };
 
-static struct cpu_map cpu_map = { 0 };
+static struct cpu_map cpu_map;
 
 /* eal_create_cpu_map() is called before logging is initialized */
 static void
@@ -47,13 +48,118 @@ log_early(const char *format, ...)
va_end(va);
 }
 
+static int
+eal_query_group_affinity(void)
+{
+   SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos = NULL;
+   unsigned int *cpu_count = &cpu_map.cpu_count;
+   DWORD infos_size = 0;
+   int ret = 0;
+   USHORT group_count;
+   KAFFINITY affinity;
+   USHORT group_no;
+   unsigned int i;
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, NULL,
+ &infos_size)) {
+   DWORD error = GetLastError();
+   if (error != ERROR_INSUFFICIENT_BUFFER) {
+   log_early("Cannot get group information size, "
+ "error %lu\n", error);
+   rte_errno = EINVAL;
+   ret = -1;
+   goto cleanup;
+   }
+   }
+
+   infos = malloc(infos_size);
+   if (infos == NULL) {
+   log_early("Cannot allocate memory for NUMA node information\n");
+   rte_errno = ENOMEM;
+  

[dpdk-dev] [PATCH v11 07/10] eal: implement functions for mutex management

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add functions for mutex init, destroy, lock, unlock.

Add RTE_STATIC_MUTEX macro to replace static initialization
of mutexes.
Windows does not have a static initializer.
Initialization is only done through InitializeCriticalSection().

The RTE_STATIC_MUTEX calls into the rte_thread_mutex_init()
function that performs the actual mutex initialization.
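
A minimal sketch of the idea, assuming a GCC/Clang constructor
attribute plays the role of RTE_INIT: the macro defines the mutex
object and a function that initializes it before main() runs.
demo_mutex, demo_mutex_init() and DEMO_STATIC_MUTEX are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for rte_thread_mutex. */
typedef struct {
	void *mutex_id;
} demo_mutex;

/* Allocate and initialize the underlying pthread mutex. */
static int
demo_mutex_init(demo_mutex *mutex)
{
	pthread_mutex_t *m;
	int ret;

	assert(mutex != NULL);

	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return ENOMEM;

	ret = pthread_mutex_init(m, NULL);
	if (ret != 0) {
		free(m);
		return ret;
	}

	mutex->mutex_id = m;
	return 0;
}

/*
 * Define a file-scope mutex and a constructor that initializes it
 * at load time, replacing a static initializer.
 */
#define DEMO_STATIC_MUTEX(name) \
static demo_mutex name; \
static void __attribute__((constructor)) \
demo_ ## name ## _init(void) \
{ \
	if (demo_mutex_init(&name) != 0) \
		abort(); \
}

DEMO_STATIC_MUTEX(counter_lock)

static unsigned long counter;

/* Increment the shared counter under the lock and return the new value. */
static unsigned long
demo_bump_counter(void)
{
	unsigned long v;

	pthread_mutex_lock(counter_lock.mutex_id);
	v = ++counter;
	pthread_mutex_unlock(counter_lock.mutex_id);
	return v;
}
```

By the time any ordinary code calls demo_bump_counter(), the
constructor has already run, so callers never see an uninitialized
lock.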

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c  | 61 +++
 lib/eal/include/rte_thread.h | 94 
 lib/eal/version.map  |  4 ++
 lib/eal/windows/rte_thread.c | 53 
 4 files changed, 212 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index a0a51bc190..ebae4a8af1 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -251,6 +251,67 @@ rte_thread_detach(rte_thread_t thread_id)
return pthread_detach((pthread_t)thread_id.opaque_id);
 }
 
+int
+rte_thread_mutex_init(rte_thread_mutex *mutex)
+{
+   int ret = 0;
+   pthread_mutex_t *m = NULL;
+
+   RTE_VERIFY(mutex != NULL);
+
+   m = calloc(1, sizeof(*m));
+   if (m == NULL) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize mutex. Insufficient memory!\n");
+   ret = ENOMEM;
+   goto cleanup;
+   }
+
+   ret = pthread_mutex_init(m, NULL);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "Failed to init mutex. ret = %d\n", ret);
+   goto cleanup;
+   }
+
+   mutex->mutex_id = m;
+   m = NULL;
+
+cleanup:
+   free(m);
+   return ret;
+}
+
+int
+rte_thread_mutex_lock(rte_thread_mutex *mutex)
+{
+   RTE_VERIFY(mutex != NULL);
+
+   return pthread_mutex_lock((pthread_mutex_t *)mutex->mutex_id);
+}
+
+int
+rte_thread_mutex_unlock(rte_thread_mutex *mutex)
+{
+   RTE_VERIFY(mutex != NULL);
+
+   return pthread_mutex_unlock((pthread_mutex_t *)mutex->mutex_id);
+}
+
+int
+rte_thread_mutex_destroy(rte_thread_mutex *mutex)
+{
+   int ret = 0;
+   RTE_VERIFY(mutex != NULL);
+
+   ret = pthread_mutex_destroy((pthread_mutex_t *)mutex->mutex_id);
+   if (ret != 0)
+   RTE_LOG(DEBUG, EAL, "Unable to destroy mutex, ret = %d\n", ret);
+
+   free(mutex->mutex_id);
+   mutex->mutex_id = NULL;
+
+   return ret;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 098c3ba343..7e813b573d 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -56,6 +56,26 @@ typedef struct {
 
 #endif /* RTE_HAS_CPUSET */
 
+#define RTE_DECLARE_MUTEX(private_lock)  rte_thread_mutex private_lock
+
+#define RTE_DEFINE_MUTEX(private_lock)\
+RTE_INIT(__rte_ ## private_lock ## _init)\
+{\
+   RTE_VERIFY(rte_thread_mutex_init(&private_lock) == 0);\
+}
+
+#define RTE_STATIC_MUTEX(private_lock)\
+static RTE_DECLARE_MUTEX(private_lock);\
+RTE_DEFINE_MUTEX(private_lock)
+
+
+/**
+ * Thread mutex representation.
+ */
+typedef struct rte_thread_mutex_tag {
+   void *mutex_id;  /**< mutex identifier */
+} rte_thread_mutex;
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -268,6 +288,28 @@ int rte_thread_join(rte_thread_t thread_id, unsigned long *value_ptr);
 __rte_experimental
 int rte_thread_detach(rte_thread_t thread_id);
 
+/**
+ * Set core affinity of the current thread.
+ * Support both EAL and non-EAL thread and update TLS.
+ *
+ * @param cpusetp
+ *   Pointer to CPU affinity to set.
+ *
+ * @return
+ *   On success, return 0; otherwise return -1;
+ */
+int rte_thread_set_affinity(rte_cpuset_t *cpusetp);
+
+/**
+ * Get core affinity of the current thread.
+ *
+ * @param cpusetp
+ *   Pointer to CPU affinity of current thread.
+ *   It presumes input is not NULL, otherwise it causes panic.
+ *
+ */
+void rte_thread_get_affinity(rte_cpuset_t *cpusetp);
+
 #endif /* RTE_HAS_CPUSET */
 
 /**
@@ -287,6 +329,58 @@ __rte_experimental
 int rte_thread_set_priority(rte_thread_t thread_id,
enum rte_thread_priority priority);
 
+/**
+ * Initializes a mutex.
+ *
+ * @param mutex
+ *The mutex to be initialized.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_init(rte_thread_mutex *mutex);
+
+/**
+ * Locks a mutex.
+ *
+ * @param mutex
+ *The mutex to be locked.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_lock(rte_thread_mutex *mutex);
+
+/**
+ * Unlocks a mutex.
+ *
+ * @param mutex
+ *The mutex to be unlocked.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_unlock(rte_thread_mutex *mutex);
+
+/**
+ * Releases all resources associated with a

[dpdk-dev] [PATCH v11 08/10] eal: implement functions for thread barrier management

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add functions for barrier init, destroy, wait.

A portable type is used to represent a barrier identifier.
The rte_thread_barrier_wait() function returns the same value
on all platforms.

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c  | 61 
 lib/eal/include/rte_thread.h | 58 ++
 lib/eal/version.map  |  3 ++
 lib/eal/windows/rte_thread.c | 56 +
 4 files changed, 178 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index ebae4a8af1..3fdb267337 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -312,6 +312,67 @@ rte_thread_mutex_destroy(rte_thread_mutex *mutex)
return ret;
 }
 
+int
+rte_thread_barrier_init(rte_thread_barrier *barrier, int count)
+{
+   int ret = 0;
+   pthread_barrier_t *pthread_barrier = NULL;
+
+   RTE_VERIFY(barrier != NULL);
+   RTE_VERIFY(count > 0);
+
+   pthread_barrier = calloc(1, sizeof(*pthread_barrier));
+   if (pthread_barrier == NULL) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize barrier. Insufficient memory!\n");
+   ret = ENOMEM;
+   goto cleanup;
+   }
+   ret = pthread_barrier_init(pthread_barrier, NULL, count);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "Failed to init barrier, ret = %d\n", ret);
+   goto cleanup;
+   }
+
+   barrier->barrier_id = pthread_barrier;
+   pthread_barrier = NULL;
+
+cleanup:
+   free(pthread_barrier);
+   return ret;
+}
+
+int
+rte_thread_barrier_wait(rte_thread_barrier *barrier)
+{
+   int ret = 0;
+
+   RTE_VERIFY(barrier != NULL);
+   RTE_VERIFY(barrier->barrier_id != NULL);
+
+   ret = pthread_barrier_wait(barrier->barrier_id);
+   if (ret == PTHREAD_BARRIER_SERIAL_THREAD)
+   ret = RTE_THREAD_BARRIER_SERIAL_THREAD;
+
+   return ret;
+}
+
+int
+rte_thread_barrier_destroy(rte_thread_barrier *barrier)
+{
+   int ret = 0;
+
+   RTE_VERIFY(barrier != NULL);
+
+   ret = pthread_barrier_destroy(barrier->barrier_id);
+   if (ret != 0)
+   RTE_LOG(DEBUG, EAL, "Failed to destroy barrier: %d\n", ret);
+
+   free(barrier->barrier_id);
+   barrier->barrier_id = NULL;
+
+   return ret;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 7e813b573d..40da83467b 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -76,6 +76,18 @@ typedef struct rte_thread_mutex_tag {
void *mutex_id;  /**< mutex identifier */
 } rte_thread_mutex;
 
+/**
+ * Returned by rte_thread_barrier_wait() to one waiting thread on success.
+ */
+#define RTE_THREAD_BARRIER_SERIAL_THREAD -1
+
+/**
+ * Thread barrier representation.
+ */
+typedef struct rte_thread_barrier_tag {
+   void *barrier_id;  /**< barrier identifier */
+} rte_thread_barrier;
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -381,6 +393,52 @@ int rte_thread_mutex_unlock(rte_thread_mutex *mutex);
 __rte_experimental
 int rte_thread_mutex_destroy(rte_thread_mutex *mutex);
 
+/**
+ * Initializes a synchronization barrier.
+ *
+ * @param barrier
+ *A pointer that references the newly created 'barrier' object.
+ *
+ * @param count
+ *The number of threads that must enter the barrier before
+ *the threads can continue execution.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_barrier_init(rte_thread_barrier *barrier, int count);
+
+/**
+ * Causes the calling thread to wait at the synchronization barrier 'barrier'.
+ *
+ * @param barrier
+ *The barrier used for synchronizing the threads.
+ *
+ * @return
+ *   Return RTE_THREAD_BARRIER_SERIAL_THREAD for the thread synchronized
+ *  at the barrier.
+ *   Return 0 for all other threads.
+ *   Return a positive errno-style error number, in case of failure.
+ */
+__rte_experimental
+int rte_thread_barrier_wait(rte_thread_barrier *barrier);
+
+/**
+ * Releases all resources used by a synchronization barrier
+ * and uninitializes it.
+ *
+ * @param barrier
+ *The barrier to be destroyed.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_barrier_destroy(rte_thread_barrier *barrier);
+
 /**
  * Create a TLS data key visible to all threads in the process.
  * the created key is later used to get/set a value.
diff --git a/lib/eal/version.map b/lib/eal/version.map
index a1c7a8e87d..c081fdd96c 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -443,6 +443,9 @@ EXPERIMENTAL {
rte_thread_mutex_lock;
rte_thread_mutex_unlock;
rte_thread_mutex_destroy;
+   rte_thread_barrier_init;
+   

[dpdk-dev] [PATCH v11 06/10] eal: add thread lifetime management

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add functions for thread creation, joining, detaching.

The *rte_thread_create()* function can optionally receive
an rte_thread_attr_t object that will cause the thread to be
created with the affinity and priority described by the
attributes object. If no rte_thread_attr_t is passed (parameter is NULL),
the default affinity and priority are used.

On Windows, the function executed by a thread when the thread starts is
represented by a function pointer of type DWORD (*func) (void*).
On other platforms, the function pointer is a void* (*func) (void*).

Performing a cast between these two types of function pointers to
uniformize the API on all platforms may result in undefined behavior.
To fix this issue, a wrapper that respects the signature required by
CreateThread() has been created on Windows.
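
The wrapper technique can be shown with a portable sketch. Since
CreateThread() cannot be used outside Windows, the example below
demonstrates the mirror image on POSIX: the user function has an
"unsigned long (*)(void *)" signature, and a trampoline with the
signature pthread_create() expects calls it instead of casting the
pointer. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* User-facing signature, deliberately different from the OS one. */
typedef unsigned long (*demo_func)(void *);

/* Everything the trampoline needs to call the user function. */
struct demo_trampoline_ctx {
	demo_func func;
	void *arg;
	unsigned long result;
};

/* Has the exact signature pthread_create() requires. */
static void *
demo_trampoline(void *p)
{
	struct demo_trampoline_ctx *ctx = p;

	ctx->result = ctx->func(ctx->arg);
	return NULL;
}

/*
 * Run func(arg) in a new thread and collect its return value.
 * Returns 0 on success or a positive errno-style value on failure.
 */
static int
demo_run_in_thread(demo_func func, void *arg, unsigned long *result)
{
	struct demo_trampoline_ctx ctx = { func, arg, 0 };
	pthread_t thread;
	int ret;

	assert(func != NULL && result != NULL);

	ret = pthread_create(&thread, NULL, demo_trampoline, &ctx);
	if (ret != 0)
		return ret;

	/* ctx lives on this stack frame, so join before returning. */
	ret = pthread_join(thread, NULL);
	if (ret != 0)
		return ret;

	*result = ctx.result;
	return 0;
}

/* Example user function: returns the pointed-to value plus one. */
static unsigned long
demo_add_one(void *arg)
{
	return *(unsigned long *)arg + 1;
}
```

No function pointer is ever converted to an incompatible type here;
the only conversions are between object pointers and void *, which
are well defined.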

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c | 107 +
 lib/eal/include/rte_thread.h|  55 +
 lib/eal/version.map |   3 +
 lib/eal/windows/include/sched.h |   2 +-
 lib/eal/windows/rte_thread.c| 138 
 5 files changed, 304 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index fcebf7097c..a0a51bc190 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -144,6 +144,113 @@ rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
return 0;
 }
 
+int
+rte_thread_create(rte_thread_t *thread_id,
+   const rte_thread_attr_t *thread_attr,
+   rte_thread_func thread_func, void *args)
+{
+   int ret = 0;
+   pthread_attr_t attr;
+   pthread_attr_t *attrp = NULL;
+   struct sched_param param = {
+   .sched_priority = 0,
+   };
+   int policy = SCHED_OTHER;
+
+   if (thread_attr != NULL) {
+   ret = pthread_attr_init(&attr);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_init failed\n");
+   goto cleanup;
+   }
+
+   attrp = &attr;
+
+   if (thread_attr->priority != RTE_THREAD_PRIORITY_UNDEFINED) {
+   /*
+* Set the inherit scheduler parameter to explicit,
+* otherwise the priority attribute is ignored.
+*/
+   ret = pthread_attr_setinheritsched(attrp,
+   PTHREAD_EXPLICIT_SCHED);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setinheritsched failed\n");
+   goto cleanup;
+   }
+
+   ret = thread_map_priority_to_os_value(
+   thread_attr->priority,
+   &param.sched_priority, &policy
+   );
+   if (ret != 0)
+   goto cleanup;
+
+   ret = pthread_attr_setschedpolicy(attrp, policy);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setschedpolicy failed\n");
+   goto cleanup;
+   }
+
+   ret = pthread_attr_setschedparam(attrp, &param);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setschedparam failed\n");
+   goto cleanup;
+   }
+   }
+
+   if (CPU_COUNT(&thread_attr->cpuset) > 0) {
+   ret = pthread_attr_setaffinity_np(attrp,
+   sizeof(thread_attr->cpuset),
+   &thread_attr->cpuset);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setaffinity_np failed\n");
+   goto cleanup;
+   }
+   }
+   }
+
+   ret = pthread_create((pthread_t *)&thread_id->opaque_id, attrp,
+   thread_func, args);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_create failed\n");
+   goto cleanup;
+   }
+
+cleanup:
+   if (attrp != NULL)
+   pthread_attr_destroy(&attr);
+
+   return ret;
+}
+
+int
+rte_thread_join(rte_thread_t thread_id, unsigned long *value_ptr)
+{
+   int ret = 0;
+   void *res = NULL;
+   void **pres = NULL;
+
+   if (value_ptr != NULL)
+   pres = &res;
+
+   ret = pthread_join((pthread_t)thread_id.opaque_id, pres);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_join failed\n");
+   return ret;
+   }
+
+   if (pres != NULL)
+   *value_ptr = *(unsigned long *)(*pres);
+
+   return 0;
+}
+
+i

[dpdk-dev] [PATCH v11 09/10] eal: add EAL argument for setting thread priority

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Allow the user to choose the thread priority through an EAL
command line argument.

The priority is selected with an EAL parameter when starting an
application. If the EAL parameter is not used, the per-platform
default value for thread priority is used. Otherwise, one of the
following options can be set:
 --thread-prio normal
 --thread-prio realtime

 Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p 
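
For reference, the argument handling amounts to mapping two accepted
strings onto an enum and rejecting everything else. The sketch below
is a standalone approximation with made-up names (demo_prio,
demo_parse_thread_priority()); it is not the code from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for enum rte_thread_priority. */
enum demo_prio {
	DEMO_PRIO_NORMAL,
	DEMO_PRIO_REALTIME_CRITICAL,
};

/*
 * Parse the value given to --thread-prio.
 * Returns 0 and stores the priority on success; returns -1 and
 * leaves *out untouched if the string is not recognized.
 */
static int
demo_parse_thread_priority(const char *arg, enum demo_prio *out)
{
	assert(arg != NULL && out != NULL);

	if (strcmp(arg, "normal") == 0)
		*out = DEMO_PRIO_NORMAL;
	else if (strcmp(arg, "realtime") == 0)
		*out = DEMO_PRIO_REALTIME_CRITICAL;
	else
		return -1;

	return 0;
}
```

An exact comparison is used so that values such as "normalx" or an
empty string are rejected rather than silently accepted.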

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/eal_common_options.c | 28 +++-
 lib/eal/common/eal_internal_cfg.h   |  2 ++
 lib/eal/common/eal_options.h|  2 ++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index ff5861b5f3..9d29696b84 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -107,6 +107,7 @@ eal_long_options[] = {
{OPT_TELEMETRY, 0, NULL, OPT_TELEMETRY_NUM},
{OPT_NO_TELEMETRY,  0, NULL, OPT_NO_TELEMETRY_NUM },
{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
+   {OPT_THREAD_PRIORITY,   1, NULL, OPT_THREAD_PRIORITY_NUM},
 
/* legacy options that will be removed in future */
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -1412,6 +1413,24 @@ eal_parse_simd_bitwidth(const char *arg)
return 0;
 }
 
+static int
+eal_parse_thread_priority(const char *arg)
+{
+   struct internal_config *internal_conf =
+   eal_get_internal_configuration();
+   enum rte_thread_priority priority;
+
+   if (!strncmp("normal", arg, sizeof("normal")))
+   priority = RTE_THREAD_PRIORITY_NORMAL;
+   else if (!strncmp("realtime", arg, sizeof("realtime")))
+   priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
+   else
+   return -1;
+
+   internal_conf->thread_priority = priority;
+   return 0;
+}
+
 static int
 eal_parse_base_virtaddr(const char *arg)
 {
@@ -1825,7 +1844,13 @@ eal_parse_common_option(int opt, const char *optarg,
return -1;
}
break;
-
+   case OPT_THREAD_PRIORITY_NUM:
+   if (eal_parse_thread_priority(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameter for --"
+   OPT_THREAD_PRIORITY "\n");
+   return -1;
+   }
+   break;
/* don't know what to do, leave this to caller */
default:
return 1;
@@ -2088,6 +2113,7 @@ eal_common_usage(void)
   "  (can be used multiple times)\n"
   "  --"OPT_VMWARE_TSC_MAP"Use VMware TSC map instead of 
native RDTSC\n"
   "  --"OPT_PROC_TYPE" Type of this process 
(primary|secondary|auto)\n"
+  "  --"OPT_THREAD_PRIORITY"   Set threads priority (normal|realtime)\n"
 #ifndef RTE_EXEC_ENV_WINDOWS
   "  --"OPT_SYSLOG"Set syslog facility\n"
 #endif
diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
index d6c0470eb8..b2996cd65b 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -94,6 +94,8 @@ struct internal_config {
unsigned int no_telemetry; /**< true to disable Telemetry */
struct simd_bitwidth max_simd_bitwidth;
/**< max simd bitwidth path to use */
+   enum rte_thread_priority thread_priority;
+   /**< thread priority to configure */
 };
 
 void eal_reset_internal_config(struct internal_config *internal_cfg);
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 7b348e707f..9f5b209f64 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -93,6 +93,8 @@ enum {
OPT_NO_TELEMETRY_NUM,
 #define OPT_FORCE_MAX_SIMD_BITWIDTH  "force-max-simd-bitwidth"
OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
+#define OPT_THREAD_PRIORITY  "thread-prio"
+   OPT_THREAD_PRIORITY_NUM,
 
/* legacy option that will be removed in future */
 #define OPT_PCI_BLACKLIST "pci-blacklist"
-- 
2.31.0.vfs.0.1



[dpdk-dev] [PATCH v11 10/10] Add unit tests for thread API

2021-07-30 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

As a new API for threading is introduced,
a set of unit tests has been added to test the new interface.

Signed-off-by: Narcisa Vasile 
---
 app/test/meson.build|   2 +
 app/test/test_threads.c | 419 
 2 files changed, 421 insertions(+)
 create mode 100644 app/test/test_threads.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686ad..6fe8b02459 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -140,6 +140,7 @@ test_sources = files(
 'test_table_tables.c',
 'test_tailq.c',
 'test_thash.c',
+   'test_threads.c',
 'test_timer.c',
 'test_timer_perf.c',
 'test_timer_racecond.c',
@@ -276,6 +277,7 @@ fast_tests = [
 ['reorder_autotest', true],
 ['service_autotest', true],
 ['thash_autotest', true],
+   ['threads_autotest', true],
 ['trace_autotest', true],
 ]
 
diff --git a/app/test/test_threads.c b/app/test/test_threads.c
new file mode 100644
index 00..ce614942eb
--- /dev/null
+++ b/app/test/test_threads.c
@@ -0,0 +1,419 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft.
+ */
+
+#include 
+
+#include 
+
+#include "test.h"
+
+#define THREADS_COUNT 20
+
+#define TEST_THREADS_LOG(func) \
+   printf("Error at line %d. %s failed!\n", __LINE__, func)
+
+static void *
+thread_loop_self(void *arg)
+{
+   rte_thread_t *id = arg;
+
+   *id = rte_thread_self();
+
+   return NULL;
+}
+
+static int
+test_thread_self(void)
+{
+   rte_thread_t threads_ids[THREADS_COUNT];
+   rte_thread_t self_ids[THREADS_COUNT] = {0};
+   size_t i;
+   size_t j;
+   int ret = 0;
+
+   for (i = 0; i < THREADS_COUNT; ++i) {
+   if (rte_thread_create(&threads_ids[i], NULL, thread_loop_self,
+   &self_ids[i]) != 0) {
+   printf("Error, Only %zu threads created.\n", i);
+   break;
+   }
+   }
+
+   for (j = 0; j < i; ++j) {
+   ret = rte_thread_join(threads_ids[j], NULL);
+   if (ret != 0) {
+   TEST_THREADS_LOG("rte_thread_join()");
+   return -1;
+   }
+
+   if (rte_thread_equal(threads_ids[j], self_ids[j]) == 0)
+   ret = -1;
+   }
+
+   return ret;
+}
+
+struct thread_context {
+   rte_thread_barrier *barrier;
+   size_t *thread_count;
+};
+
+static void *
+thread_loop_barrier(void *arg)
+{
+
+   struct thread_context *ctx = arg;
+
+   (void)__atomic_add_fetch(ctx->thread_count, 1, __ATOMIC_RELAXED);
+
+   if (rte_thread_barrier_wait(ctx->barrier) > 0)
+   TEST_THREADS_LOG("rte_thread_barrier_wait()");
+
+   return NULL;
+}
+
+static int
+test_thread_barrier(void)
+{
+   rte_thread_t threads_ids[THREADS_COUNT];
+   struct thread_context ctx[THREADS_COUNT] = {0};
+   rte_thread_barrier barrier;
+   size_t count = 0;
+   size_t i;
+   size_t j;
+   int ret = 0;
+
+   ret = rte_thread_barrier_init(&barrier, THREADS_COUNT + 1);
+   if (ret != 0) {
+   TEST_THREADS_LOG("rte_thread_barrier_init()");
+   return -1;
+   }
+
+   for (i = 0; i < THREADS_COUNT; ++i) {
+   ctx[i].thread_count = &count;
+   ctx[i].barrier = &barrier;
+   if (rte_thread_create(&threads_ids[i], NULL,
+   thread_loop_barrier, &ctx[i]) != 0) {
+   printf("Error, Only %zu threads created.\n", i);
+   ret = -1;
+   goto error;
+   }
+   }
+
+   ret = rte_thread_barrier_wait(ctx->barrier);
+   if (ret > 0) {
+   TEST_THREADS_LOG("rte_thread_barrier_wait()");
+   ret = -1;
+   goto error;
+   }
+
+   if (count != i) {
+   ret = -1;
+   printf("Error, expected thread count(%zu) to be equal "
+   "to the number of threads that wait at the barrier(%zu)\n",
+   count, i);
+   goto error;
+   }
+
+error:
+   for (j = 0; j < i; ++j) {
+   ret = rte_thread_join(threads_ids[j], NULL);
+   if (ret != 0) {
+   TEST_THREADS_LOG("rte_thread_join()");
+   ret = -1;
+   break;
+   }
+   }
+
+   ret = rte_thread_barrier_destroy(&barrier);
+   if (ret != 0) {
+   TEST_THREADS_LOG("rte_thread_barrier_destroy()");
+   ret = -1;
+   }
+
+   return ret;
+}
+
+static size_t val;
+
+static void *
+thread_loop_mutex(void *arg)
+{
+   rte_thread_mutex *mutex = arg;
+
+   rte_thread_mutex_lock(mutex);
+   val++;
+   rte_thread_mutex_unlock(mutex);
+
+   return NULL;
+}
+
+static int
+test_threa

[dpdk-dev] [PATCH] net/bnxt: fix seg fault on Thor

2021-07-30 Thread Ajit Khaparde
In a few cases with Thor device, PMD can segfault when VF
representors are specified. Temporarily fix it by preventing
VF reps for Thor device. This will be addressed in next release.

Fixes: 3fe124d2536c ("net/bnxt: support Thor platform")
Cc: sta...@dpdk.org

Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index fa2148ead7..dbf85e4eda 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -815,6 +815,11 @@ ulp_ctx_init(struct bnxt *bp,
goto error_deinit;
}
 
+   if (devid == BNXT_ULP_DEVICE_ID_THOR) {
+   ulp_data->ulp_flags &= ~BNXT_ULP_VF_REP_ENABLED;
+   BNXT_TF_DBG(ERR, "Enabled non-VFR mode\n");
+   }
+
/*
 * Shared session must be created before first regular session but after
 * the ulp_ctx is valid.
-- 
2.21.1 (Apple Git-122.3)



Re: [dpdk-dev] [PATCH] net/bnxt: fix seg fault on Thor

2021-07-30 Thread Ajit Khaparde
On Fri, Jul 30, 2021 at 2:15 PM Ajit Khaparde
 wrote:
>
> In a few cases with Thor device, PMD can segfault when VF
> representors are specified. Temporarily fix it by preventing
> VF reps for Thor device. This will be addressed in next release.
>
> Fixes: 3fe124d2536c ("net/bnxt: support Thor platform")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Ajit Khaparde 

Updated the commit headline to
"net/bnxt: disable VF representors on Thor"
Merged to dpdk-next-net-brcm.

> ---
>  drivers/net/bnxt/tf_ulp/bnxt_ulp.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
> index fa2148ead7..dbf85e4eda 100644
> --- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
> +++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
> @@ -815,6 +815,11 @@ ulp_ctx_init(struct bnxt *bp,
> goto error_deinit;
> }
>
> +   if (devid == BNXT_ULP_DEVICE_ID_THOR) {
> +   ulp_data->ulp_flags &= ~BNXT_ULP_VF_REP_ENABLED;
> +   BNXT_TF_DBG(ERR, "Enabled non-VFR mode\n");
> +   }
> +
> /*
>  * Shared session must be created before first regular session but after
>  * the ulp_ctx is valid.
> --
> 2.21.1 (Apple Git-122.3)
>