RE: [RFC] ring: improve ring performance with C11 atomics

2023-04-22 Thread Morten Brørup
> From: Wathsala Vithanage [mailto:wathsala.vithan...@arm.com]
> Sent: Friday, 21 April 2023 21.17
> 
> Tail load in __rte_ring_move_cons_head and __rte_ring_move_prod_head
> can be changed to __ATOMIC_RELAXED from __ATOMIC_ACQUIRE.
> Because to calculate the addresses of the dequeue
> elements __rte_ring_dequeue_elems uses the old_head updated by the
> __atomic_compare_exchange_n intrinsic used in
> __rte_ring_move_prod_head. This results in an address dependency
> between the two operations. Therefore __rte_ring_dequeue_elems cannot
> happen before  __rte_ring_move_prod_head.
> Similarly __rte_ring_enqueue_elems and __rte_ring_move_cons_head
> won't be reordered either.

These preconditions should be added as comments in the source code.

> 
> Performance on Arm N1
> Gain relative to generic implementation
> +---+
> | Bulk enq/dequeue count on size 8 (Arm N1) |
> +---+
> | Generic | C11 atomics  | C11 atomics improved |
> +---+
> | Total count: 766730 | Total count: 651686  | Total count: 812125  |
> | |Gain: -15%|Gain: 6%  |
> +---+
> +---+
> | Bulk enq/dequeue count on size 32 (Arm N1)|
> +---+
> | Generic | C11 atomics  | C11 atomics improved |
> +---+
> | Total count: 816745 | Total count: 646385  | Total count: 830935  |
> | |Gain: -21%|Gain: 2%  |
> +---+

Big performance gain compared to pre-improved C11 atomics! Excellent.

> 
> Performance on x86-64 Cascade Lake
> Gain relative to generic implementation
> +---+
> | Bulk enq/dequeue count on size 8  |
> +---+
> | Generic | C11 atomics  | C11 atomics improved |
> +---+
> | Total count: 181640 | Total count: 181995  | Total count: 182791  |
> | |Gain: 0.2%|Gain: 0.6%
> +---+
> +---+
> | Bulk enq/dequeue count on size 32 |
> +---+
> | Generic | C11 atomics  | C11 atomics improved |
> +---+
> | Total count: 167495 | Total count: 161536  | Total count: 163190  |
> | |Gain: -3.5%   |Gain: -2.6%   |
> +---+

I noticed that the larger size (32 objects) had a larger relative drop in 
performance than the smaller size (8 objects), so I am wondering what the 
performance numbers are for size 512, the default RTE_MEMPOOL_CACHE_MAX_SIZE? 
It's probably not going to change anything regarding the patch acceptance, but 
I'm curious about the numbers.

> 
> Signed-off-by: Wathsala Vithanage 
> Reviewed-by: Honnappa Nagarahalli 
> Reviewed-by: Feifei Wang 
> ---
>  .mailmap|  1 +
>  lib/ring/rte_ring_c11_pvt.h | 18 +-
>  2 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/.mailmap b/.mailmap
> index 4018f0fc47..367115d134 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1430,6 +1430,7 @@ Walter Heymans 
>  Wang Sheng-Hui 
>  Wangyu (Eric) 
>  Waterman Cao 
> +Wathsala Vithanage 
>  Weichun Chen 
>  Wei Dai 
>  Weifeng Li 
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index f895950df4..1895f2bb0e 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -24,6 +24,13 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht,
> uint32_t old_val,
>   if (!single)
>   rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> 
> + /*
> +  * Updating of ht->tail cannot happen before elements are added to or
> +  * removed from the ring, as it could result in data races between
> +  * producer and consumer threads. Therefore ht->tail should be updated
> +  * with release semantics to prevent ring data copy phase from sinking
> +  * below it.
> +  */

I think this comment should clarified as:

Updating of ht->tail SHOULD NOT happen before elements are added to or
remove

[RFC PATCH 0/1] Introduce Event ML Adapter

2023-04-22 Thread Srikanth Yalavarthi
Machine learning event adapter library
==

DPDK Eventdev library provides event driven programming model with features to 
schedule
events. ML Device library provides an interface to ML poll mode drivers that 
support
Machine Learning inference operations. Event ML Adapter is intended to bridge 
between
the event device and the ML device.

Packet flow from ML device to the event device can be accomplished using 
software and hardware
based transfer mechanisms. The adapter queries an eventdev PMD to determine 
which mechanism to
be used. The adapter uses an EAL service core function for software based 
packet transfer and
uses the eventdev PMD functions to configure hardware based packet transfer 
between ML device
and the event device.

The application can choose to submit a ML operation directly to an ML device or 
send it to the ML
adapter via eventdev based on RTE_EVENT_ML_ADAPTER_CAP_INTERNAL_PORT_OP_FWD 
capability. The first
mode is known as the event new (RTE_EVENT_ML_ADAPTER_OP_NEW) mode and the 
second as the event
forward (RTE_EVENT_ML_ADAPTER_OP_FORWARD) mode. The choice of mode can be 
specified while
creating the adapter. In the former mode, it is an application responsibility 
to enable ingress
packet ordering. In the latter mode, it is the adapter responsibility to enable 
the ingress
packet ordering.


Working model of RTE_EVENT_ML_ADAPTER_OP_NEW mode:

   +--+ +--+
   |  | |   ML stage   |
   | Application  |---[2]-->| + enqueue to |
   |  | | mldev|
   +--+ +--+
   ^   ^   |
   |   |  [3]
  [6] [1]  |
   |   |   |
   +--+|
   |  ||
   | Event device ||
   |  ||
   +--+|
  ^|
  ||
 [5]   |
  |v
   +--+ +--+
   |  | |  |
   |  ML adapter  |<--[4]---|mldev |
   |  | |  |
   +--+ +--+


[1] Application dequeues events from the previous stage.
[2] Application prepares the ML operations.
[3] ML operations are submitted to mldev by application.
[4] ML adapter dequeues ML completions from mldev.
[5] ML adapter enqueues events to the eventdev.
[6] Application dequeues from eventdev for further processing.

In the RTE_EVENT_ML_ADAPTER_OP_NEW mode, application submits ML operations 
directly to ML device.
The ML adapter then dequeues ML completions from ML device and enqueue events 
to the event
device. This mode does not ensure ingress ordering, if the application directly 
enqueues to mldev
without going through ML / atomic stage i.e. removing item [1] and [2].

Events dequeued from the adapter will be treated as new events. In this mode, 
application needs
to specify event information (response information) which is needed to enqueue 
an event after the
ML operation is completed.


Working model of RTE_EVENT_ML_ADAPTER_OP_FORWARD mode:

   +--+ +--+
   --[1]-->|  |---[2]-->|  Application |
   | Event device | |  in  |
   <--[8]--|  |<--[3]---| Ordered stage|
   +--+ +--+
   ^  |
   | [4]
  [7] |
   |  v
  ++   +--+
  ||--[5]->|  |
  |   ML adapter   |   | mldev|
  ||<-[6]--|  |
  ++   +--+


[1] Events from the previous stage.
[2] Application in ordered stage dequeues events from eventdev.
[3] Application enqueues ML operations as events to eventdev.
[4] ML adapter dequeues event from eventdev.
[5] ML adapter submits ML operations to mldev (Atomic stage).
[6] ML adapter dequeues ML completions from mldev
[7] ML adapter enqueues events to the eventdev
[8] Events to the next stage

In the event forward (RTE_EVENT_ML_ADAPTER_OP_FORWARD) mode, if the HW supports 
the capability
RTE_EVENT_ML_ADAPTER_CAP_INTERNAL_PORT_OP_FWD, application can directly submit 
the ML operations
to the mldev. 

[RFC PATCH 1/1] eventdev: introduce ML event adapter library

2023-04-22 Thread Srikanth Yalavarthi
Introduce event ML adapter APIs. This patch provides information
on adapter modes and usage. Application can use this event adapter
interface to transfer packets between ML device and event device.

Signed-off-by: Srikanth Yalavarthi 
---
 MAINTAINERS   |6 +
 config/rte_config.h   |1 +
 doc/api/doxy-api-index.md |1 +
 doc/guides/prog_guide/event_ml_adapter.rst|  268 
 doc/guides/prog_guide/eventdev.rst|8 +-
 .../img/event_ml_adapter_op_forward.svg   | 1086 +
 .../img/event_ml_adapter_op_new.svg   | 1079 
 doc/guides/prog_guide/index.rst   |1 +
 lib/eventdev/meson.build  |4 +-
 lib/eventdev/rte_event_ml_adapter.c   |6 +
 lib/eventdev/rte_event_ml_adapter.h   |  594 +
 lib/eventdev/rte_eventdev.h   |   45 +
 lib/meson.build   |2 +-
 lib/mldev/rte_mldev.h |6 +
 14 files changed, 3102 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/prog_guide/event_ml_adapter.rst
 create mode 100644 doc/guides/prog_guide/img/event_ml_adapter_op_forward.svg
 create mode 100644 doc/guides/prog_guide/img/event_ml_adapter_op_new.svg
 create mode 100644 lib/eventdev/rte_event_ml_adapter.c
 create mode 100644 lib/eventdev/rte_event_ml_adapter.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e5099..2b47d26561 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -544,6 +544,12 @@ F: drivers/raw/skeleton/
 F: app/test/test_rawdev.c
 F: doc/guides/prog_guide/rawdev.rst
 
+Eventdev ML Adapter API
+M: Srikanth Yalavarthi 
+T: git://dpdk.org/next/dpdk-next-eventdev
+F: lib/eventdev/*ml_adapter*
+F: doc/guides/prog_guide/event_ml_adapter.rst
+
 
 Memory Pool Drivers
 ---
diff --git a/config/rte_config.h b/config/rte_config.h
index 7b8c85e948..0c3911658f 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -78,6 +78,7 @@
 #define RTE_EVENT_ETH_INTR_RING_SIZE 1024
 #define RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE 32
 #define RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE 32
+#define RTE_EVENT_ML_ADAPTER_MAX_INSTANCE 32
 
 /* rawdev defines */
 #define RTE_RAWDEV_MAX_DEVS 64
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index c709fd48ad..e34a945b30 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -29,6 +29,7 @@ The public API headers are grouped by topics:
   [event_eth_tx_adapter](@ref rte_event_eth_tx_adapter.h),
   [event_timer_adapter](@ref rte_event_timer_adapter.h),
   [event_crypto_adapter](@ref rte_event_crypto_adapter.h),
+  [event_ml_adapter](@ref rte_event_ml_adapter.h),
   [rawdev](@ref rte_rawdev.h),
   [metrics](@ref rte_metrics.h),
   [bitrate](@ref rte_bitrate.h),
diff --git a/doc/guides/prog_guide/event_ml_adapter.rst 
b/doc/guides/prog_guide/event_ml_adapter.rst
new file mode 100644
index 00..d0b8f9c1b6
--- /dev/null
+++ b/doc/guides/prog_guide/event_ml_adapter.rst
@@ -0,0 +1,268 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright (c) 2023 Marvell.
+
+Event ML Adapter Library
+
+
+DPDK :doc:`Eventdev library ` provides event driven programming 
model with features
+to schedule events. :doc:`ML Device library ` provides an interface to 
ML poll mode
+drivers that support Machine Learning inference operations. Event ML Adapter 
is intended to
+bridge between the event device and the ML device.
+
+Packet flow from ML device to the event device can be accomplished using 
software and hardware
+based transfer mechanisms. The adapter queries an eventdev PMD to determine 
which mechanism to
+be used. The adapter uses an EAL service core function for software based 
packet transfer and
+uses the eventdev PMD functions to configure hardware based packet transfer 
between ML device
+and the event device. ML adapter uses a new event type called 
``RTE_EVENT_TYPE_MLDEV`` to
+indicate the source of event.
+
+Application can choose to submit an ML operation directly to an ML device or 
send it to an ML
+adapter via eventdev based on RTE_EVENT_ML_ADAPTER_CAP_INTERNAL_PORT_OP_FWD 
capability. The
+first mode is known as the event new (RTE_EVENT_ML_ADAPTER_OP_NEW) mode and 
the second as the
+event forward (RTE_EVENT_ML_ADAPTER_OP_FORWARD) mode. Choice of mode can be 
specified while
+creating the adapter. In the former mode, it is the application's 
responsibility to enable
+ingress packet ordering. In the latter mode, it is the adapter's 
responsibility to enable
+ingress packet ordering.
+
+
+Adapter Modes
+-
+
+RTE_EVENT_ML_ADAPTER_OP_NEW mode
+
+
+In the RTE_EVENT_ML_ADAPTER_OP_NEW mode, application submits ML operations 
directly to an ML
+device. The adapter then dequeues ML completions from the ML device and 
enqueues them as events
+to the event device. This mode does not ensure ingress o

Re: [EXT] [PATCH] crypto/uadk: set queue pair in dev_configure

2023-04-22 Thread Zhangfei Gao
Hi, Akhil

On Thu, 20 Apr 2023 at 15:20, Akhil Goyal  wrote:
>
> > By default, uadk only alloc two queues for each algorithm, which
> > will impact performance.
> > Set queue pair number as required in dev_configure.
> > The default max queue pair number is 8, which can be modified
> > via para: max_nb_queue_pairs
> >
> Please add documentation for the newly added devarg in uadk.rst.

Will add

+
+Initialization
+--
+
+To use the PMD in an application, user must:
+
+* Call rte_vdev_init("crypto_uadk") within the application.
+
+* Use --vdev="crypto_uadk" in the EAL options, which will call
rte_vdev_init() internally.
+
+The following parameters (all optional) can be provided in the
previous two calls:
+
+* max_nb_queue_pairs: Specify the maximum number of queue pairs in
the device (8 by default).
+The max value of max_nb_queue_pairs can be queried from the device
property available_instances.
+Property available_instances value may differ from the devices and platforms.
+Allocating queue pairs bigger than available_instances will fail.
+
+Example:
+
+.. code-block:: console
+
+ cat /sys/class/uacce/hisi_sec2-2/available_instances
+ 256
+
+ sudo dpdk-test-crypto-perf -l 0-10 --vdev crypto_uadk,max_nb_queue_pairs=10 \
+ -- --devtype crypto_uadk --optype cipher-only --buffer-sz 8192

>
> > Example:
> > sudo dpdk-test-crypto-perf -l 0-10 --vdev crypto_uadk,max_nb_queue_pairs=10
> >   -- --devtype crypto_uadk --optype cipher-only --buffer-sz 8192
> >
> > lcore idBuf Size  Burst Size  Gbps  Cycles/Buf
> >
> >38192  327.5226  871.19
> >78192  327.5225  871.20
> >18192  327.5225  871.20
> >48192  327.5224  871.21
> >58192  327.5224  871.21
> >   108192  327.5223  871.22
> >98192  327.5223  871.23
> >28192  327.5222  871.23
> >88192  327.5222  871.23
> >68192  327.5218  871.28
> >
>
> No need to mention the above test result in patch description.
ok, thanks

>
> > Signed-off-by: Zhangfei Gao 
> > ---
> >  drivers/crypto/uadk/uadk_crypto_pmd.c | 19 +--
> >  drivers/crypto/uadk/uadk_crypto_pmd_private.h |  1 +
> >  2 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/crypto/uadk/uadk_crypto_pmd.c
> > b/drivers/crypto/uadk/uadk_crypto_pmd.c
> > index 4f729e0f07..34aae99342 100644
> > --- a/drivers/crypto/uadk/uadk_crypto_pmd.c
> > +++ b/drivers/crypto/uadk/uadk_crypto_pmd.c
> > @@ -357,8 +357,15 @@ static const struct rte_cryptodev_capabilities
> > uadk_crypto_v2_capabilities[] = {
> >  /* Configure device */
> >  static int
> >  uadk_crypto_pmd_config(struct rte_cryptodev *dev __rte_unused,
> > -struct rte_cryptodev_config *config __rte_unused)
> > +struct rte_cryptodev_config *config)
> >  {
> > + char env[128];
> > +
> > + /* set queue pairs num via env */
> > + sprintf(env, "sync:%d@0", config->nb_queue_pairs);
> > + setenv("WD_CIPHER_CTX_NUM", env, 1);
> > + setenv("WD_DIGEST_CTX_NUM", env, 1);
> > +
>
> Who is the intended user of this environment variable?
wd_cipher_env_init and wd_digest_env_init in set_session_xxx_parameters
will fetch the env and allocate queue pairs accordingly.


>
> >   return 0;
> >  }
> >
> > @@ -434,7 +441,7 @@ uadk_crypto_pmd_info_get(struct rte_cryptodev *dev,
> >   if (dev_info != NULL) {
> >   dev_info->driver_id = dev->driver_id;
> >   dev_info->driver_name = dev->device->driver->name;
> > - dev_info->max_nb_queue_pairs = 128;
> > + dev_info->max_nb_queue_pairs = priv->max_nb_qpairs;
> >   /* No limit of number of sessions */
> >   dev_info->sym.max_nb_sessions = 0;
> >   dev_info->feature_flags = dev->feature_flags;
> > @@ -1015,6 +1022,7 @@ uadk_cryptodev_probe(struct rte_vdev_device *vdev)
> >   struct uadk_crypto_priv *priv;
> >   struct rte_cryptodev *dev;
> >   struct uacce_dev *udev;
> > + const char *input_args;
> >   const char *name;
> >
> >   udev = wd_get_accel_dev("cipher");
> > @@ -1030,6 +1038,9 @@ uadk_cryptodev_probe(struct rte_vdev_device *vdev)
> >   if (name == NULL)
> >   return -EINVAL;
> >
> > + input_args = rte_vdev_device_args(vdev);
> > + rte_cryptodev_pmd_parse_input_args(&init_params, input_args);
> > +
> >   dev = rte_cryptodev_pmd_create(name, &vdev->device, &init_params);
> >   if (dev == NULL) {
> >   UADK_LOG(ERR, "driver %s: create failed", init_params.name);
> > @@ -1044,6 +1055,7 @@ uadk_cryptodev_probe(struct rte_vdev_device *vdev)
> > 

[PATCH v2] app/mldev: add internal function for file read

2023-04-22 Thread Srikanth Yalavarthi
Added internal function to read model, input and reference
files with required error checks. This change fixes the
unchecked return value and improper use of negative value
issues reported by coverity scan for file read operations.

Coverity issue: 383742, 383743
Fixes: f6661e6d9a3a ("app/mldev: validate model operations")
Fixes: da6793390596 ("app/mldev: support inference validation")

Signed-off-by: Srikanth Yalavarthi 
---
 app/test-mldev/test_common.c   | 59 ++
 app/test-mldev/test_common.h   |  2 +
 app/test-mldev/test_inference_common.c | 54 +--
 app/test-mldev/test_model_common.c | 33 +++---
 4 files changed, 87 insertions(+), 61 deletions(-)

diff --git a/app/test-mldev/test_common.c b/app/test-mldev/test_common.c
index 016b31c6ba..be67ea487c 100644
--- a/app/test-mldev/test_common.c
+++ b/app/test-mldev/test_common.c
@@ -5,12 +5,71 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
 #include "ml_common.h"
 #include "test_common.h"
 
+int
+ml_read_file(char *file, size_t *size, char **buffer)
+{
+   char *file_buffer = NULL;
+   long file_size = 0;
+   int ret = 0;
+   FILE *fp;
+
+   fp = fopen(file, "r");
+   if (fp == NULL) {
+   ml_err("Failed to open file: %s\n", file);
+   return -EIO;
+   }
+
+   if (fseek(fp, 0, SEEK_END) == 0) {
+   file_size = ftell(fp);
+   if (file_size == -1) {
+   ret = -EIO;
+   goto error;
+   }
+
+   file_buffer = malloc(file_size);
+   if (file_buffer == NULL) {
+   ml_err("Failed to allocate memory: %s\n", file);
+   ret = -ENOMEM;
+   goto error;
+   }
+
+   if (fseek(fp, 0, SEEK_SET) != 0) {
+   ret = -EIO;
+   goto error;
+   }
+
+   if (fread(file_buffer, sizeof(char), file_size, fp) != 
(unsigned long)file_size) {
+   ml_err("Failed to read file : %s\n", file);
+   ret = -EIO;
+   goto error;
+   }
+   fclose(fp);
+   } else {
+   ret = -EIO;
+   goto error;
+   }
+
+   *buffer = file_buffer;
+   *size = file_size;
+
+   return 0;
+
+error:
+   rte_free(file_buffer);
+
+   if (fp != NULL)
+   fclose(fp);
+
+   return ret;
+}
+
 bool
 ml_test_cap_check(struct ml_options *opt)
 {
diff --git a/app/test-mldev/test_common.h b/app/test-mldev/test_common.h
index a7b2ea652a..7e3634b0c6 100644
--- a/app/test-mldev/test_common.h
+++ b/app/test-mldev/test_common.h
@@ -24,4 +24,6 @@ int ml_test_device_close(struct ml_test *test, struct 
ml_options *opt);
 int ml_test_device_start(struct ml_test *test, struct ml_options *opt);
 int ml_test_device_stop(struct ml_test *test, struct ml_options *opt);
 
+int ml_read_file(char *file, size_t *size, char **buffer);
+
 #endif /* TEST_COMMON_H */
diff --git a/app/test-mldev/test_inference_common.c 
b/app/test-mldev/test_inference_common.c
index 29c18bbc85..96213413a2 100644
--- a/app/test-mldev/test_inference_common.c
+++ b/app/test-mldev/test_inference_common.c
@@ -610,10 +610,10 @@ ml_inference_iomem_setup(struct ml_test *test, struct 
ml_options *opt, uint16_t
char mp_name[RTE_MEMPOOL_NAMESIZE];
const struct rte_memzone *mz;
uint64_t nb_buffers;
+   char *buffer = NULL;
uint32_t buff_size;
uint32_t mz_size;
-   uint32_t fsize;
-   FILE *fp;
+   size_t fsize;
int ret;
 
/* get input buffer size */
@@ -653,51 +653,35 @@ ml_inference_iomem_setup(struct ml_test *test, struct 
ml_options *opt, uint16_t
t->model[fid].reference = NULL;
 
/* load input file */
-   fp = fopen(opt->filelist[fid].input, "r");
-   if (fp == NULL) {
-   ml_err("Failed to open input file : %s\n", 
opt->filelist[fid].input);
-   ret = -errno;
+   ret = ml_read_file(opt->filelist[fid].input, &fsize, &buffer);
+   if (ret != 0)
goto error;
-   }
 
-   fseek(fp, 0, SEEK_END);
-   fsize = ftell(fp);
-   fseek(fp, 0, SEEK_SET);
-   if (fsize != t->model[fid].inp_dsize) {
-   ml_err("Invalid input file, size = %u (expected size = %" 
PRIu64 ")\n", fsize,
+   if (fsize == t->model[fid].inp_dsize) {
+   rte_memcpy(t->model[fid].input, buffer, fsize);
+   rte_free(buffer);
+   } else {
+   ml_err("Invalid input file, size = %zu (expected size = %" 
PRIu64 ")\n", fsize,
   t->model[fid].inp_dsize);
ret = -EINVAL;
-   fclose(fp);
-   goto error;
-   }
-
-   if (fread(t->model[fid].input, 1, t->model[fid].inp_dsize, fp) != 
t->model[fid].inp_dsize) {

[PATCH v2] app/mldev: fix code formatting and typos

2023-04-22 Thread Srikanth Yalavarthi
Updated ML application source files to have uniform code formatting
style across. Remove extra blank lines. Fix typos in application help.

Fixes: 8cb22a545447 ("app/mldev: fix debug build")
Fixes: da6793390596 ("app/mldev: support inference validation")
Fixes: c0e871657d6a ("app/mldev: support queue pairs and size")

Signed-off-by: Srikanth Yalavarthi 
---
 app/test-mldev/ml_options.c|  4 +--
 app/test-mldev/test_inference_common.c | 36 +-
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c
index 2efcc3532c..e2f3c4dec8 100644
--- a/app/test-mldev/ml_options.c
+++ b/app/test-mldev/ml_options.c
@@ -200,7 +200,7 @@ ml_dump_test_options(const char *testname)
 {
if (strcmp(testname, "device_ops") == 0) {
printf("\t\t--queue_pairs  : number of queue pairs to 
create\n"
-  "\t\t--queue_size   : size fo queue-pair\n");
+  "\t\t--queue_size   : size of queue-pair\n");
printf("\n");
}
 
@@ -215,7 +215,7 @@ ml_dump_test_options(const char *testname)
   "\t\t--repetitions  : number of inference 
repetitions\n"
   "\t\t--burst_size   : inference burst size\n"
   "\t\t--queue_pairs  : number of queue pairs to 
create\n"
-  "\t\t--queue_size   : size fo queue-pair\n"
+  "\t\t--queue_size   : size of queue-pair\n"
   "\t\t--batches  : number of batches of input\n"
   "\t\t--tolerance: maximum tolerance (%%) for 
output validation\n"
   "\t\t--stats: enable reporting performance 
statistics\n");
diff --git a/app/test-mldev/test_inference_common.c 
b/app/test-mldev/test_inference_common.c
index bf7e6bbe10..29c18bbc85 100644
--- a/app/test-mldev/test_inference_common.c
+++ b/app/test-mldev/test_inference_common.c
@@ -20,23 +20,23 @@
 
 #define ML_TEST_READ_TYPE(buffer, type) (*((type *)buffer))
 
-#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) 
\
+#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) \
(((float)output - (float)reference) <= (((float)reference * tolerance) 
/ 100.0))
 
-#define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) 
\
-   do {
   \
-   FILE *fp = fopen(name, "w+");   
   \
-   if (fp == NULL) {   
   \
-   ml_err("Unable to create file: %s, error: %s", name, 
strerror(errno)); \
-   err = true; 
   \
-   } else {
   \
-   if (fwrite(buffer, 1, size, fp) != size) {  
   \
-   ml_err("Error writing output, file: %s, error: 
%s", name,  \
-  strerror(errno));
   \
-   err = true; 
   \
-   }   
   \
-   fclose(fp); 
   \
-   }   
   \
+#define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) \
+   do { \
+   FILE *fp = fopen(name, "w+"); \
+   if (fp == NULL) { \
+   ml_err("Unable to create file: %s, error: %s", name, 
strerror(errno)); \
+   err = true; \
+   } else { \
+   if (fwrite(buffer, 1, size, fp) != size) { \
+   ml_err("Error writing output, file: %s, error: 
%s", name, \
+  strerror(errno)); \
+   err = true; \
+   } \
+   fclose(fp); \
+   } \
} while (0)
 
 static void
@@ -951,7 +951,7 @@ ml_request_finish(struct rte_mempool *mp, void *opaque, 
void *obj, unsigned int
if (t->cmn.opt->debug) {
/* dump quantized output buffer */
if (asprintf(&dump_path, "%s.q.%u", 
t->cmn.opt->filelist[req->fid].output,
-   obj_idx) == -1)
+obj_idx) == -1)
return;
ML_OPEN_WRITE_GET_ERR(dump_path, req->output, model->out_qsize,

[PATCH v1] ml/cnxk: fix xstat type names in documentation

2023-04-22 Thread Srikanth Yalavarthi
Fix incorrect type names for xstats in ML cnxk driver documentation.

Fixes: 4ff4ab8e1a20 ("ml/cnxk: support extended statistics")

Signed-off-by: Srikanth Yalavarthi 
---
 doc/guides/mldevs/cnxk.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 91e5df095a..1aa9225765 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -231,11 +231,11 @@ Total number of extended stats would be equal to 6 x 
number of models loaded.
+---+-+--+
| 3 | Max-HW-Latency  | Maximum hardware latency |
+---+-+--+
-   | 4 | Avg-HW-Latency  | Average firmware latency |
+   | 4 | Avg-FW-Latency  | Average firmware latency |
+---+-+--+
-   | 5 | Avg-HW-Latency  | Minimum firmware latency |
+   | 5 | Min-FW-Latency  | Minimum firmware latency |
+---+-+--+
-   | 6 | Avg-HW-Latency  | Maximum firmware latency |
+   | 6 | Max-FW-Latency  | Maximum firmware latency |
+---+-+--+
 
 Latency values reported by the PMD through xstats can have units,
-- 
2.17.1



[PATCH v1 0/3] Add support for 32 I/O per model

2023-04-22 Thread Srikanth Yalavarthi
This patch series adds support for 32 inputs / outputs per each
model. Changes required to enable the required support include:

1. Splitiing model metadata fields into structures.
2. Update model metadata to v2301 which supports 32 I/O.
3. Update ML driver code to support metadata v2301 .


Srikanth Yalavarthi (3):
  ml/cnxk: split metadata fields into sections
  ml/cnxk: update model metadata to v2301
  ml/cnxk: add support for 32 I/O per model

 drivers/ml/cnxk/cn10k_ml_model.c | 401 +---
 drivers/ml/cnxk/cn10k_ml_model.h | 512 +--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 133 ++--
 3 files changed, 659 insertions(+), 387 deletions(-)

--
2.17.1



[PATCH v1 1/3] ml/cnxk: split metadata fields into sections

2023-04-22 Thread Srikanth Yalavarthi
Split metadata into header, model sections, weights & bias,
input / output and data sections. This is a preparatory step
to introduce v2301 of model metadata.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_model.c |  26 +-
 drivers/ml/cnxk/cn10k_ml_model.h | 487 ---
 2 files changed, 270 insertions(+), 243 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 2ded05c5dc..c0b7b061f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -47,42 +47,42 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t 
size)
metadata = (struct cn10k_ml_model_metadata *)buffer;
 
/* Header CRC check */
-   if (metadata->metadata_header.header_crc32c != 0) {
-   header_crc32c = rte_hash_crc(
-   buffer, sizeof(metadata->metadata_header) - 
sizeof(uint32_t), 0);
+   if (metadata->header.header_crc32c != 0) {
+   header_crc32c =
+   rte_hash_crc(buffer, sizeof(metadata->header) - 
sizeof(uint32_t), 0);
 
-   if (header_crc32c != metadata->metadata_header.header_crc32c) {
+   if (header_crc32c != metadata->header.header_crc32c) {
plt_err("Invalid model, Header CRC mismatch");
return -EINVAL;
}
}
 
/* Payload CRC check */
-   if (metadata->metadata_header.payload_crc32c != 0) {
-   payload_crc32c = rte_hash_crc(buffer + 
sizeof(metadata->metadata_header),
- size - 
sizeof(metadata->metadata_header), 0);
+   if (metadata->header.payload_crc32c != 0) {
+   payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->header),
+ size - sizeof(metadata->header), 
0);
 
-   if (payload_crc32c != metadata->metadata_header.payload_crc32c) 
{
+   if (payload_crc32c != metadata->header.payload_crc32c) {
plt_err("Invalid model, Payload CRC mismatch");
return -EINVAL;
}
}
 
/* Model magic string */
-   if (strncmp((char *)metadata->metadata_header.magic, 
MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
-   plt_err("Invalid model, magic = %s", 
metadata->metadata_header.magic);
+   if (strncmp((char *)metadata->header.magic, MRVL_ML_MODEL_MAGIC_STRING, 
4) != 0) {
+   plt_err("Invalid model, magic = %s", metadata->header.magic);
return -EINVAL;
}
 
/* Target architecture */
-   if (metadata->metadata_header.target_architecture != 
MRVL_ML_MODEL_TARGET_ARCH) {
+   if (metadata->header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
plt_err("Model target architecture (%u) not supported",
-   metadata->metadata_header.target_architecture);
+   metadata->header.target_architecture);
return -ENOTSUP;
}
 
/* Header version */
-   rte_memcpy(version, metadata->metadata_header.version, 4 * 
sizeof(uint8_t));
+   rte_memcpy(version, metadata->header.version, 4 * sizeof(uint8_t));
if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not 
supported", version[0],
version[1], version[2], version[3], 
(MRVL_ML_MODEL_VERSION / 1000) % 10,
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 1bc748265d..b30ad5a981 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -30,298 +30,325 @@ enum cn10k_ml_model_state {
 #define MRVL_ML_OUTPUT_NAME_LEN   16
 #define MRVL_ML_INPUT_OUTPUT_SIZE  8
 
-/* Model file metadata structure */
-struct cn10k_ml_model_metadata {
-   /* Header (256-byte) */
-   struct {
-   /* Magic string ('M', 'R', 'V', 'L') */
-   uint8_t magic[4];
+/* Header (256-byte) */
+struct cn10k_ml_model_metadata_header {
+   /* Magic string ('M', 'R', 'V', 'L') */
+   uint8_t magic[4];
 
-   /* Metadata version */
-   uint8_t version[4];
+   /* Metadata version */
+   uint8_t version[4];
 
-   /* Metadata size */
-   uint32_t metadata_size;
+   /* Metadata size */
+   uint32_t metadata_size;
 
-   /* Unique ID */
-   uint8_t uuid[128];
+   /* Unique ID */
+   uint8_t uuid[128];
 
-   /* Model target architecture
-* 0 = Undefined
-* 1 = M1K
-* 128 = MLIP
-* 256 = Experimental
-*/
-   uint32_t target_architecture;
-   uint8_t reserved[104];
+   /* Model target architecture
+* 0 = Undefined
+* 1 = M1

[PATCH v1 2/3] ml/cnxk: update model metadata to v2301

2023-04-22 Thread Srikanth Yalavarthi
Update model metadata to v2301. Revised metadata introduces
fields to support up to 32 inputs/outputs per model, scratch
relocation and updates to names of existing fields. Update
driver files to include changes in names of metadata fields.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_model.c | 111 ---
 drivers/ml/cnxk/cn10k_ml_model.h |  36 +++---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  50 +++---
 3 files changed, 106 insertions(+), 91 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index c0b7b061f5..a15df700aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -83,11 +83,11 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t 
size)
 
/* Header version */
rte_memcpy(version, metadata->header.version, 4 * sizeof(uint8_t));
-   if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+   if (version[0] * 1000 + version[1] * 100 != MRVL_ML_MODEL_VERSION_MIN) {
plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not 
supported", version[0],
-   version[1], version[2], version[3], 
(MRVL_ML_MODEL_VERSION / 1000) % 10,
-   (MRVL_ML_MODEL_VERSION / 100) % 10, 
(MRVL_ML_MODEL_VERSION / 10) % 10,
-   MRVL_ML_MODEL_VERSION % 10);
+   version[1], version[2], version[3], 
(MRVL_ML_MODEL_VERSION_MIN / 1000) % 10,
+   (MRVL_ML_MODEL_VERSION_MIN / 100) % 10,
+   (MRVL_ML_MODEL_VERSION_MIN / 10) % 10, 
MRVL_ML_MODEL_VERSION_MIN % 10);
return -ENOTSUP;
}
 
@@ -125,36 +125,36 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t 
size)
}
 
/* Check input count */
-   if (metadata->model.num_input > MRVL_ML_INPUT_OUTPUT_SIZE) {
+   if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT_1) {
plt_err("Invalid metadata, num_input  = %u (> %u)", 
metadata->model.num_input,
-   MRVL_ML_INPUT_OUTPUT_SIZE);
+   MRVL_ML_NUM_INPUT_OUTPUT_1);
return -EINVAL;
}
 
/* Check output count */
-   if (metadata->model.num_output > MRVL_ML_INPUT_OUTPUT_SIZE) {
+   if (metadata->model.num_output > MRVL_ML_NUM_INPUT_OUTPUT_1) {
plt_err("Invalid metadata, num_output  = %u (> %u)", 
metadata->model.num_output,
-   MRVL_ML_INPUT_OUTPUT_SIZE);
+   MRVL_ML_NUM_INPUT_OUTPUT_1);
return -EINVAL;
}
 
/* Inputs */
for (i = 0; i < metadata->model.num_input; i++) {
-   if 
(rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <=
+   if 
(rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input1[i].input_type)) 
<=
0) {
plt_err("Invalid metadata, input[%u] : input_type = 
%u", i,
-   metadata->input[i].input_type);
+   metadata->input1[i].input_type);
return -EINVAL;
}
 
if (rte_ml_io_type_size_get(
-   
cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+   
cn10k_ml_io_type_map(metadata->input1[i].model_input_type)) <= 0) {
plt_err("Invalid metadata, input[%u] : model_input_type 
= %u", i,
-   metadata->input[i].model_input_type);
+   metadata->input1[i].model_input_type);
return -EINVAL;
}
 
-   if (metadata->input[i].relocatable != 1) {
+   if (metadata->input1[i].relocatable != 1) {
plt_err("Model not supported, non-relocatable input: 
%u", i);
return -ENOTSUP;
}
@@ -163,20 +163,20 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t 
size)
/* Outputs */
for (i = 0; i < metadata->model.num_output; i++) {
if (rte_ml_io_type_size_get(
-   
cn10k_ml_io_type_map(metadata->output[i].output_type)) <= 0) {
+   
cn10k_ml_io_type_map(metadata->output1[i].output_type)) <= 0) {
plt_err("Invalid metadata, output[%u] : output_type = 
%u", i,
-   metadata->output[i].output_type);
+   metadata->output1[i].output_type);
return -EINVAL;
}
 
if (rte_ml_io_type_size_get(
-   
cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+   
cn10k_ml_io_type_map(metadata->output1[i].model_output_type)) <= 0) {
plt_err("Invalid metadata, output[

[PATCH v1 3/3] ml/cnxk: add support for 32 I/O per model

2023-04-22 Thread Srikanth Yalavarthi
Added support for 32 inputs and outputs per model.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_model.c | 374 ++-
 drivers/ml/cnxk/cn10k_ml_model.h |   5 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 125 ---
 3 files changed, 367 insertions(+), 137 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index a15df700aa..92c47d39ba 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -41,8 +41,9 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
struct cn10k_ml_model_metadata *metadata;
uint32_t payload_crc32c;
uint32_t header_crc32c;
-   uint8_t version[4];
+   uint32_t version;
uint8_t i;
+   uint8_t j;
 
metadata = (struct cn10k_ml_model_metadata *)buffer;
 
@@ -82,10 +83,13 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t 
size)
}
 
/* Header version */
-   rte_memcpy(version, metadata->header.version, 4 * sizeof(uint8_t));
-   if (version[0] * 1000 + version[1] * 100 != MRVL_ML_MODEL_VERSION_MIN) {
-   plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not 
supported", version[0],
-   version[1], version[2], version[3], 
(MRVL_ML_MODEL_VERSION_MIN / 1000) % 10,
+   version = metadata->header.version[0] * 1000 + 
metadata->header.version[1] * 100 +
+ metadata->header.version[2] * 10 + 
metadata->header.version[3];
+   if (version < MRVL_ML_MODEL_VERSION_MIN) {
+   plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not 
supported",
+   metadata->header.version[0], 
metadata->header.version[1],
+   metadata->header.version[2], 
metadata->header.version[3],
+   (MRVL_ML_MODEL_VERSION_MIN / 1000) % 10,
(MRVL_ML_MODEL_VERSION_MIN / 100) % 10,
(MRVL_ML_MODEL_VERSION_MIN / 10) % 10, 
MRVL_ML_MODEL_VERSION_MIN % 10);
return -ENOTSUP;
@@ -125,60 +129,119 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t 
size)
}
 
/* Check input count */
-   if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT_1) {
-   plt_err("Invalid metadata, num_input  = %u (> %u)", 
metadata->model.num_input,
-   MRVL_ML_NUM_INPUT_OUTPUT_1);
-   return -EINVAL;
-   }
-
-   /* Check output count */
-   if (metadata->model.num_output > MRVL_ML_NUM_INPUT_OUTPUT_1) {
-   plt_err("Invalid metadata, num_output  = %u (> %u)", 
metadata->model.num_output,
-   MRVL_ML_NUM_INPUT_OUTPUT_1);
-   return -EINVAL;
-   }
-
-   /* Inputs */
-   for (i = 0; i < metadata->model.num_input; i++) {
-   if 
(rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input1[i].input_type)) 
<=
-   0) {
-   plt_err("Invalid metadata, input[%u] : input_type = 
%u", i,
-   metadata->input1[i].input_type);
+   if (version < 2301) {
+   if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT_1) {
+   plt_err("Invalid metadata, num_input  = %u (> %u)",
+   metadata->model.num_input, 
MRVL_ML_NUM_INPUT_OUTPUT_1);
return -EINVAL;
}
 
-   if (rte_ml_io_type_size_get(
-   
cn10k_ml_io_type_map(metadata->input1[i].model_input_type)) <= 0) {
-   plt_err("Invalid metadata, input[%u] : model_input_type 
= %u", i,
-   metadata->input1[i].model_input_type);
+   /* Check output count */
+   if (metadata->model.num_output > MRVL_ML_NUM_INPUT_OUTPUT_1) {
+   plt_err("Invalid metadata, num_output  = %u (> %u)",
+   metadata->model.num_output, 
MRVL_ML_NUM_INPUT_OUTPUT_1);
return -EINVAL;
}
-
-   if (metadata->input1[i].relocatable != 1) {
-   plt_err("Model not supported, non-relocatable input: 
%u", i);
-   return -ENOTSUP;
+   } else {
+   if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT) {
+   plt_err("Invalid metadata, num_input  = %u (> %u)",
+   metadata->model.num_input, 
MRVL_ML_NUM_INPUT_OUTPUT);
+   return -EINVAL;
}
-   }
 
-   /* Outputs */
-   for (i = 0; i < metadata->model.num_output; i++) {
-   if (rte_ml_io_type_size_get(
-   
cn10k_ml_io_type_map(metadata->output1[i].output_type)) <= 0) {
-   plt_err("Invalid metadata, output[%u] : output_type = 
%u", i,
-   metadata->output1[i].output_t

[PATCH v1 0/5] Implementation of revised ML xstats spec

2023-04-22 Thread Srikanth Yalavarthi
This series of patches introduces revised xstats specification for ML
device. The revised xstats spec is based on eventdev xstats and supports
DEVICE and MODEL modes to get xstats. This enables retrieving xstats for
device and each model separately.


Srikanth Yalavarthi (5):
  mldev: remove xstats APIs from library
  mldev: introduce revised xstats
  mldev: implement xstats library functions
  app/mldev: enable reporting xstats
  ml/cnxk: implement xstats driver functions

 app/test-mldev/meson.build |   1 +
 app/test-mldev/ml_common.h |  11 +
 app/test-mldev/ml_options.c|   5 +-
 app/test-mldev/test_common.h   |   3 +
 app/test-mldev/test_inference_common.c | 113 -
 app/test-mldev/test_inference_common.h |   1 -
 app/test-mldev/test_inference_interleave.c |   6 +-
 app/test-mldev/test_inference_ordered.c|   5 +-
 app/test-mldev/test_model_ops.c|   3 +
 app/test-mldev/test_stats.c| 129 +
 app/test-mldev/test_stats.h|  13 +
 doc/guides/mldevs/cnxk.rst |  30 +-
 drivers/ml/cnxk/cn10k_ml_dev.h |  96 +++-
 drivers/ml/cnxk/cn10k_ml_model.h   |  21 -
 drivers/ml/cnxk/cn10k_ml_ops.c | 520 +++--
 lib/mldev/rte_mldev.c  |  15 +-
 lib/mldev/rte_mldev.h  |  97 ++--
 lib/mldev/rte_mldev_core.h |  28 +-
 18 files changed, 757 insertions(+), 340 deletions(-)
 create mode 100644 app/test-mldev/test_stats.c
 create mode 100644 app/test-mldev/test_stats.h

--
2.17.1



[PATCH v1 1/5] mldev: remove xstats APIs from library

2023-04-22 Thread Srikanth Yalavarthi
This change is a preparatoy step for revised xstats APIs.
Revised xstats APIs support reporting device and per model
stats, which is based on eventdev xstats.

Removed xstats APIs from spec and library implementation.
Disabled reporting xstats in test application and disabled
xstats functions in drivers. Renamed stats_get function to
throughput_get.

This change is needed as the revised APIs are not backward
compatible with the current xstats.

Signed-off-by: Srikanth Yalavarthi 
---
 app/test-mldev/test_inference_common.c | 55 +
 app/test-mldev/test_inference_common.h |  2 +-
 app/test-mldev/test_inference_interleave.c |  2 +-
 app/test-mldev/test_inference_ordered.c|  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 10 +--
 lib/mldev/rte_mldev.c  | 88 
 lib/mldev/rte_mldev.h  | 90 -
 lib/mldev/rte_mldev_core.h | 93 --
 lib/mldev/version.map  |  4 -
 9 files changed, 7 insertions(+), 339 deletions(-)

diff --git a/app/test-mldev/test_inference_common.c 
b/app/test-mldev/test_inference_common.c
index af831fc1bf..1e16608582 100644
--- a/app/test-mldev/test_inference_common.c
+++ b/app/test-mldev/test_inference_common.c
@@ -1029,7 +1029,7 @@ ml_inference_launch_cores(struct ml_test *test, struct 
ml_options *opt, uint16_t
 }
 
 int
-ml_inference_stats_get(struct ml_test *test, struct ml_options *opt)
+ml_inference_throughput_get(struct ml_test *test, struct ml_options *opt)
 {
struct test_inference *t = ml_test_priv(test);
uint64_t total_cycles = 0;
@@ -1038,56 +1038,10 @@ ml_inference_stats_get(struct ml_test *test, struct 
ml_options *opt)
uint64_t avg_e2e;
uint32_t qp_id;
uint64_t freq;
-   int ret;
-   int i;
 
if (!opt->stats)
return 0;
 
-   /* get xstats size */
-   t->xstats_size = rte_ml_dev_xstats_names_get(opt->dev_id, NULL, 0);
-   if (t->xstats_size >= 0) {
-   /* allocate for xstats_map and values */
-   t->xstats_map = rte_malloc(
-   "ml_xstats_map", t->xstats_size * sizeof(struct 
rte_ml_dev_xstats_map), 0);
-   if (t->xstats_map == NULL) {
-   ret = -ENOMEM;
-   goto error;
-   }
-
-   t->xstats_values =
-   rte_malloc("ml_xstats_values", t->xstats_size * 
sizeof(uint64_t), 0);
-   if (t->xstats_values == NULL) {
-   ret = -ENOMEM;
-   goto error;
-   }
-
-   ret = rte_ml_dev_xstats_names_get(opt->dev_id, t->xstats_map, 
t->xstats_size);
-   if (ret != t->xstats_size) {
-   printf("Unable to get xstats names, ret = %d\n", ret);
-   ret = -1;
-   goto error;
-   }
-
-   for (i = 0; i < t->xstats_size; i++)
-   rte_ml_dev_xstats_get(opt->dev_id, &t->xstats_map[i].id,
- &t->xstats_values[i], 1);
-   }
-
-   /* print xstats*/
-   printf("\n");
-   print_line(80);
-   printf(" ML Device Extended Statistics\n");
-   print_line(80);
-   for (i = 0; i < t->xstats_size; i++)
-   printf(" %-64s = %" PRIu64 "\n", t->xstats_map[i].name, 
t->xstats_values[i]);
-   print_line(80);
-
-   /* release buffers */
-   rte_free(t->xstats_map);
-
-   rte_free(t->xstats_values);
-
/* print end-to-end stats */
freq = rte_get_tsc_hz();
for (qp_id = 0; qp_id < RTE_MAX_LCORE; qp_id++)
@@ -1121,11 +1075,4 @@ ml_inference_stats_get(struct ml_test *test, struct 
ml_options *opt)
print_line(80);
 
return 0;
-
-error:
-   rte_free(t->xstats_map);
-
-   rte_free(t->xstats_values);
-
-   return ret;
 }
diff --git a/app/test-mldev/test_inference_common.h 
b/app/test-mldev/test_inference_common.h
index e79344cea4..0a9b930788 100644
--- a/app/test-mldev/test_inference_common.h
+++ b/app/test-mldev/test_inference_common.h
@@ -70,6 +70,6 @@ void ml_inference_mem_destroy(struct ml_test *test, struct 
ml_options *opt);
 int ml_inference_result(struct ml_test *test, struct ml_options *opt, uint16_t 
fid);
 int ml_inference_launch_cores(struct ml_test *test, struct ml_options *opt, 
uint16_t start_fid,
  uint16_t end_fid);
-int ml_inference_stats_get(struct ml_test *test, struct ml_options *opt);
+int ml_inference_throughput_get(struct ml_test *test, struct ml_options *opt);
 
 #endif /* TEST_INFERENCE_COMMON_H */
diff --git a/app/test-mldev/test_inference_interleave.c 
b/app/test-mldev/test_inference_interleave.c
index bd2c286737..23b8efe4f0 100644
--- a/app/test-mldev/test_inference_interleave.c
+++ b/app/test-mldev/test_inference_interleave.c
@@ -58,7 +58,7 @@ test_inference_interleave_

[PATCH v1 2/5] mldev: introduce revised xstats

2023-04-22 Thread Srikanth Yalavarthi
Introduce revised xstats APIs to support reporting device and
per-model xstats. Stat type is selected through mode parameter.
Support modes include device and model.

Signed-off-by: Srikanth Yalavarthi 
---
 lib/mldev/rte_mldev.h | 113 ++
 1 file changed, 113 insertions(+)

diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h
index 1e967a7c2a..222ecbdbe1 100644
--- a/lib/mldev/rte_mldev.h
+++ b/lib/mldev/rte_mldev.h
@@ -593,6 +593,16 @@ __rte_experimental
 void
 rte_ml_dev_stats_reset(int16_t dev_id);
 
+/**
+ * Selects the component of the mldev to retrieve statistics from.
+ */
+enum rte_ml_dev_xstats_mode {
+   RTE_ML_DEV_XSTATS_DEVICE,
+   /**< Device xstats */
+   RTE_ML_DEV_XSTATS_MODEL,
+   /**< Model xstats */
+};
+
 /**
  * A name-key lookup element for extended statistics.
  *
@@ -605,6 +615,109 @@ struct rte_ml_dev_xstats_map {
/**< xstat name */
 };
 
+/**
+ * Retrieve names of extended statistics of an ML device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param mode
+ *   Mode of statistics to retrieve. Choices include the device statistics and 
model statistics.
+ * @param model_id
+ *   Used to specify the model number in model mode, and is ignored in device 
mode.
+ * @param[out] xstats_map
+ *   Block of memory to insert names and ids into. Must be at least size in 
capacity. If set to
+ * NULL, function returns required capacity. The id values returned can be 
passed to
+ * *rte_ml_dev_xstats_get* to select statistics.
+ * @param size
+ *   Capacity of xstats_names (number of xstats_map).
+ * @return
+ *   - Positive value lower or equal to size: success. The return value is the 
number of entries
+ * filled in the stats table.
+ *   - Positive value higher than size: error, the given statistics table is 
too small. The return
+ * value corresponds to the size that should be given to succeed. The entries 
in the table are not
+ * valid and shall not be used by the caller.
+ *   - Negative value on error:
+ *-ENODEV for invalid *dev_id*.
+ *-EINVAL for invalid mode, model parameters.
+ *-ENOTSUP if the device doesn't support this function.
+ */
+__rte_experimental
+int
+rte_ml_dev_xstats_names_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, 
int32_t model_id,
+   struct rte_ml_dev_xstats_map *xstats_map, uint32_t 
size);
+
+/**
+ * Retrieve the value of a single stat by requesting it by name.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param name
+ *   Name of stat name to retrieve.
+ * @param[out] stat_id
+ *   If non-NULL, the numerical id of the stat will be returned, so that 
further requests for the
+ * stat can be got using rte_ml_dev_xstats_get, which will be faster as it 
doesn't need to scan a
+ * list of names for the stat. If the stat cannot be found, the id returned 
will be (unsigned)-1.
+ * @param[out] value
+ *   Value of the stat to be returned.
+ * @return
+ *   - Zero: No error.
+ *   - Negative value: -EINVAL if stat not found, -ENOTSUP if not supported.
+ */
+__rte_experimental
+int
+rte_ml_dev_xstats_by_name_get(int16_t dev_id, const char *name, uint16_t 
*stat_id, uint64_t *value);
+
+/**
+ * Retrieve extended statistics of an ML device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param mode
+ *  Mode of statistics to retrieve. Choices include the device statistics and 
model statistics.
+ * @param model_id
+ *   Used to specify the model id in model mode, and is ignored in device mode.
+ * @param stat_ids
+ *   ID numbers of the stats to get. The ids can be got from the stat position 
in the stat list from
+ * rte_ml_dev_xstats_names_get(), or by using rte_ml_dev_xstats_by_name_get().
+ * @param[out] values
+ *   Values for each stats request by ID.
+ * @param nb_ids
+ *   Number of stats requested.
+ * @return
+ *   - Positive value: number of stat entries filled into the values array
+ *   - Negative value on error:
+ *-ENODEV for invalid *dev_id*.
+ *-EINVAL for invalid mode, model id or stat id parameters.
+ *-ENOTSUP if the device doesn't support this function.
+ */
+__rte_experimental
+int
+rte_ml_dev_xstats_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, 
int32_t model_id,
+ const uint16_t stat_ids[], uint64_t values[], uint16_t 
nb_ids);
+
+/**
+ * Reset the values of the xstats of the selected component in the device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param mode
+ *   Mode of the statistics to reset. Choose from device or model.
+ * @param model_id
+ *   Model stats to reset. 0 and positive values select models, while -1 
indicates all models.
+ * @param stat_ids
+ *   Selects specific statistics to be reset. When NULL, all statistics 
selected by *mode* will be
+ * reset. If non-NULL, must point to array of at least *nb_ids* size.
+ * @param nb_ids
+ *   The number of ids available from the *ids* arra

[PATCH v1 3/5] mldev: implement xstats library functions

2023-04-22 Thread Srikanth Yalavarthi
Implemented xstats library functions as per revised spec.

Signed-off-by: Srikanth Yalavarthi 
---
 lib/mldev/rte_mldev.c  |  91 +++
 lib/mldev/rte_mldev_core.h | 107 +
 lib/mldev/version.map  |   4 ++
 3 files changed, 202 insertions(+)

diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c
index 72d4d7a165..0d8ccd3212 100644
--- a/lib/mldev/rte_mldev.c
+++ b/lib/mldev/rte_mldev.c
@@ -438,6 +438,97 @@ rte_ml_dev_stats_reset(int16_t dev_id)
(*dev->dev_ops->dev_stats_reset)(dev);
 }
 
+int
+rte_ml_dev_xstats_names_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, 
int32_t model_id,
+   struct rte_ml_dev_xstats_map *xstats_map, uint32_t 
size)
+{
+   struct rte_ml_dev *dev;
+
+   if (!rte_ml_dev_is_valid_dev(dev_id)) {
+   RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id);
+   return -EINVAL;
+   }
+
+   dev = rte_ml_dev_pmd_get_dev(dev_id);
+   if (*dev->dev_ops->dev_xstats_names_get == NULL)
+   return -ENOTSUP;
+
+   return (*dev->dev_ops->dev_xstats_names_get)(dev, mode, model_id, 
xstats_map, size);
+}
+
+int
+rte_ml_dev_xstats_by_name_get(int16_t dev_id, const char *name, uint16_t 
*stat_id, uint64_t *value)
+{
+   struct rte_ml_dev *dev;
+
+   if (!rte_ml_dev_is_valid_dev(dev_id)) {
+   RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id);
+   return -EINVAL;
+   }
+
+   dev = rte_ml_dev_pmd_get_dev(dev_id);
+   if (*dev->dev_ops->dev_xstats_by_name_get == NULL)
+   return -ENOTSUP;
+
+   if (name == NULL) {
+   RTE_MLDEV_LOG(ERR, "Dev %d, name cannot be NULL\n", dev_id);
+   return -EINVAL;
+   }
+
+   if (value == NULL) {
+   RTE_MLDEV_LOG(ERR, "Dev %d, value cannot be NULL\n", dev_id);
+   return -EINVAL;
+   }
+
+   return (*dev->dev_ops->dev_xstats_by_name_get)(dev, name, stat_id, 
value);
+}
+
+int
+rte_ml_dev_xstats_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, 
int32_t model_id,
+ const uint16_t stat_ids[], uint64_t values[], uint16_t 
nb_ids)
+{
+   struct rte_ml_dev *dev;
+
+   if (!rte_ml_dev_is_valid_dev(dev_id)) {
+   RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id);
+   return -EINVAL;
+   }
+
+   dev = rte_ml_dev_pmd_get_dev(dev_id);
+   if (*dev->dev_ops->dev_xstats_get == NULL)
+   return -ENOTSUP;
+
+   if (stat_ids == NULL) {
+   RTE_MLDEV_LOG(ERR, "Dev %d, stat_ids cannot be NULL\n", dev_id);
+   return -EINVAL;
+   }
+
+   if (values == NULL) {
+   RTE_MLDEV_LOG(ERR, "Dev %d, values cannot be NULL\n", dev_id);
+   return -EINVAL;
+   }
+
+   return (*dev->dev_ops->dev_xstats_get)(dev, mode, model_id, stat_ids, 
values, nb_ids);
+}
+
+int
+rte_ml_dev_xstats_reset(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, 
int32_t model_id,
+   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+   struct rte_ml_dev *dev;
+
+   if (!rte_ml_dev_is_valid_dev(dev_id)) {
+   RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id);
+   return -EINVAL;
+   }
+
+   dev = rte_ml_dev_pmd_get_dev(dev_id);
+   if (*dev->dev_ops->dev_xstats_reset == NULL)
+   return -ENOTSUP;
+
+   return (*dev->dev_ops->dev_xstats_reset)(dev, mode, model_id, stat_ids, 
nb_ids);
+}
+
 int
 rte_ml_dev_dump(int16_t dev_id, FILE *fd)
 {
diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h
index 926a652397..78b8b7633d 100644
--- a/lib/mldev/rte_mldev_core.h
+++ b/lib/mldev/rte_mldev_core.h
@@ -236,6 +236,101 @@ typedef int (*mldev_stats_get_t)(struct rte_ml_dev *dev, 
struct rte_ml_dev_stats
  */
 typedef void (*mldev_stats_reset_t)(struct rte_ml_dev *dev);
 
+/**
+ * @internal
+ *
+ * Function used to get names of extended stats.
+ *
+ * @param dev
+ * ML device pointer.
+ * @param mode
+ * Mode of stats to retrieve.
+ * @param model_id
+ * Used to specify model id in model mode. Ignored in device mode.
+ * @param xstats_map
+ * Array to insert id and names into.
+ * @param size
+ * Size of xstats_map array.
+ *
+ * @return
+ * - >= 0 and <= size on success.
+ * - > size, error. Returns the size of xstats_map array required.
+ * - < 0, error code on failure.
+ */
+typedef int (*mldev_xstats_names_get_t)(struct rte_ml_dev *dev, enum 
rte_ml_dev_xstats_mode mode,
+   int32_t model_id, struct 
rte_ml_dev_xstats_map *xstats_map,
+   uint32_t size);
+
+/**
+ * @internal
+ *
+ * Function used to get a single extended stat by name.
+ *
+ * @param dev
+ * ML device pointer.
+ * @param name
+ * Name of the stat to retrieve.
+ * @param stat_id
+ * ID of the stat to be returned.
+ * @param

[PATCH v1 4/5] app/mldev: enable reporting xstats

2023-04-22 Thread Srikanth Yalavarthi
Enabled reporting xstats in ML test application. Enabled
stats option for model_ops test case. Added common files
for xstats and throughput functions.

Signed-off-by: Srikanth Yalavarthi 
---
 app/test-mldev/meson.build |   1 +
 app/test-mldev/ml_common.h |  11 ++
 app/test-mldev/ml_options.c|   5 +-
 app/test-mldev/test_common.h   |   3 +
 app/test-mldev/test_inference_common.c |  60 --
 app/test-mldev/test_inference_common.h |   1 -
 app/test-mldev/test_inference_interleave.c |   6 +-
 app/test-mldev/test_inference_ordered.c|   5 +-
 app/test-mldev/test_model_ops.c|   3 +
 app/test-mldev/test_stats.c| 129 +
 app/test-mldev/test_stats.h|  13 +++
 11 files changed, 172 insertions(+), 65 deletions(-)
 create mode 100644 app/test-mldev/test_stats.c
 create mode 100644 app/test-mldev/test_stats.h

diff --git a/app/test-mldev/meson.build b/app/test-mldev/meson.build
index 15db534dc2..18e28f2713 100644
--- a/app/test-mldev/meson.build
+++ b/app/test-mldev/meson.build
@@ -19,6 +19,7 @@ sources = files(
 'test_inference_common.c',
 'test_inference_ordered.c',
 'test_inference_interleave.c',
+'test_stats.c'
 )
 
 deps += ['mldev', 'hash']
diff --git a/app/test-mldev/ml_common.h b/app/test-mldev/ml_common.h
index 624a5aff50..8d7cc9eeb7 100644
--- a/app/test-mldev/ml_common.h
+++ b/app/test-mldev/ml_common.h
@@ -26,4 +26,15 @@
 
 #define ml_dump_end printf("\b\t}\n\n")
 
+static inline void
+ml_print_line(uint16_t len)
+{
+   uint16_t i;
+
+   for (i = 0; i < len; i++)
+   printf("-");
+
+   printf("\n");
+}
+
 #endif /* ML_COMMON_H */
diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c
index 2efcc3532c..1daa229748 100644
--- a/app/test-mldev/ml_options.c
+++ b/app/test-mldev/ml_options.c
@@ -205,7 +205,8 @@ ml_dump_test_options(const char *testname)
}
 
if (strcmp(testname, "model_ops") == 0) {
-   printf("\t\t--models   : comma separated list of 
models\n");
+   printf("\t\t--models   : comma separated list of 
models\n"
+  "\t\t--stats: enable reporting device 
statistics\n");
printf("\n");
}
 
@@ -218,7 +219,7 @@ ml_dump_test_options(const char *testname)
   "\t\t--queue_size   : size fo queue-pair\n"
   "\t\t--batches  : number of batches of input\n"
   "\t\t--tolerance: maximum tolerance (%%) for 
output validation\n"
-  "\t\t--stats: enable reporting performance 
statistics\n");
+  "\t\t--stats: enable reporting device and 
model statistics\n");
printf("\n");
}
 }
diff --git a/app/test-mldev/test_common.h b/app/test-mldev/test_common.h
index a7b2ea652a..def108d5b2 100644
--- a/app/test-mldev/test_common.h
+++ b/app/test-mldev/test_common.h
@@ -14,6 +14,9 @@ struct test_common {
struct ml_options *opt;
enum ml_test_result result;
struct rte_ml_dev_info dev_info;
+   struct rte_ml_dev_xstats_map *xstats_map;
+   uint64_t *xstats_values;
+   int xstats_size;
 };
 
 bool ml_test_cap_check(struct ml_options *opt);
diff --git a/app/test-mldev/test_inference_common.c 
b/app/test-mldev/test_inference_common.c
index 1e16608582..469ed35f6c 100644
--- a/app/test-mldev/test_inference_common.c
+++ b/app/test-mldev/test_inference_common.c
@@ -39,17 +39,6 @@
}   
   \
} while (0)
 
-static void
-print_line(uint16_t len)
-{
-   uint16_t i;
-
-   for (i = 0; i < len; i++)
-   printf("-");
-
-   printf("\n");
-}
-
 /* Enqueue inference requests with burst size equal to 1 */
 static int
 ml_enqueue_single(void *arg)
@@ -1027,52 +1016,3 @@ ml_inference_launch_cores(struct ml_test *test, struct 
ml_options *opt, uint16_t
 
return 0;
 }
-
-int
-ml_inference_throughput_get(struct ml_test *test, struct ml_options *opt)
-{
-   struct test_inference *t = ml_test_priv(test);
-   uint64_t total_cycles = 0;
-   uint32_t nb_filelist;
-   uint64_t throughput;
-   uint64_t avg_e2e;
-   uint32_t qp_id;
-   uint64_t freq;
-
-   if (!opt->stats)
-   return 0;
-
-   /* print end-to-end stats */
-   freq = rte_get_tsc_hz();
-   for (qp_id = 0; qp_id < RTE_MAX_LCORE; qp_id++)
-   total_cycles += t->args[qp_id].end_cycles - 
t->args[qp_id].start_cycles;
-   avg_e2e = total_cycles / opt->repetitions;
-
-   if (freq == 0) {
-   avg_e2e = total_cycles / opt->repetitions;
-   printf(" %-64s = %" PRIu64 "\n", "Average End-to-End Latency 
(cycles)", avg_e2e);
-   } else {
-   avg_e2e =

[PATCH v1 5/5] ml/cnxk: implement xstats driver functions

2023-04-22 Thread Srikanth Yalavarthi
Added support for revised xstats APIs in cnxk ML driver.

Signed-off-by: Srikanth Yalavarthi 
---
 doc/guides/mldevs/cnxk.rst   |  30 +-
 drivers/ml/cnxk/cn10k_ml_dev.h   |  96 +-
 drivers/ml/cnxk/cn10k_ml_model.h |  21 --
 drivers/ml/cnxk/cn10k_ml_ops.c   | 530 ++-
 4 files changed, 502 insertions(+), 175 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 91e5df095a..2a339451fd 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -213,14 +213,32 @@ Debugging Options
 Extended stats
 --
 
-Marvell cnxk ML PMD supports reporting the inference latencies
-through extended statistics.
-The PMD supports the below list of 6 extended stats types per each model.
-Total number of extended stats would be equal to 6 x number of models loaded.
+Marvell cnxk ML PMD supports reporting the device and model extended 
statistics.
 
-.. _table_octeon_cnxk_ml_xstats_names:
+PMD supports the below list of 4 device extended stats.
 
-.. table:: OCTEON cnxk ML PMD xstats names
+.. _table_octeon_cnxk_ml_device_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD device xstats names
+
+   +---+-+--+
+   | # | Type| Description  |
+   +===+=+==+
+   | 1 | nb_models_loaded| Number of models loaded  |
+   +---+-+--+
+   | 2 | nb_models_unloaded  | Number of models unloaded|
+   +---+-+--+
+   | 3 | nb_models_started   | Number of models started |
+   +---+-+--+
+   | 4 | nb_models_stopped   | Number of models stopped |
+   +---+-+--+
+
+
+PMD supports the below list of 6 extended stats types per each model.
+
+.. _table_octeon_cnxk_ml_model_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD model xstats names
 
+---+-+--+
| # | Type| Description  |
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index b4e46899c0..5a8c8206b2 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -380,6 +380,89 @@ struct cn10k_ml_fw {
struct cn10k_ml_req *req;
 };
 
+/* Extended stats types enum */
+enum cn10k_ml_xstats_type {
+   /* Number of models loaded */
+   nb_models_loaded,
+
+   /* Number of models unloaded */
+   nb_models_unloaded,
+
+   /* Number of models started */
+   nb_models_started,
+
+   /* Number of models stopped */
+   nb_models_stopped,
+
+   /* Average inference hardware latency */
+   avg_hw_latency,
+
+   /* Minimum hardware latency */
+   min_hw_latency,
+
+   /* Maximum hardware latency */
+   max_hw_latency,
+
+   /* Average firmware latency */
+   avg_fw_latency,
+
+   /* Minimum firmware latency */
+   min_fw_latency,
+
+   /* Maximum firmware latency */
+   max_fw_latency,
+};
+
+/* Extended stats function type enum. */
+enum cn10k_ml_xstats_fn_type {
+   /* Device function */
+   CN10K_ML_XSTATS_FN_DEVICE,
+
+   /* Model function */
+   CN10K_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t 
obj_idx,
+  enum cn10k_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cn10k_ml_xstats_entry {
+   /* Name-ID map */
+   struct rte_ml_dev_xstats_map map;
+
+   /* xstats mode, device or model */
+   enum rte_ml_dev_xstats_mode mode;
+
+   /* Type of xstats */
+   enum cn10k_ml_xstats_type type;
+
+   /* xstats function */
+   enum cn10k_ml_xstats_fn_type fn_id;
+
+   /* Object ID, model ID for model stat type */
+   uint16_t obj_idx;
+
+   /* Allowed to reset the stat */
+   uint8_t reset_allowed;
+
+   /* An offset to be taken away to emulate resets */
+   uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cn10k_ml_xstats {
+   /* Pointer to xstats entries */
+   struct cn10k_ml_xstats_entry *entries;
+
+   /* Store num stats and offset of the stats for each model */
+   uint16_t count_per_model[ML_CN10K_MAX_MODELS];
+   uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+   uint16_t count_mode_device;
+   uint16_t count_mode_model;
+   uint16_t count;
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
/* Device ROC */
@@ -397,8 +480,17 @@ struct cn10k_ml_dev {
/* Number of model

RE: [PATCH v5] enhance NUMA affinity heuristic

2023-04-22 Thread You, KaisenX



> -Original Message-
> From: Thomas Monjalon 
> Sent: 2023年4月21日 16:13
> To: You, KaisenX 
> Cc: dev@dpdk.org; Zhou, YidingX ;
> david.march...@redhat.com; Matz, Olivier ;
> ferruh.yi...@amd.com; zhou...@loongson.cn; sta...@dpdk.org;
> Richardson, Bruce ; jer...@marvell.com;
> Burakov, Anatoly 
> Subject: Re: [PATCH v5] enhance NUMA affinity heuristic
> 
> 21/04/2023 04:34, You, KaisenX:
> > From: Thomas Monjalon 
> > > 13/04/2023 02:56, You, KaisenX:
> > > > From: You, KaisenX
> > > > > From: Thomas Monjalon 
> > > > > >
> > > > > > I'm not comfortable with this patch.
> > > > > >
> > > > > > First, there is no comment in the code which helps to
> > > > > > understand the
> > > logic.
> > > > > > Second, I'm afraid changing the value of the per-core variable
> > > > > > _socket_id may have an impact on some applications.
> > > > > >
> > > > Hi Thomas, I'm sorry to bother you again, but we can't think of a
> > > > better solution for now, would you please give me some suggestion,
> > > > and
> > > then I will modify it accordingly.
> > >
> > > You need to better explain the logic both in the commit message and
> > > in code comments.
> > > When it will be done, it will be easier to have a discussion with
> > > other maintainers and community experts.
> > > Thank you
> > >
> > Thank you for your reply, I'll explain my patch in more detail next.
> >
> > When a DPDK application is started on only one numa node,
> 
> What do you mean by started on only one node?
When the dpdk application is started with the startup parameter "-l 40-59" 
(this range is on the same node as the system cpu processor).Only memory is 
allocated for this node when the process is initialized.
> 
> > memory is allocated for only one socket.
> > When interrupt threads use memory, memory may not be found on the
> > socket where the interrupt thread is currently located,
> 
> Why interrupt thread is on a different socket?
The above only allocates memory on node1, but the interrupt thread is created 
on node0.
Interrupt threads are created by rte_ctrl_thread_create() 
,rte_ctrl_thread_create()' 
does NOT run on main lcore, it can run on any core except data plane cores. 
So interrupt thread can run on any core.
> > and memory has to be reallocated on the hugepage, this operation can
> > lead to performance degradation.
> >
> > So my modification is in the function malloc_get_numa_socket to make
> > sure that the first socket with memory can be returned.
> >
> > If you can accept my explanation and modification, I will send the V6
> > version to improve the commit message and code comments.
> >
> > > > > Thank you for your reply.
> > > > > First, about comments, I can submit a new patch to add comments
> > > > > to help understand.
> > > > > Second, if you do not change the value of the per-core variable_
> > > > > socket_ id, /lib/eal/common/malloc_heap.c
> > > > > malloc_get_numa_socket(void)
> > > > > {
> > > > > const struct internal_config *conf =
> eal_get_internal_configuration();
> > > > > unsigned int socket_id = rte_socket_id();   // The return 
> > > > > value of
> > > > > "rte_socket_id()" is 1
> > > > > unsigned int idx;
> > > > >
> > > > > if (socket_id != (unsigned int)SOCKET_ID_ANY)
> > > > > return socket_id;//so return here
> > > > >
> > > > > This will cause return here, This function returns the socket_id
> > > > > of unallocated memory.
> > > > >
> > > > > If you have a better solution, I can modify it.
> 
> 
>