RE: [RFC] ring: improve ring performance with C11 atomics
> From: Wathsala Vithanage [mailto:wathsala.vithan...@arm.com]
> Sent: Friday, 21 April 2023 21.17
>
> Tail load in __rte_ring_move_cons_head and __rte_ring_move_prod_head
> can be changed to __ATOMIC_RELAXED from __ATOMIC_ACQUIRE.
> Because to calculate the addresses of the dequeue
> elements __rte_ring_dequeue_elems uses the old_head updated by the
> __atomic_compare_exchange_n intrinsic used in
> __rte_ring_move_prod_head. This results in an address dependency
> between the two operations. Therefore __rte_ring_dequeue_elems cannot
> happen before __rte_ring_move_prod_head.
> Similarly __rte_ring_enqueue_elems and __rte_ring_move_cons_head
> won't be reordered either.

These preconditions should be added as comments in the source code.

>
> Performance on Arm N1
> Gain relative to generic implementation
> +------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 8 (Arm N1)                        |
> +------------------------------------------------------------------+
> | Generic             | C11 atomics         | C11 atomics improved |
> +------------------------------------------------------------------+
> | Total count: 766730 | Total count: 651686 | Total count: 812125  |
> |                     | Gain: -15%          | Gain: 6%             |
> +------------------------------------------------------------------+
> +------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 32 (Arm N1)                       |
> +------------------------------------------------------------------+
> | Generic             | C11 atomics         | C11 atomics improved |
> +------------------------------------------------------------------+
> | Total count: 816745 | Total count: 646385 | Total count: 830935  |
> |                     | Gain: -21%          | Gain: 2%             |
> +------------------------------------------------------------------+

Big performance gain compared to pre-improved C11 atomics! Excellent.
>
> Performance on x86-64 Cascade Lake
> Gain relative to generic implementation
> +------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 8                                 |
> +------------------------------------------------------------------+
> | Generic             | C11 atomics         | C11 atomics improved |
> +------------------------------------------------------------------+
> | Total count: 181640 | Total count: 181995 | Total count: 182791  |
> |                     | Gain: 0.2%          | Gain: 0.6%           |
> +------------------------------------------------------------------+
> +------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 32                                |
> +------------------------------------------------------------------+
> | Generic             | C11 atomics         | C11 atomics improved |
> +------------------------------------------------------------------+
> | Total count: 167495 | Total count: 161536 | Total count: 163190  |
> |                     | Gain: -3.5%         | Gain: -2.6%          |
> +------------------------------------------------------------------+

I noticed that the larger size (32 objects) had a larger relative drop in performance than the smaller size (8 objects), so I am wondering what the performance numbers are for size 512, the default RTE_MEMPOOL_CACHE_MAX_SIZE? It's probably not going to change anything regarding the patch acceptance, but I'm curious about the numbers.

>
> Signed-off-by: Wathsala Vithanage
> Reviewed-by: Honnappa Nagarahalli
> Reviewed-by: Feifei Wang
> ---
>  .mailmap                    |  1 +
>  lib/ring/rte_ring_c11_pvt.h | 18 +-
>  2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/.mailmap b/.mailmap
> index 4018f0fc47..367115d134 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1430,6 +1430,7 @@ Walter Heymans
>  Wang Sheng-Hui
>  Wangyu (Eric)
>  Waterman Cao
> +Wathsala Vithanage
>  Weichun Chen
>  Wei Dai
>  Weifeng Li
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index f895950df4..1895f2bb0e 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -24,6 +24,13 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht,
> uint32_t old_val,
>  	if (!single)
>  		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
>
> +	/*
> +	 * Updating of ht->tail cannot happen before elements are added to or
> +	 * removed from the ring, as it could result in data races between
> +	 * producer and consumer threads. Therefore ht->tail should be updated
> +	 * with release semantics to prevent ring data copy phase from sinking
> +	 * below it.
> +	 */

I think this comment should be clarified as:

Updating of ht->tail SHOULD NOT happen before elements are added to or remove
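
The release-ordering requirement stated in the quoted comment can be illustrated outside DPDK. What follows is only a minimal single-producer/single-consumer sketch in C11 `<stdatomic.h>`, not the rte_ring implementation: the `sketch_` names and the ring size are hypothetical, and the sketch conservatively keeps acquire loads of the opposite tail rather than the relaxed loads the RFC proposes.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8u /* hypothetical size, must be a power of two */

struct sketch_ring {
	_Atomic uint32_t prod_tail; /* published by the producer */
	_Atomic uint32_t cons_tail; /* published by the consumer */
	uint32_t slots[SKETCH_RING_SIZE];
};

/* Single producer: copy the element first, then publish the new tail. */
static int
sketch_enqueue(struct sketch_ring *r, uint32_t val)
{
	uint32_t head = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store to cons_tail. */
	uint32_t cons = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

	if (head - cons == SKETCH_RING_SIZE)
		return -1; /* ring is full */

	r->slots[head & (SKETCH_RING_SIZE - 1)] = val;
	/* Release keeps the slot write above from sinking below the tail update. */
	atomic_store_explicit(&r->prod_tail, head + 1, memory_order_release);
	return 0;
}

/* Single consumer: read the element first, then hand the slot back. */
static int
sketch_dequeue(struct sketch_ring *r, uint32_t *val)
{
	uint32_t head = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);
	/* Acquire pairs with the producer's release store to prod_tail. */
	uint32_t prod = atomic_load_explicit(&r->prod_tail, memory_order_acquire);

	if (prod == head)
		return -1; /* ring is empty */

	*val = r->slots[head & (SKETCH_RING_SIZE - 1)];
	/* Release keeps the slot read above from sinking below the tail update. */
	atomic_store_explicit(&r->cons_tail, head + 1, memory_order_release);
	return 0;
}
```

If either tail store were relaxed, the other thread could observe the new tail before the slot copy had completed, which is the data race the comment warns about.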
[RFC PATCH 0/1] Introduce Event ML Adapter
Machine learning event adapter library
======================================

DPDK Eventdev library provides an event driven programming model with features to schedule events. ML Device library provides an interface to ML poll mode drivers that support Machine Learning inference operations. Event ML Adapter is intended to bridge between the event device and the ML device.

Packet flow from ML device to the event device can be accomplished using software and hardware based transfer mechanisms. The adapter queries an eventdev PMD to determine which mechanism to use. The adapter uses an EAL service core function for software based packet transfer and uses the eventdev PMD functions to configure hardware based packet transfer between ML device and the event device.

The application can choose to submit an ML operation directly to an ML device or send it to the ML adapter via eventdev based on the RTE_EVENT_ML_ADAPTER_CAP_INTERNAL_PORT_OP_FWD capability. The first mode is known as the event new (RTE_EVENT_ML_ADAPTER_OP_NEW) mode and the second as the event forward (RTE_EVENT_ML_ADAPTER_OP_FORWARD) mode. The choice of mode can be specified while creating the adapter. In the former mode, it is the application's responsibility to enable ingress packet ordering. In the latter mode, it is the adapter's responsibility to enable ingress packet ordering.

Working model of RTE_EVENT_ML_ADAPTER_OP_NEW mode:

[Diagram: boxes for Application, ML stage + enqueue to mldev, Event device, ML adapter and mldev, connected by the numbered arrows [1] to [6] described below.]

[1] Application dequeues events from the previous stage.
[2] Application prepares the ML operations.
[3] ML operations are submitted to mldev by application.
[4] ML adapter dequeues ML completions from mldev.
[5] ML adapter enqueues events to the eventdev.
[6] Application dequeues from eventdev for further processing.

In the RTE_EVENT_ML_ADAPTER_OP_NEW mode, the application submits ML operations directly to the ML device. The ML adapter then dequeues ML completions from the ML device and enqueues events to the event device. This mode does not ensure ingress ordering if the application enqueues directly to mldev without going through the ML / atomic stage, i.e. removing items [1] and [2]. Events dequeued from the adapter will be treated as new events. In this mode, the application needs to specify event information (response information) which is needed to enqueue an event after the ML operation is completed.

Working model of RTE_EVENT_ML_ADAPTER_OP_FORWARD mode:

[Diagram: boxes for Event device, Application in Ordered stage, ML adapter and mldev, connected by the numbered arrows [1] to [8] described below.]

[1] Events from the previous stage.
[2] Application in ordered stage dequeues events from eventdev.
[3] Application enqueues ML operations as events to eventdev.
[4] ML adapter dequeues event from eventdev.
[5] ML adapter submits ML operations to mldev (Atomic stage).
[6] ML adapter dequeues ML completions from mldev.
[7] ML adapter enqueues events to the eventdev.
[8] Events to the next stage.

In the event forward (RTE_EVENT_ML_ADAPTER_OP_FORWARD) mode, if the HW supports the capability RTE_EVENT_ML_ADAPTER_CAP_INTERNAL_PORT_OP_FWD, application can directly submit the ML operations to the mldev.
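
The submit-path choice described above can be summarised in a few lines of C. This is only a sketch of the decision as the cover letter words it: every `sketch_`/`SKETCH_` name below is a hypothetical stand-in defined locally, not a definition from rte_event_ml_adapter.h, and the real capability query is not shown here.

```c
#include <stdint.h>

/* Hypothetical stand-in for RTE_EVENT_ML_ADAPTER_CAP_INTERNAL_PORT_OP_FWD. */
#define SKETCH_CAP_INTERNAL_PORT_OP_FWD (1u << 0)

/* Hypothetical stand-ins for the two adapter modes. */
enum sketch_adapter_mode {
	SKETCH_OP_NEW,
	SKETCH_OP_FORWARD,
};

/* Where the application hands over an ML operation. */
enum sketch_submit_path {
	SKETCH_SUBMIT_TO_MLDEV,    /* application submits to the ML device */
	SKETCH_SUBMIT_TO_EVENTDEV, /* application enqueues the op as an event */
};

static enum sketch_submit_path
sketch_pick_submit_path(enum sketch_adapter_mode mode, uint32_t caps)
{
	/* OP_NEW: the application always submits to mldev itself. */
	if (mode == SKETCH_OP_NEW)
		return SKETCH_SUBMIT_TO_MLDEV;

	/* OP_FORWARD with the internal-port capability: direct submission. */
	if (caps & SKETCH_CAP_INTERNAL_PORT_OP_FWD)
		return SKETCH_SUBMIT_TO_MLDEV;

	/* OP_FORWARD otherwise: ops travel through eventdev to the adapter. */
	return SKETCH_SUBMIT_TO_EVENTDEV;
}
```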
[RFC PATCH 1/1] eventdev: introduce ML event adapter library
Introduce event ML adapter APIs. This patch provides information on adapter modes and usage. Application can use this event adapter interface to transfer packets between ML device and event device.

Signed-off-by: Srikanth Yalavarthi
---
 MAINTAINERS                                   |    6 +
 config/rte_config.h                           |    1 +
 doc/api/doxy-api-index.md                     |    1 +
 doc/guides/prog_guide/event_ml_adapter.rst    |  268
 doc/guides/prog_guide/eventdev.rst            |    8 +-
 .../img/event_ml_adapter_op_forward.svg       | 1086 +
 .../img/event_ml_adapter_op_new.svg           | 1079
 doc/guides/prog_guide/index.rst               |    1 +
 lib/eventdev/meson.build                      |    4 +-
 lib/eventdev/rte_event_ml_adapter.c           |    6 +
 lib/eventdev/rte_event_ml_adapter.h           |  594 +
 lib/eventdev/rte_eventdev.h                   |   45 +
 lib/meson.build                               |    2 +-
 lib/mldev/rte_mldev.h                         |    6 +
 14 files changed, 3102 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/prog_guide/event_ml_adapter.rst
 create mode 100644 doc/guides/prog_guide/img/event_ml_adapter_op_forward.svg
 create mode 100644 doc/guides/prog_guide/img/event_ml_adapter_op_new.svg
 create mode 100644 lib/eventdev/rte_event_ml_adapter.c
 create mode 100644 lib/eventdev/rte_event_ml_adapter.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e5099..2b47d26561 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -544,6 +544,12 @@ F: drivers/raw/skeleton/
 F: app/test/test_rawdev.c
 F: doc/guides/prog_guide/rawdev.rst
 
+Eventdev ML Adapter API
+M: Srikanth Yalavarthi
+T: git://dpdk.org/next/dpdk-next-eventdev
+F: lib/eventdev/*ml_adapter*
+F: doc/guides/prog_guide/event_ml_adapter.rst
+
 
 Memory Pool Drivers
 -------------------
diff --git a/config/rte_config.h b/config/rte_config.h
index 7b8c85e948..0c3911658f 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -78,6 +78,7 @@
 #define RTE_EVENT_ETH_INTR_RING_SIZE 1024
 #define RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE 32
 #define RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE 32
+#define RTE_EVENT_ML_ADAPTER_MAX_INSTANCE 32
 
 /* rawdev defines */
 #define RTE_RAWDEV_MAX_DEVS 64
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index c709fd48ad..e34a945b30 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -29,6 +29,7 @@ The public API headers are grouped by topics:
   [event_eth_tx_adapter](@ref rte_event_eth_tx_adapter.h),
   [event_timer_adapter](@ref rte_event_timer_adapter.h),
   [event_crypto_adapter](@ref rte_event_crypto_adapter.h),
+  [event_ml_adapter](@ref rte_event_ml_adapter.h),
   [rawdev](@ref rte_rawdev.h),
   [metrics](@ref rte_metrics.h),
   [bitrate](@ref rte_bitrate.h),
diff --git a/doc/guides/prog_guide/event_ml_adapter.rst b/doc/guides/prog_guide/event_ml_adapter.rst
new file mode 100644
index 0000000000..d0b8f9c1b6
--- /dev/null
+++ b/doc/guides/prog_guide/event_ml_adapter.rst
@@ -0,0 +1,268 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2023 Marvell.
+
+Event ML Adapter Library
+========================
+
+DPDK :doc:`Eventdev library ` provides event driven programming model with features
+to schedule events. :doc:`ML Device library ` provides an interface to ML poll mode
+drivers that support Machine Learning inference operations. Event ML Adapter is intended to
+bridge between the event device and the ML device.
+
+Packet flow from ML device to the event device can be accomplished using software and hardware
+based transfer mechanisms. The adapter queries an eventdev PMD to determine which mechanism to
+be used. The adapter uses an EAL service core function for software based packet transfer and
+uses the eventdev PMD functions to configure hardware based packet transfer between ML device
+and the event device. ML adapter uses a new event type called ``RTE_EVENT_TYPE_MLDEV`` to
+indicate the source of event.
+
+Application can choose to submit an ML operation directly to an ML device or send it to an ML
+adapter via eventdev based on RTE_EVENT_ML_ADAPTER_CAP_INTERNAL_PORT_OP_FWD capability. The
+first mode is known as the event new (RTE_EVENT_ML_ADAPTER_OP_NEW) mode and the second as the
+event forward (RTE_EVENT_ML_ADAPTER_OP_FORWARD) mode. Choice of mode can be specified while
+creating the adapter. In the former mode, it is the application's responsibility to enable
+ingress packet ordering. In the latter mode, it is the adapter's responsibility to enable
+ingress packet ordering.
+
+
+Adapter Modes
+-------------
+
+RTE_EVENT_ML_ADAPTER_OP_NEW mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the RTE_EVENT_ML_ADAPTER_OP_NEW mode, application submits ML operations directly to an ML
+device. The adapter then dequeues ML completions from the ML device and enqueues them as events
+to the event device. This mode does not ensure ingress o
Re: [EXT] [PATCH] crypto/uadk: set queue pair in dev_configure
Hi, Akhil

On Thu, 20 Apr 2023 at 15:20, Akhil Goyal wrote:
>
> > By default, uadk only alloc two queues for each algorithm, which
> > will impact performance.
> > Set queue pair number as required in dev_configure.
> > The default max queue pair number is 8, which can be modified
> > via para: max_nb_queue_pairs
> >
> Please add documentation for the newly added devarg in uadk.rst.

Will add

+
+Initialization
+--------------
+
+To use the PMD in an application, user must:
+
+* Call rte_vdev_init("crypto_uadk") within the application.
+
+* Use --vdev="crypto_uadk" in the EAL options, which will call rte_vdev_init() internally.
+
+The following parameters (all optional) can be provided in the previous two calls:
+
+* max_nb_queue_pairs: Specify the maximum number of queue pairs in the device (8 by default).
+The max value of max_nb_queue_pairs can be queried from the device property available_instances.
+Property available_instances value may differ from the devices and platforms.
+Allocating queue pairs bigger than available_instances will fail.
+
+Example:
+
+.. code-block:: console
+
+	cat /sys/class/uacce/hisi_sec2-2/available_instances
+	256
+
+	sudo dpdk-test-crypto-perf -l 0-10 --vdev crypto_uadk,max_nb_queue_pairs=10 \
+		-- --devtype crypto_uadk --optype cipher-only --buffer-sz 8192

>
> > Example:
> > sudo dpdk-test-crypto-perf -l 0-10 --vdev crypto_uadk,max_nb_queue_pairs=10
> > -- --devtype crypto_uadk --optype cipher-only --buffer-sz 8192
> >
> > lcore id    Buf Size    Burst Size    Gbps      Cycles/Buf
> >
> >        3        8192            32    7.5226        871.19
> >        7        8192            32    7.5225        871.20
> >        1        8192            32    7.5225        871.20
> >        4        8192            32    7.5224        871.21
> >        5        8192            32    7.5224        871.21
> >       10        8192            32    7.5223        871.22
> >        9        8192            32    7.5223        871.23
> >        2        8192            32    7.5222        871.23
> >        8        8192            32    7.5222        871.23
> >        6        8192            32    7.5218        871.28
> >
>
> No need to mention the above test result in patch description.

ok, thanks

>
> > Signed-off-by: Zhangfei Gao
> > ---
> >  drivers/crypto/uadk/uadk_crypto_pmd.c         | 19 +--
> >  drivers/crypto/uadk/uadk_crypto_pmd_private.h |  1 +
> >  2 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/crypto/uadk/uadk_crypto_pmd.c
> > b/drivers/crypto/uadk/uadk_crypto_pmd.c
> > index 4f729e0f07..34aae99342 100644
> > --- a/drivers/crypto/uadk/uadk_crypto_pmd.c
> > +++ b/drivers/crypto/uadk/uadk_crypto_pmd.c
> > @@ -357,8 +357,15 @@ static const struct rte_cryptodev_capabilities
> > uadk_crypto_v2_capabilities[] = {
> >  /* Configure device */
> >  static int
> >  uadk_crypto_pmd_config(struct rte_cryptodev *dev __rte_unused,
> > -		       struct rte_cryptodev_config *config __rte_unused)
> > +		       struct rte_cryptodev_config *config)
> >  {
> > +	char env[128];
> > +
> > +	/* set queue pairs num via env */
> > +	sprintf(env, "sync:%d@0", config->nb_queue_pairs);
> > +	setenv("WD_CIPHER_CTX_NUM", env, 1);
> > +	setenv("WD_DIGEST_CTX_NUM", env, 1);
> > +
>
> Who is the intended user of this environment variable?

wd_cipher_env_init and wd_digest_env_init in set_session_xxx_parameters will fetch the env and allocate queue pairs accordingly.

>
> >  	return 0;
> >  }
> >
> > @@ -434,7 +441,7 @@ uadk_crypto_pmd_info_get(struct rte_cryptodev *dev,
> >  	if (dev_info != NULL) {
> >  		dev_info->driver_id = dev->driver_id;
> >  		dev_info->driver_name = dev->device->driver->name;
> > -		dev_info->max_nb_queue_pairs = 128;
> > +		dev_info->max_nb_queue_pairs = priv->max_nb_qpairs;
> >  		/* No limit of number of sessions */
> >  		dev_info->sym.max_nb_sessions = 0;
> >  		dev_info->feature_flags = dev->feature_flags;
> > @@ -1015,6 +1022,7 @@ uadk_cryptodev_probe(struct rte_vdev_device *vdev)
> >  	struct uadk_crypto_priv *priv;
> >  	struct rte_cryptodev *dev;
> >  	struct uacce_dev *udev;
> > +	const char *input_args;
> >  	const char *name;
> >
> >  	udev = wd_get_accel_dev("cipher");
> > @@ -1030,6 +1038,9 @@ uadk_cryptodev_probe(struct rte_vdev_device *vdev)
> >  	if (name == NULL)
> >  		return -EINVAL;
> >
> > +	input_args = rte_vdev_device_args(vdev);
> > +	rte_cryptodev_pmd_parse_input_args(&init_params, input_args);
> > +
> >  	dev = rte_cryptodev_pmd_create(name, &vdev->device, &init_params);
> >  	if (dev == NULL) {
> >  		UADK_LOG(ERR, "driver %s: create failed", init_params.name);
> > @@ -1044,6 +1055,7 @@ uadk_cryptodev_probe(struct rte_vdev_device *vdev)
> >
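
For reference, the environment-variable hand-off discussed in this thread (format a "sync:N@0" string, then export it) can be written with bounds and return-value checks. This is only a sketch with a hypothetical helper name, `sketch_set_ctx_num`; the string format and the variable names are the ones quoted in the patch, and nothing here is taken from the UADK library itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() under strict ISO C modes */
#endif

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build "sync:<n>@0" and export it under 'var'.
 * Returns 0 on success, -1 if formatting or setenv() fails.
 */
static int
sketch_set_ctx_num(const char *var, unsigned int nb_queue_pairs)
{
	char env[32];
	int len;

	/* snprintf() never writes past the buffer, unlike sprintf(). */
	len = snprintf(env, sizeof(env), "sync:%u@0", nb_queue_pairs);
	if (len < 0 || (size_t)len >= sizeof(env))
		return -1;

	/* The last argument (1) overwrites any existing value. */
	return setenv(var, env, 1);
}
```

A caller would invoke it once per variable, for example `sketch_set_ctx_num("WD_CIPHER_CTX_NUM", 10)` followed by the same call for `"WD_DIGEST_CTX_NUM"`, and propagate a failure from the configure callback.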
[PATCH v2] app/mldev: add internal function for file read
Added internal function to read model, input and reference files with required error checks. This change fixes the unchecked return value and improper use of negative value issues reported by coverity scan for file read operations.

Coverity issue: 383742, 383743
Fixes: f6661e6d9a3a ("app/mldev: validate model operations")
Fixes: da6793390596 ("app/mldev: support inference validation")

Signed-off-by: Srikanth Yalavarthi
---
 app/test-mldev/test_common.c           | 59 ++
 app/test-mldev/test_common.h           |  2 +
 app/test-mldev/test_inference_common.c | 54 +--
 app/test-mldev/test_model_common.c     | 33 +++---
 4 files changed, 87 insertions(+), 61 deletions(-)

diff --git a/app/test-mldev/test_common.c b/app/test-mldev/test_common.c
index 016b31c6ba..be67ea487c 100644
--- a/app/test-mldev/test_common.c
+++ b/app/test-mldev/test_common.c
@@ -5,12 +5,71 @@
 #include
 #include
+#include
 #include
 #include
 
 #include "ml_common.h"
 #include "test_common.h"
 
+int
+ml_read_file(char *file, size_t *size, char **buffer)
+{
+	char *file_buffer = NULL;
+	long file_size = 0;
+	int ret = 0;
+	FILE *fp;
+
+	fp = fopen(file, "r");
+	if (fp == NULL) {
+		ml_err("Failed to open file: %s\n", file);
+		return -EIO;
+	}
+
+	if (fseek(fp, 0, SEEK_END) == 0) {
+		file_size = ftell(fp);
+		if (file_size == -1) {
+			ret = -EIO;
+			goto error;
+		}
+
+		file_buffer = malloc(file_size);
+		if (file_buffer == NULL) {
+			ml_err("Failed to allocate memory: %s\n", file);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		if (fseek(fp, 0, SEEK_SET) != 0) {
+			ret = -EIO;
+			goto error;
+		}
+
+		if (fread(file_buffer, sizeof(char), file_size, fp) != (unsigned long)file_size) {
+			ml_err("Failed to read file : %s\n", file);
+			ret = -EIO;
+			goto error;
+		}
+		fclose(fp);
+	} else {
+		ret = -EIO;
+		goto error;
+	}
+
+	*buffer = file_buffer;
+	*size = file_size;
+
+	return 0;
+
+error:
+	rte_free(file_buffer);
+
+	if (fp != NULL)
+		fclose(fp);
+
+	return ret;
+}
+
 bool
 ml_test_cap_check(struct ml_options *opt)
 {
diff --git a/app/test-mldev/test_common.h b/app/test-mldev/test_common.h
index a7b2ea652a..7e3634b0c6 100644
--- a/app/test-mldev/test_common.h
+++ b/app/test-mldev/test_common.h
@@ -24,4 +24,6 @@ int ml_test_device_close(struct ml_test *test, struct ml_options *opt);
 int ml_test_device_start(struct ml_test *test, struct ml_options *opt);
 int ml_test_device_stop(struct ml_test *test, struct ml_options *opt);
 
+int ml_read_file(char *file, size_t *size, char **buffer);
+
 #endif /* TEST_COMMON_H */
diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c
index 29c18bbc85..96213413a2 100644
--- a/app/test-mldev/test_inference_common.c
+++ b/app/test-mldev/test_inference_common.c
@@ -610,10 +610,10 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t
 	char mp_name[RTE_MEMPOOL_NAMESIZE];
 	const struct rte_memzone *mz;
 	uint64_t nb_buffers;
+	char *buffer = NULL;
 	uint32_t buff_size;
 	uint32_t mz_size;
-	uint32_t fsize;
-	FILE *fp;
+	size_t fsize;
 	int ret;
 
 	/* get input buffer size */
@@ -653,51 +653,35 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t
 	t->model[fid].reference = NULL;
 
 	/* load input file */
-	fp = fopen(opt->filelist[fid].input, "r");
-	if (fp == NULL) {
-		ml_err("Failed to open input file : %s\n", opt->filelist[fid].input);
-		ret = -errno;
+	ret = ml_read_file(opt->filelist[fid].input, &fsize, &buffer);
+	if (ret != 0)
 		goto error;
-	}
 
-	fseek(fp, 0, SEEK_END);
-	fsize = ftell(fp);
-	fseek(fp, 0, SEEK_SET);
-	if (fsize != t->model[fid].inp_dsize) {
-		ml_err("Invalid input file, size = %u (expected size = %" PRIu64 ")\n", fsize,
+	if (fsize == t->model[fid].inp_dsize) {
+		rte_memcpy(t->model[fid].input, buffer, fsize);
+		rte_free(buffer);
+	} else {
+		ml_err("Invalid input file, size = %zu (expected size = %" PRIu64 ")\n", fsize,
 		       t->model[fid].inp_dsize);
 		ret = -EINVAL;
-		fclose(fp);
-		goto error;
-	}
-
-	if (fread(t->model[fid].input, 1, t->model[fid].inp_dsize, fp) != t->model[fid].inp_dsize) {
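
One detail in the posted hunks that may be worth a second look: the buffer is obtained with malloc() but released with rte_free(), both in the helper's error path and in the caller. The following is a minimal standalone sketch of the same read-whole-file pattern using only the C standard library, with malloc() paired with free(); `sketch_read_file` is a hypothetical name, and this is not the code that was merged.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read the whole file at 'path' into a malloc()ed
 * buffer. On success returns 0 and the caller must free(*buffer).
 * On failure returns a negative errno-style value and frees everything.
 */
static int
sketch_read_file(const char *path, size_t *size, char **buffer)
{
	char *buf = NULL;
	long fsize;
	int ret = -EIO;
	FILE *fp;

	fp = fopen(path, "rb");
	if (fp == NULL)
		return -EIO;

	/* Determine the file size, checking every call that can fail. */
	if (fseek(fp, 0, SEEK_END) != 0)
		goto out;
	fsize = ftell(fp);
	if (fsize < 0)
		goto out;
	if (fseek(fp, 0, SEEK_SET) != 0)
		goto out;

	/* Allocate at least one byte so an empty file still yields a buffer. */
	buf = malloc(fsize > 0 ? (size_t)fsize : 1);
	if (buf == NULL) {
		ret = -ENOMEM;
		goto out;
	}

	if (fread(buf, 1, (size_t)fsize, fp) != (size_t)fsize)
		goto out;

	*buffer = buf;
	*size = (size_t)fsize;
	buf = NULL; /* ownership moved to the caller */
	ret = 0;

out:
	free(buf); /* free(NULL) is a no-op */
	fclose(fp);
	return ret;
}
```

The single exit label closes the file exactly once on every path, which avoids the double bookkeeping of closing in both the success branch and the error label.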
[PATCH v2] app/mldev: fix code formatting and typos
Updated ML application source files to have uniform code formatting style across. Remove extra blank lines. Fix typos in application help.

Fixes: 8cb22a545447 ("app/mldev: fix debug build")
Fixes: da6793390596 ("app/mldev: support inference validation")
Fixes: c0e871657d6a ("app/mldev: support queue pairs and size")

Signed-off-by: Srikanth Yalavarthi
---
 app/test-mldev/ml_options.c            |  4 +--
 app/test-mldev/test_inference_common.c | 36 +-
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c
index 2efcc3532c..e2f3c4dec8 100644
--- a/app/test-mldev/ml_options.c
+++ b/app/test-mldev/ml_options.c
@@ -200,7 +200,7 @@ ml_dump_test_options(const char *testname)
 {
 	if (strcmp(testname, "device_ops") == 0) {
 		printf("\t\t--queue_pairs : number of queue pairs to create\n"
-		       "\t\t--queue_size : size fo queue-pair\n");
+		       "\t\t--queue_size : size of queue-pair\n");
 		printf("\n");
 	}
 
@@ -215,7 +215,7 @@ ml_dump_test_options(const char *testname)
 		       "\t\t--repetitions : number of inference repetitions\n"
 		       "\t\t--burst_size : inference burst size\n"
 		       "\t\t--queue_pairs : number of queue pairs to create\n"
-		       "\t\t--queue_size : size fo queue-pair\n"
+		       "\t\t--queue_size : size of queue-pair\n"
 		       "\t\t--batches : number of batches of input\n"
 		       "\t\t--tolerance: maximum tolerance (%%) for output validation\n"
 		       "\t\t--stats: enable reporting performance statistics\n");
diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c
index bf7e6bbe10..29c18bbc85 100644
--- a/app/test-mldev/test_inference_common.c
+++ b/app/test-mldev/test_inference_common.c
@@ -20,23 +20,23 @@
 
 #define ML_TEST_READ_TYPE(buffer, type) (*((type *)buffer))
 
-#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) \
+#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) \
 	(((float)output - (float)reference) <= (((float)reference * tolerance) / 100.0))
 
-#define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) \
-	do { \
-		FILE *fp = fopen(name, "w+"); \
-		if (fp == NULL) { \
-			ml_err("Unable to create file: %s, error: %s", name, strerror(errno)); \
-			err = true; \
-		} else { \
-			if (fwrite(buffer, 1, size, fp) != size) { \
-				ml_err("Error writing output, file: %s, error: %s", name, \
-				       strerror(errno)); \
-				err = true; \
-			} \
-			fclose(fp); \
-		} \
+#define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) \
+	do { \
+		FILE *fp = fopen(name, "w+"); \
+		if (fp == NULL) { \
+			ml_err("Unable to create file: %s, error: %s", name, strerror(errno)); \
+			err = true; \
+		} else { \
+			if (fwrite(buffer, 1, size, fp) != size) { \
+				ml_err("Error writing output, file: %s, error: %s", name, \
+				       strerror(errno)); \
+				err = true; \
+			} \
+			fclose(fp); \
+		} \
 	} while (0)
 
 static void
@@ -951,7 +951,7 @@ ml_request_finish(struct rte_mempool *mp, void *opaque, void *obj, unsigned int
 	if (t->cmn.opt->debug) {
 		/* dump quantized output buffer */
 		if (asprintf(&dump_path, "%s.q.%u", t->cmn.opt->filelist[req->fid].output,
-				obj_idx) == -1)
+			     obj_idx) == -1)
 			return;
 		ML_OPEN_WRITE_GET_ERR(dump_path, req->output, model->out_qsize,
[PATCH v1] ml/cnxk: fix xstat type names in documentation
Fix incorrect type names for xstats in ML cnxk driver documentation.

Fixes: 4ff4ab8e1a20 ("ml/cnxk: support extended statistics")

Signed-off-by: Srikanth Yalavarthi
---
 doc/guides/mldevs/cnxk.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 91e5df095a..1aa9225765 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -231,11 +231,11 @@ Total number of extended stats would be equal to 6 x number of models loaded.
    +---+----------------+--------------------------+
    | 3 | Max-HW-Latency | Maximum hardware latency |
    +---+----------------+--------------------------+
-   | 4 | Avg-HW-Latency | Average firmware latency |
+   | 4 | Avg-FW-Latency | Average firmware latency |
    +---+----------------+--------------------------+
-   | 5 | Avg-HW-Latency | Minimum firmware latency |
+   | 5 | Min-FW-Latency | Minimum firmware latency |
    +---+----------------+--------------------------+
-   | 6 | Avg-HW-Latency | Maximum firmware latency |
+   | 6 | Max-FW-Latency | Maximum firmware latency |
    +---+----------------+--------------------------+
 
 Latency values reported by the PMD through xstats can have units,
--
2.17.1
[PATCH v1 0/3] Add support for 32 I/O per model
This patch series adds support for 32 inputs / outputs per model. Changes required to enable this support include:

1. Splitting model metadata fields into structures.
2. Updating model metadata to v2301, which supports 32 I/O.
3. Updating ML driver code to support metadata v2301.

Srikanth Yalavarthi (3):
  ml/cnxk: split metadata fields into sections
  ml/cnxk: update model metadata to v2301
  ml/cnxk: add support for 32 I/O per model

 drivers/ml/cnxk/cn10k_ml_model.c | 401 +---
 drivers/ml/cnxk/cn10k_ml_model.h | 512 +--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 133 ++--
 3 files changed, 659 insertions(+), 387 deletions(-)

--
2.17.1
[PATCH v1 1/3] ml/cnxk: split metadata fields into sections
Split metadata into header, model sections, weights & bias, input / output and data sections. This is a preparatory step to introduce v2301 of model metadata.

Signed-off-by: Srikanth Yalavarthi
---
 drivers/ml/cnxk/cn10k_ml_model.c |  26 +-
 drivers/ml/cnxk/cn10k_ml_model.h | 487 ---
 2 files changed, 270 insertions(+), 243 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 2ded05c5dc..c0b7b061f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -47,42 +47,42 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
 
 	/* Header CRC check */
-	if (metadata->metadata_header.header_crc32c != 0) {
-		header_crc32c = rte_hash_crc(
-			buffer, sizeof(metadata->metadata_header) - sizeof(uint32_t), 0);
+	if (metadata->header.header_crc32c != 0) {
+		header_crc32c =
+			rte_hash_crc(buffer, sizeof(metadata->header) - sizeof(uint32_t), 0);
 
-		if (header_crc32c != metadata->metadata_header.header_crc32c) {
+		if (header_crc32c != metadata->header.header_crc32c) {
 			plt_err("Invalid model, Header CRC mismatch");
 			return -EINVAL;
 		}
 	}
 
 	/* Payload CRC check */
-	if (metadata->metadata_header.payload_crc32c != 0) {
-		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->metadata_header),
-					      size - sizeof(metadata->metadata_header), 0);
+	if (metadata->header.payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->header),
+					      size - sizeof(metadata->header), 0);
 
-		if (payload_crc32c != metadata->metadata_header.payload_crc32c) {
+		if (payload_crc32c != metadata->header.payload_crc32c) {
 			plt_err("Invalid model, Payload CRC mismatch");
 			return -EINVAL;
 		}
 	}
 
 	/* Model magic string */
-	if (strncmp((char *)metadata->metadata_header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
-		plt_err("Invalid model, magic = %s", metadata->metadata_header.magic);
+	if (strncmp((char *)metadata->header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid model, magic = %s", metadata->header.magic);
 		return -EINVAL;
 	}
 
 	/* Target architecture */
-	if (metadata->metadata_header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
+	if (metadata->header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
 		plt_err("Model target architecture (%u) not supported",
-			metadata->metadata_header.target_architecture);
+			metadata->header.target_architecture);
 		return -ENOTSUP;
 	}
 
 	/* Header version */
-	rte_memcpy(version, metadata->metadata_header.version, 4 * sizeof(uint8_t));
+	rte_memcpy(version, metadata->header.version, 4 * sizeof(uint8_t));
 	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
 		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
 			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 1bc748265d..b30ad5a981 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -30,298 +30,325 @@ enum cn10k_ml_model_state {
 #define MRVL_ML_OUTPUT_NAME_LEN 16
 #define MRVL_ML_INPUT_OUTPUT_SIZE 8
 
-/* Model file metadata structure */
-struct cn10k_ml_model_metadata {
-	/* Header (256-byte) */
-	struct {
-		/* Magic string ('M', 'R', 'V', 'L') */
-		uint8_t magic[4];
+/* Header (256-byte) */
+struct cn10k_ml_model_metadata_header {
+	/* Magic string ('M', 'R', 'V', 'L') */
+	uint8_t magic[4];
 
-		/* Metadata version */
-		uint8_t version[4];
+	/* Metadata version */
+	uint8_t version[4];
 
-		/* Metadata size */
-		uint32_t metadata_size;
+	/* Metadata size */
+	uint32_t metadata_size;
 
-		/* Unique ID */
-		uint8_t uuid[128];
+	/* Unique ID */
+	uint8_t uuid[128];
 
-		/* Model target architecture
-		 * 0 = Undefined
-		 * 1 = M1K
-		 * 128 = MLIP
-		 * 256 = Experimental
-		 */
-		uint32_t target_architecture;
-		uint8_t reserved[104];
+	/* Model target architecture
+	 * 0 = Undefined
+	 * 1 = M1
[PATCH v1 2/3] ml/cnxk: update model metadata to v2301
Update model metadata to v2301. Revised metadata introduces fields to support up to 32 inputs/outputs per model, scratch relocation and updates to names of existing fields. Update driver files to include changes in names of metadata fields.

Signed-off-by: Srikanth Yalavarthi
---
 drivers/ml/cnxk/cn10k_ml_model.c | 111 ---
 drivers/ml/cnxk/cn10k_ml_model.h |  36 +++---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  50 +++---
 3 files changed, 106 insertions(+), 91 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index c0b7b061f5..a15df700aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -83,11 +83,11 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
 
 	/* Header version */
 	rte_memcpy(version, metadata->header.version, 4 * sizeof(uint8_t));
-	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+	if (version[0] * 1000 + version[1] * 100 != MRVL_ML_MODEL_VERSION_MIN) {
 		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
-			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
-			(MRVL_ML_MODEL_VERSION / 100) % 10, (MRVL_ML_MODEL_VERSION / 10) % 10,
-			MRVL_ML_MODEL_VERSION % 10);
+			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION_MIN / 1000) % 10,
+			(MRVL_ML_MODEL_VERSION_MIN / 100) % 10,
+			(MRVL_ML_MODEL_VERSION_MIN / 10) % 10, MRVL_ML_MODEL_VERSION_MIN % 10);
 		return -ENOTSUP;
 	}
 
@@ -125,36 +125,36 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
 	}
 
 	/* Check input count */
-	if (metadata->model.num_input > MRVL_ML_INPUT_OUTPUT_SIZE) {
+	if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT_1) {
 		plt_err("Invalid metadata, num_input = %u (> %u)", metadata->model.num_input,
-			MRVL_ML_INPUT_OUTPUT_SIZE);
+			MRVL_ML_NUM_INPUT_OUTPUT_1);
 		return -EINVAL;
 	}
 
 	/* Check output count */
-	if (metadata->model.num_output > MRVL_ML_INPUT_OUTPUT_SIZE) {
+	if (metadata->model.num_output > MRVL_ML_NUM_INPUT_OUTPUT_1) {
 		plt_err("Invalid metadata, num_output = %u (> %u)", metadata->model.num_output,
-			MRVL_ML_INPUT_OUTPUT_SIZE);
+			MRVL_ML_NUM_INPUT_OUTPUT_1);
 		return -EINVAL;
 	}
 
 	/* Inputs */
 	for (i = 0; i < metadata->model.num_input; i++) {
-		if (rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <=
+		if (rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input1[i].input_type)) <=
 		    0) {
 			plt_err("Invalid metadata, input[%u] : input_type = %u", i,
-				metadata->input[i].input_type);
+				metadata->input1[i].input_type);
 			return -EINVAL;
 		}
 
 		if (rte_ml_io_type_size_get(
-			    cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+			    cn10k_ml_io_type_map(metadata->input1[i].model_input_type)) <= 0) {
 			plt_err("Invalid metadata, input[%u] : model_input_type = %u", i,
-				metadata->input[i].model_input_type);
+				metadata->input1[i].model_input_type);
 			return -EINVAL;
 		}
 
-		if (metadata->input[i].relocatable != 1) {
+		if (metadata->input1[i].relocatable != 1) {
 			plt_err("Model not supported, non-relocatable input: %u", i);
 			return -ENOTSUP;
 		}
@@ -163,20 +163,20 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
 	/* Outputs */
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (rte_ml_io_type_size_get(
-			    cn10k_ml_io_type_map(metadata->output[i].output_type)) <= 0) {
+			    cn10k_ml_io_type_map(metadata->output1[i].output_type)) <= 0) {
 			plt_err("Invalid metadata, output[%u] : output_type = %u", i,
-				metadata->output[i].output_type);
+				metadata->output1[i].output_type);
 			return -EINVAL;
 		}
 
 		if (rte_ml_io_type_size_get(
-			    cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+			    cn10k_ml_io_type_map(metadata->output1[i].model_output_type)) <= 0) {
 			plt_err("Invalid metadata, output[
[PATCH v1 3/3] ml/cnxk: add support for 32 I/O per model
Added support for 32 inputs and outputs per model. Signed-off-by: Srikanth Yalavarthi --- drivers/ml/cnxk/cn10k_ml_model.c | 374 ++- drivers/ml/cnxk/cn10k_ml_model.h | 5 +- drivers/ml/cnxk/cn10k_ml_ops.c | 125 --- 3 files changed, 367 insertions(+), 137 deletions(-) diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index a15df700aa..92c47d39ba 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -41,8 +41,9 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size) struct cn10k_ml_model_metadata *metadata; uint32_t payload_crc32c; uint32_t header_crc32c; - uint8_t version[4]; + uint32_t version; uint8_t i; + uint8_t j; metadata = (struct cn10k_ml_model_metadata *)buffer; @@ -82,10 +83,13 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size) } /* Header version */ - rte_memcpy(version, metadata->header.version, 4 * sizeof(uint8_t)); - if (version[0] * 1000 + version[1] * 100 != MRVL_ML_MODEL_VERSION_MIN) { - plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0], - version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION_MIN / 1000) % 10, + version = metadata->header.version[0] * 1000 + metadata->header.version[1] * 100 + + metadata->header.version[2] * 10 + metadata->header.version[3]; + if (version < MRVL_ML_MODEL_VERSION_MIN) { + plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", + metadata->header.version[0], metadata->header.version[1], + metadata->header.version[2], metadata->header.version[3], + (MRVL_ML_MODEL_VERSION_MIN / 1000) % 10, (MRVL_ML_MODEL_VERSION_MIN / 100) % 10, (MRVL_ML_MODEL_VERSION_MIN / 10) % 10, MRVL_ML_MODEL_VERSION_MIN % 10); return -ENOTSUP; @@ -125,60 +129,119 @@ cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size) } /* Check input count */ - if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT_1) { - plt_err("Invalid metadata, num_input = %u (> %u)", metadata->model.num_input, - 
MRVL_ML_NUM_INPUT_OUTPUT_1); - return -EINVAL; - } - - /* Check output count */ - if (metadata->model.num_output > MRVL_ML_NUM_INPUT_OUTPUT_1) { - plt_err("Invalid metadata, num_output = %u (> %u)", metadata->model.num_output, - MRVL_ML_NUM_INPUT_OUTPUT_1); - return -EINVAL; - } - - /* Inputs */ - for (i = 0; i < metadata->model.num_input; i++) { - if (rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input1[i].input_type)) <= - 0) { - plt_err("Invalid metadata, input[%u] : input_type = %u", i, - metadata->input1[i].input_type); + if (version < 2301) { + if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT_1) { + plt_err("Invalid metadata, num_input = %u (> %u)", + metadata->model.num_input, MRVL_ML_NUM_INPUT_OUTPUT_1); return -EINVAL; } - if (rte_ml_io_type_size_get( - cn10k_ml_io_type_map(metadata->input1[i].model_input_type)) <= 0) { - plt_err("Invalid metadata, input[%u] : model_input_type = %u", i, - metadata->input1[i].model_input_type); + /* Check output count */ + if (metadata->model.num_output > MRVL_ML_NUM_INPUT_OUTPUT_1) { + plt_err("Invalid metadata, num_output = %u (> %u)", + metadata->model.num_output, MRVL_ML_NUM_INPUT_OUTPUT_1); return -EINVAL; } - - if (metadata->input1[i].relocatable != 1) { - plt_err("Model not supported, non-relocatable input: %u", i); - return -ENOTSUP; + } else { + if (metadata->model.num_input > MRVL_ML_NUM_INPUT_OUTPUT) { + plt_err("Invalid metadata, num_input = %u (> %u)", + metadata->model.num_input, MRVL_ML_NUM_INPUT_OUTPUT); + return -EINVAL; } - } - /* Outputs */ - for (i = 0; i < metadata->model.num_output; i++) { - if (rte_ml_io_type_size_get( - cn10k_ml_io_type_map(metadata->output1[i].output_type)) <= 0) { - plt_err("Invalid metadata, output[%u] : output_type = %u", i, - metadata->output1[i].output_t
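For readers following the version gate in this patch: the four header version bytes are folded into one decimal number, and that number then selects which I/O count limit applies. The following is a minimal standalone sketch of that logic, not the driver code. The constant values are assumptions: only the 2301 threshold and the 32-I/O limit appear in the patch text, while `VERSION_MIN` and `LEGACY_IO_MAX` are illustrative stand-ins for `MRVL_ML_MODEL_VERSION_MIN` and `MRVL_ML_NUM_INPUT_OUTPUT_1`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; only the 2301 threshold and the 32 limit come from the patch text. */
#define VERSION_MIN   2100u /* assumed stand-in for MRVL_ML_MODEL_VERSION_MIN */
#define VERSION_2301  2301u
#define LEGACY_IO_MAX 8u    /* assumed stand-in for MRVL_ML_NUM_INPUT_OUTPUT_1 */
#define IO_MAX        32u   /* assumed stand-in for MRVL_ML_NUM_INPUT_OUTPUT */

/* Fold the four version bytes a.b.c.d into the decimal number abcd. */
static uint32_t
metadata_version(const uint8_t v[4])
{
	assert(v != NULL);
	return v[0] * 1000u + v[1] * 100u + v[2] * 10u + v[3];
}

/* Return 0 if num_io is acceptable for this metadata version, else a negative errno. */
static int
io_count_check(const uint8_t v[4], uint32_t num_io)
{
	uint32_t version = metadata_version(v);
	uint32_t limit;

	/* Metadata older than the minimum supported version is rejected outright. */
	if (version < VERSION_MIN)
		return -ENOTSUP;

	/* Pre-2301 metadata uses the smaller legacy limit. */
	limit = (version < VERSION_2301) ? LEGACY_IO_MAX : IO_MAX;
	return (num_io > limit) ? -EINVAL : 0;
}
```

Comparing a single folded integer, rather than only the first two bytes, is what lets the check distinguish 2.3.0.0 from 2.3.0.1.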
[PATCH v1 0/5] Implementation of revised ML xstats spec
This series of patches introduces a revised xstats specification for ML
devices. The revised xstats spec is based on eventdev xstats and supports
DEVICE and MODEL modes to get xstats. This enables retrieving xstats for the
device and for each model separately.

Srikanth Yalavarthi (5):
  mldev: remove xstats APIs from library
  mldev: introduce revised xstats
  mldev: implement xstats library functions
  app/mldev: enable reporting xstats
  ml/cnxk: implement xstats driver functions

 app/test-mldev/meson.build                 |   1 +
 app/test-mldev/ml_common.h                 |  11 +
 app/test-mldev/ml_options.c                |   5 +-
 app/test-mldev/test_common.h               |   3 +
 app/test-mldev/test_inference_common.c     | 113 -
 app/test-mldev/test_inference_common.h     |   1 -
 app/test-mldev/test_inference_interleave.c |   6 +-
 app/test-mldev/test_inference_ordered.c    |   5 +-
 app/test-mldev/test_model_ops.c            |   3 +
 app/test-mldev/test_stats.c                | 129 +
 app/test-mldev/test_stats.h                |  13 +
 doc/guides/mldevs/cnxk.rst                 |  30 +-
 drivers/ml/cnxk/cn10k_ml_dev.h             |  96 +++-
 drivers/ml/cnxk/cn10k_ml_model.h           |  21 -
 drivers/ml/cnxk/cn10k_ml_ops.c             | 520 +++--
 lib/mldev/rte_mldev.c                      |  15 +-
 lib/mldev/rte_mldev.h                      |  97 ++--
 lib/mldev/rte_mldev_core.h                 |  28 +-
 18 files changed, 757 insertions(+), 340 deletions(-)
 create mode 100644 app/test-mldev/test_stats.c
 create mode 100644 app/test-mldev/test_stats.h

-- 
2.17.1
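To illustrate how a caller might use the DEVICE and MODEL modes, here is a small standalone sketch of the "ask for the count, then allocate and fetch" pattern described by the prototypes proposed in patch 2/5. It does not use the DPDK headers: `mock_xstats_names_get`, the local enum and struct, and the backing table are hypothetical stand-ins, and the stat names are only illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Local stand-ins mirroring names proposed in the series; not the DPDK headers. */
enum xstats_mode { XSTATS_DEVICE, XSTATS_MODEL };

struct xstats_map {
	uint16_t id;
	char name[64];
};

/* Hypothetical backing table: two device stats, then two stats for model 0. */
static const struct {
	enum xstats_mode mode;
	int32_t model_id;
	const char *name;
} table[] = {
	{ XSTATS_DEVICE, -1, "nb_models_loaded" },
	{ XSTATS_DEVICE, -1, "nb_models_unloaded" },
	{ XSTATS_MODEL, 0, "avg_hw_latency" },
	{ XSTATS_MODEL, 0, "avg_fw_latency" },
};

/*
 * Mock of the names query: returns the number of matching stats. Entries are
 * written only while they fit in 'size'; a NULL map just reports the count.
 */
static int
mock_xstats_names_get(enum xstats_mode mode, int32_t model_id,
		      struct xstats_map *map, uint32_t size)
{
	uint32_t needed = 0;
	uint32_t i;

	if (mode != XSTATS_DEVICE && mode != XSTATS_MODEL)
		return -EINVAL;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].mode != mode)
			continue;
		/* model_id is only consulted in model mode. */
		if (mode == XSTATS_MODEL && table[i].model_id != model_id)
			continue;
		if (map != NULL && needed < size) {
			map[needed].id = (uint16_t)i;
			snprintf(map[needed].name, sizeof(map[needed].name), "%s",
				 table[i].name);
		}
		needed++;
	}

	return (int)needed;
}

/* Query the count first, then allocate and fetch. The caller frees *out. */
static int
fetch_names(enum xstats_mode mode, int32_t model_id, struct xstats_map **out)
{
	int n;
	int ret;

	assert(out != NULL);
	*out = NULL;

	n = mock_xstats_names_get(mode, model_id, NULL, 0);
	if (n <= 0)
		return n;

	*out = calloc((size_t)n, sizeof(**out));
	if (*out == NULL)
		return -ENOMEM;

	ret = mock_xstats_names_get(mode, model_id, *out, (uint32_t)n);
	if (ret != n) {
		free(*out);
		*out = NULL;
		return -EIO;
	}

	return n;
}
```

With this table, `fetch_names(XSTATS_DEVICE, -1, &map)` yields the two device entries, while `fetch_names(XSTATS_MODEL, 0, &map)` yields only the entries for model 0.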
[PATCH v1 1/5] mldev: remove xstats APIs from library
This change is a preparatory step for the revised xstats APIs. The revised
xstats APIs support reporting device and per-model stats and are based on
eventdev xstats. Removed xstats APIs from spec and library implementation.
Disabled reporting xstats in test application and disabled xstats functions
in drivers. Renamed stats_get function to throughput_get. This change is
needed as the revised APIs are not backward compatible with the current
xstats.

Signed-off-by: Srikanth Yalavarthi
---
 app/test-mldev/test_inference_common.c     | 55 +
 app/test-mldev/test_inference_common.h     |  2 +-
 app/test-mldev/test_inference_interleave.c |  2 +-
 app/test-mldev/test_inference_ordered.c    |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c             | 10 +--
 lib/mldev/rte_mldev.c                      | 88 
 lib/mldev/rte_mldev.h                      | 90 -
 lib/mldev/rte_mldev_core.h                 | 93 --
 lib/mldev/version.map                      |  4 -
 9 files changed, 7 insertions(+), 339 deletions(-)

diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c
index af831fc1bf..1e16608582 100644
--- a/app/test-mldev/test_inference_common.c
+++ b/app/test-mldev/test_inference_common.c
@@ -1029,7 +1029,7 @@ ml_inference_launch_cores(struct ml_test *test, struct ml_options *opt, uint16_t
 }
 
 int
-ml_inference_stats_get(struct ml_test *test, struct ml_options *opt)
+ml_inference_throughput_get(struct ml_test *test, struct ml_options *opt)
 {
 	struct test_inference *t = ml_test_priv(test);
 	uint64_t total_cycles = 0;
@@ -1038,56 +1038,10 @@ ml_inference_stats_get(struct ml_test *test, struct ml_options *opt)
 	uint64_t avg_e2e;
 	uint32_t qp_id;
 	uint64_t freq;
-	int ret;
-	int i;
 
 	if (!opt->stats)
 		return 0;
 
-	/* get xstats size */
-	t->xstats_size = rte_ml_dev_xstats_names_get(opt->dev_id, NULL, 0);
-	if (t->xstats_size >= 0) {
-		/* allocate for xstats_map and values */
-		t->xstats_map = rte_malloc(
-			"ml_xstats_map", t->xstats_size * sizeof(struct rte_ml_dev_xstats_map), 0);
-		if (t->xstats_map == NULL) {
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		t->xstats_values =
-			rte_malloc("ml_xstats_values", t->xstats_size * sizeof(uint64_t), 0);
-		if (t->xstats_values == NULL) {
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		ret = rte_ml_dev_xstats_names_get(opt->dev_id, t->xstats_map, t->xstats_size);
-		if (ret != t->xstats_size) {
-			printf("Unable to get xstats names, ret = %d\n", ret);
-			ret = -1;
-			goto error;
-		}
-
-		for (i = 0; i < t->xstats_size; i++)
-			rte_ml_dev_xstats_get(opt->dev_id, &t->xstats_map[i].id,
-					      &t->xstats_values[i], 1);
-	}
-
-	/* print xstats*/
-	printf("\n");
-	print_line(80);
-	printf(" ML Device Extended Statistics\n");
-	print_line(80);
-	for (i = 0; i < t->xstats_size; i++)
-		printf(" %-64s = %" PRIu64 "\n", t->xstats_map[i].name, t->xstats_values[i]);
-	print_line(80);
-
-	/* release buffers */
-	rte_free(t->xstats_map);
-
-	rte_free(t->xstats_values);
-
 	/* print end-to-end stats */
 	freq = rte_get_tsc_hz();
 	for (qp_id = 0; qp_id < RTE_MAX_LCORE; qp_id++)
@@ -1121,11 +1075,4 @@ ml_inference_stats_get(struct ml_test *test, struct ml_options *opt)
 	print_line(80);
 
 	return 0;
-
-error:
-	rte_free(t->xstats_map);
-
-	rte_free(t->xstats_values);
-
-	return ret;
 }
diff --git a/app/test-mldev/test_inference_common.h b/app/test-mldev/test_inference_common.h
index e79344cea4..0a9b930788 100644
--- a/app/test-mldev/test_inference_common.h
+++ b/app/test-mldev/test_inference_common.h
@@ -70,6 +70,6 @@ void ml_inference_mem_destroy(struct ml_test *test, struct ml_options *opt);
 int ml_inference_result(struct ml_test *test, struct ml_options *opt, uint16_t fid);
 int ml_inference_launch_cores(struct ml_test *test, struct ml_options *opt, uint16_t start_fid,
 			      uint16_t end_fid);
-int ml_inference_stats_get(struct ml_test *test, struct ml_options *opt);
+int ml_inference_throughput_get(struct ml_test *test, struct ml_options *opt);
 
 #endif /* TEST_INFERENCE_COMMON_H */
diff --git a/app/test-mldev/test_inference_interleave.c b/app/test-mldev/test_inference_interleave.c
index bd2c286737..23b8efe4f0 100644
--- a/app/test-mldev/test_inference_interleave.c
+++ b/app/test-mldev/test_inference_interleave.c
@@ -58,7 +58,7 @@ test_inference_interleave_
[PATCH v1 2/5] mldev: introduce revised xstats
Introduce revised xstats APIs to support reporting device and per-model
xstats. The stat type is selected through the mode parameter. Supported modes
include device and model.

Signed-off-by: Srikanth Yalavarthi
---
 lib/mldev/rte_mldev.h | 113 ++
 1 file changed, 113 insertions(+)

diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h
index 1e967a7c2a..222ecbdbe1 100644
--- a/lib/mldev/rte_mldev.h
+++ b/lib/mldev/rte_mldev.h
@@ -593,6 +593,16 @@ __rte_experimental
 void
 rte_ml_dev_stats_reset(int16_t dev_id);
 
+/**
+ * Selects the component of the mldev to retrieve statistics from.
+ */
+enum rte_ml_dev_xstats_mode {
+	RTE_ML_DEV_XSTATS_DEVICE,
+	/**< Device xstats */
+	RTE_ML_DEV_XSTATS_MODEL,
+	/**< Model xstats */
+};
+
 /**
  * A name-key lookup element for extended statistics.
  *
@@ -605,6 +615,109 @@ struct rte_ml_dev_xstats_map {
 	/**< xstat name */
 };
 
+/**
+ * Retrieve names of extended statistics of an ML device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param mode
+ *   Mode of statistics to retrieve. Choices include the device statistics and model statistics.
+ * @param model_id
+ *   Used to specify the model number in model mode, and is ignored in device mode.
+ * @param[out] xstats_map
+ *   Block of memory to insert names and ids into. Must be at least size in capacity. If set to
+ *   NULL, function returns required capacity. The id values returned can be passed to
+ *   *rte_ml_dev_xstats_get* to select statistics.
+ * @param size
+ *   Capacity of xstats_names (number of xstats_map).
+ * @return
+ *   - Positive value lower or equal to size: success. The return value is the number of entries
+ *   filled in the stats table.
+ *   - Positive value higher than size: error, the given statistics table is too small. The return
+ *   value corresponds to the size that should be given to succeed. The entries in the table are not
+ *   valid and shall not be used by the caller.
+ *   - Negative value on error:
+ *        -ENODEV for invalid *dev_id*.
+ *        -EINVAL for invalid mode, model parameters.
+ *        -ENOTSUP if the device doesn't support this function.
+ */
+__rte_experimental
+int
+rte_ml_dev_xstats_names_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			    struct rte_ml_dev_xstats_map *xstats_map, uint32_t size);
+
+/**
+ * Retrieve the value of a single stat by requesting it by name.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param name
+ *   Name of stat name to retrieve.
+ * @param[out] stat_id
+ *   If non-NULL, the numerical id of the stat will be returned, so that further requests for the
+ *   stat can be got using rte_ml_dev_xstats_get, which will be faster as it doesn't need to scan a
+ *   list of names for the stat. If the stat cannot be found, the id returned will be (unsigned)-1.
+ * @param[out] value
+ *   Value of the stat to be returned.
+ * @return
+ *   - Zero: No error.
+ *   - Negative value: -EINVAL if stat not found, -ENOTSUP if not supported.
+ */
+__rte_experimental
+int
+rte_ml_dev_xstats_by_name_get(int16_t dev_id, const char *name, uint16_t *stat_id, uint64_t *value);
+
+/**
+ * Retrieve extended statistics of an ML device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param mode
+ *   Mode of statistics to retrieve. Choices include the device statistics and model statistics.
+ * @param model_id
+ *   Used to specify the model id in model mode, and is ignored in device mode.
+ * @param stat_ids
+ *   ID numbers of the stats to get. The ids can be got from the stat position in the stat list from
+ *   rte_ml_dev_xstats_names_get(), or by using rte_ml_dev_xstats_by_name_get().
+ * @param[out] values
+ *   Values for each stats request by ID.
+ * @param nb_ids
+ *   Number of stats requested.
+ * @return
+ *   - Positive value: number of stat entries filled into the values array
+ *   - Negative value on error:
+ *        -ENODEV for invalid *dev_id*.
+ *        -EINVAL for invalid mode, model id or stat id parameters.
+ *        -ENOTSUP if the device doesn't support this function.
+ */
+__rte_experimental
+int
+rte_ml_dev_xstats_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		      const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids);
+
+/**
+ * Reset the values of the xstats of the selected component in the device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param mode
+ *   Mode of the statistics to reset. Choose from device or model.
+ * @param model_id
+ *   Model stats to reset. 0 and positive values select models, while -1 indicates all models.
+ * @param stat_ids
+ *   Selects specific statistics to be reset. When NULL, all statistics selected by *mode* will be
+ *   reset. If non-NULL, must point to array of at least *nb_ids* size.
+ * @param nb_ids
+ *   The number of ids available from the *ids* arra
[PATCH v1 3/5] mldev: implement xstats library functions
Implemented xstats library functions as per revised spec. Signed-off-by: Srikanth Yalavarthi --- lib/mldev/rte_mldev.c | 91 +++ lib/mldev/rte_mldev_core.h | 107 + lib/mldev/version.map | 4 ++ 3 files changed, 202 insertions(+) diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 72d4d7a165..0d8ccd3212 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -438,6 +438,97 @@ rte_ml_dev_stats_reset(int16_t dev_id) (*dev->dev_ops->dev_stats_reset)(dev); } +int +rte_ml_dev_xstats_names_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, int32_t model_id, + struct rte_ml_dev_xstats_map *xstats_map, uint32_t size) +{ + struct rte_ml_dev *dev; + + if (!rte_ml_dev_is_valid_dev(dev_id)) { + RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); + return -EINVAL; + } + + dev = rte_ml_dev_pmd_get_dev(dev_id); + if (*dev->dev_ops->dev_xstats_names_get == NULL) + return -ENOTSUP; + + return (*dev->dev_ops->dev_xstats_names_get)(dev, mode, model_id, xstats_map, size); +} + +int +rte_ml_dev_xstats_by_name_get(int16_t dev_id, const char *name, uint16_t *stat_id, uint64_t *value) +{ + struct rte_ml_dev *dev; + + if (!rte_ml_dev_is_valid_dev(dev_id)) { + RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); + return -EINVAL; + } + + dev = rte_ml_dev_pmd_get_dev(dev_id); + if (*dev->dev_ops->dev_xstats_by_name_get == NULL) + return -ENOTSUP; + + if (name == NULL) { + RTE_MLDEV_LOG(ERR, "Dev %d, name cannot be NULL\n", dev_id); + return -EINVAL; + } + + if (value == NULL) { + RTE_MLDEV_LOG(ERR, "Dev %d, value cannot be NULL\n", dev_id); + return -EINVAL; + } + + return (*dev->dev_ops->dev_xstats_by_name_get)(dev, name, stat_id, value); +} + +int +rte_ml_dev_xstats_get(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, int32_t model_id, + const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids) +{ + struct rte_ml_dev *dev; + + if (!rte_ml_dev_is_valid_dev(dev_id)) { + RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); + return -EINVAL; + } + + dev = 
rte_ml_dev_pmd_get_dev(dev_id); + if (*dev->dev_ops->dev_xstats_get == NULL) + return -ENOTSUP; + + if (stat_ids == NULL) { + RTE_MLDEV_LOG(ERR, "Dev %d, stat_ids cannot be NULL\n", dev_id); + return -EINVAL; + } + + if (values == NULL) { + RTE_MLDEV_LOG(ERR, "Dev %d, values cannot be NULL\n", dev_id); + return -EINVAL; + } + + return (*dev->dev_ops->dev_xstats_get)(dev, mode, model_id, stat_ids, values, nb_ids); +} + +int +rte_ml_dev_xstats_reset(int16_t dev_id, enum rte_ml_dev_xstats_mode mode, int32_t model_id, + const uint16_t stat_ids[], uint16_t nb_ids) +{ + struct rte_ml_dev *dev; + + if (!rte_ml_dev_is_valid_dev(dev_id)) { + RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); + return -EINVAL; + } + + dev = rte_ml_dev_pmd_get_dev(dev_id); + if (*dev->dev_ops->dev_xstats_reset == NULL) + return -ENOTSUP; + + return (*dev->dev_ops->dev_xstats_reset)(dev, mode, model_id, stat_ids, nb_ids); +} + int rte_ml_dev_dump(int16_t dev_id, FILE *fd) { diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 926a652397..78b8b7633d 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -236,6 +236,101 @@ typedef int (*mldev_stats_get_t)(struct rte_ml_dev *dev, struct rte_ml_dev_stats */ typedef void (*mldev_stats_reset_t)(struct rte_ml_dev *dev); +/** + * @internal + * + * Function used to get names of extended stats. + * + * @param dev + * ML device pointer. + * @param mode + * Mode of stats to retrieve. + * @param model_id + * Used to specify model id in model mode. Ignored in device mode. + * @param xstats_map + * Array to insert id and names into. + * @param size + * Size of xstats_map array. + * + * @return + * - >= 0 and <= size on success. + * - > size, error. Returns the size of xstats_map array required. + * - < 0, error code on failure. 
+ */ +typedef int (*mldev_xstats_names_get_t)(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, + int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map, + uint32_t size); + +/** + * @internal + * + * Function used to get a single extended stat by name. + * + * @param dev + * ML device pointer. + * @param name + * Name of the stat to retrieve. + * @param stat_id + * ID of the stat to be returned. + * @param
[PATCH v1 4/5] app/mldev: enable reporting xstats
Enabled reporting xstats in ML test application. Enabled stats option for model_ops test case. Added common files for xstats and throughput functions. Signed-off-by: Srikanth Yalavarthi --- app/test-mldev/meson.build | 1 + app/test-mldev/ml_common.h | 11 ++ app/test-mldev/ml_options.c| 5 +- app/test-mldev/test_common.h | 3 + app/test-mldev/test_inference_common.c | 60 -- app/test-mldev/test_inference_common.h | 1 - app/test-mldev/test_inference_interleave.c | 6 +- app/test-mldev/test_inference_ordered.c| 5 +- app/test-mldev/test_model_ops.c| 3 + app/test-mldev/test_stats.c| 129 + app/test-mldev/test_stats.h| 13 +++ 11 files changed, 172 insertions(+), 65 deletions(-) create mode 100644 app/test-mldev/test_stats.c create mode 100644 app/test-mldev/test_stats.h diff --git a/app/test-mldev/meson.build b/app/test-mldev/meson.build index 15db534dc2..18e28f2713 100644 --- a/app/test-mldev/meson.build +++ b/app/test-mldev/meson.build @@ -19,6 +19,7 @@ sources = files( 'test_inference_common.c', 'test_inference_ordered.c', 'test_inference_interleave.c', +'test_stats.c' ) deps += ['mldev', 'hash'] diff --git a/app/test-mldev/ml_common.h b/app/test-mldev/ml_common.h index 624a5aff50..8d7cc9eeb7 100644 --- a/app/test-mldev/ml_common.h +++ b/app/test-mldev/ml_common.h @@ -26,4 +26,15 @@ #define ml_dump_end printf("\b\t}\n\n") +static inline void +ml_print_line(uint16_t len) +{ + uint16_t i; + + for (i = 0; i < len; i++) + printf("-"); + + printf("\n"); +} + #endif /* ML_COMMON_H */ diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c index 2efcc3532c..1daa229748 100644 --- a/app/test-mldev/ml_options.c +++ b/app/test-mldev/ml_options.c @@ -205,7 +205,8 @@ ml_dump_test_options(const char *testname) } if (strcmp(testname, "model_ops") == 0) { - printf("\t\t--models : comma separated list of models\n"); + printf("\t\t--models : comma separated list of models\n" + "\t\t--stats: enable reporting device statistics\n"); printf("\n"); } @@ -218,7 +219,7 @@ 
ml_dump_test_options(const char *testname) "\t\t--queue_size : size fo queue-pair\n" "\t\t--batches : number of batches of input\n" "\t\t--tolerance: maximum tolerance (%%) for output validation\n" - "\t\t--stats: enable reporting performance statistics\n"); + "\t\t--stats: enable reporting device and model statistics\n"); printf("\n"); } } diff --git a/app/test-mldev/test_common.h b/app/test-mldev/test_common.h index a7b2ea652a..def108d5b2 100644 --- a/app/test-mldev/test_common.h +++ b/app/test-mldev/test_common.h @@ -14,6 +14,9 @@ struct test_common { struct ml_options *opt; enum ml_test_result result; struct rte_ml_dev_info dev_info; + struct rte_ml_dev_xstats_map *xstats_map; + uint64_t *xstats_values; + int xstats_size; }; bool ml_test_cap_check(struct ml_options *opt); diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index 1e16608582..469ed35f6c 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -39,17 +39,6 @@ } \ } while (0) -static void -print_line(uint16_t len) -{ - uint16_t i; - - for (i = 0; i < len; i++) - printf("-"); - - printf("\n"); -} - /* Enqueue inference requests with burst size equal to 1 */ static int ml_enqueue_single(void *arg) @@ -1027,52 +1016,3 @@ ml_inference_launch_cores(struct ml_test *test, struct ml_options *opt, uint16_t return 0; } - -int -ml_inference_throughput_get(struct ml_test *test, struct ml_options *opt) -{ - struct test_inference *t = ml_test_priv(test); - uint64_t total_cycles = 0; - uint32_t nb_filelist; - uint64_t throughput; - uint64_t avg_e2e; - uint32_t qp_id; - uint64_t freq; - - if (!opt->stats) - return 0; - - /* print end-to-end stats */ - freq = rte_get_tsc_hz(); - for (qp_id = 0; qp_id < RTE_MAX_LCORE; qp_id++) - total_cycles += t->args[qp_id].end_cycles - t->args[qp_id].start_cycles; - avg_e2e = total_cycles / opt->repetitions; - - if (freq == 0) { - avg_e2e = total_cycles / opt->repetitions; - printf(" %-64s = %" 
PRIu64 "\n", "Average End-to-End Latency (cycles)", avg_e2e); - } else { - avg_e2e =
[PATCH v1 5/5] ml/cnxk: implement xstats driver functions
Added support for revised xstats APIs in cnxk ML driver. Signed-off-by: Srikanth Yalavarthi --- doc/guides/mldevs/cnxk.rst | 30 +- drivers/ml/cnxk/cn10k_ml_dev.h | 96 +- drivers/ml/cnxk/cn10k_ml_model.h | 21 -- drivers/ml/cnxk/cn10k_ml_ops.c | 530 ++- 4 files changed, 502 insertions(+), 175 deletions(-) diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst index 91e5df095a..2a339451fd 100644 --- a/doc/guides/mldevs/cnxk.rst +++ b/doc/guides/mldevs/cnxk.rst @@ -213,14 +213,32 @@ Debugging Options Extended stats -- -Marvell cnxk ML PMD supports reporting the inference latencies -through extended statistics. -The PMD supports the below list of 6 extended stats types per each model. -Total number of extended stats would be equal to 6 x number of models loaded. +Marvell cnxk ML PMD supports reporting the device and model extended statistics. -.. _table_octeon_cnxk_ml_xstats_names: +PMD supports the below list of 4 device extended stats. -.. table:: OCTEON cnxk ML PMD xstats names +.. _table_octeon_cnxk_ml_device_xstats_names: + +.. table:: OCTEON cnxk ML PMD device xstats names + + +---+-+--+ + | # | Type| Description | + +===+=+==+ + | 1 | nb_models_loaded| Number of models loaded | + +---+-+--+ + | 2 | nb_models_unloaded | Number of models unloaded| + +---+-+--+ + | 3 | nb_models_started | Number of models started | + +---+-+--+ + | 4 | nb_models_stopped | Number of models stopped | + +---+-+--+ + + +PMD supports the below list of 6 extended stats types per each model. + +.. _table_octeon_cnxk_ml_model_xstats_names: + +.. 
table:: OCTEON cnxk ML PMD model xstats names +---+-+--+ | # | Type| Description | diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h index b4e46899c0..5a8c8206b2 100644 --- a/drivers/ml/cnxk/cn10k_ml_dev.h +++ b/drivers/ml/cnxk/cn10k_ml_dev.h @@ -380,6 +380,89 @@ struct cn10k_ml_fw { struct cn10k_ml_req *req; }; +/* Extended stats types enum */ +enum cn10k_ml_xstats_type { + /* Number of models loaded */ + nb_models_loaded, + + /* Number of models unloaded */ + nb_models_unloaded, + + /* Number of models started */ + nb_models_started, + + /* Number of models stopped */ + nb_models_stopped, + + /* Average inference hardware latency */ + avg_hw_latency, + + /* Minimum hardware latency */ + min_hw_latency, + + /* Maximum hardware latency */ + max_hw_latency, + + /* Average firmware latency */ + avg_fw_latency, + + /* Minimum firmware latency */ + min_fw_latency, + + /* Maximum firmware latency */ + max_fw_latency, +}; + +/* Extended stats function type enum. */ +enum cn10k_ml_xstats_fn_type { + /* Device function */ + CN10K_ML_XSTATS_FN_DEVICE, + + /* Model function */ + CN10K_ML_XSTATS_FN_MODEL, +}; + +/* Function pointer to get xstats for a type */ +typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx, + enum cn10k_ml_xstats_type stat); + +/* Extended stats entry structure */ +struct cn10k_ml_xstats_entry { + /* Name-ID map */ + struct rte_ml_dev_xstats_map map; + + /* xstats mode, device or model */ + enum rte_ml_dev_xstats_mode mode; + + /* Type of xstats */ + enum cn10k_ml_xstats_type type; + + /* xstats function */ + enum cn10k_ml_xstats_fn_type fn_id; + + /* Object ID, model ID for model stat type */ + uint16_t obj_idx; + + /* Allowed to reset the stat */ + uint8_t reset_allowed; + + /* An offset to be taken away to emulate resets */ + uint64_t reset_value; +}; + +/* Extended stats data */ +struct cn10k_ml_xstats { + /* Pointer to xstats entries */ + struct cn10k_ml_xstats_entry *entries; + + /* Store num 
stats and offset of the stats for each model */ + uint16_t count_per_model[ML_CN10K_MAX_MODELS]; + uint16_t offset_for_model[ML_CN10K_MAX_MODELS]; + uint16_t count_mode_device; + uint16_t count_mode_model; + uint16_t count; +}; + /* Device private data */ struct cn10k_ml_dev { /* Device ROC */ @@ -397,8 +480,17 @@ struct cn10k_ml_dev { /* Number of model
RE: [PATCH v5] enhance NUMA affinity heuristic
> -----Original Message-----
> From: Thomas Monjalon
> Sent: 21 April 2023 16:13
> To: You, KaisenX
> Cc: dev@dpdk.org; Zhou, YidingX ;
> david.march...@redhat.com; Matz, Olivier ;
> ferruh.yi...@amd.com; zhou...@loongson.cn; sta...@dpdk.org;
> Richardson, Bruce ; jer...@marvell.com;
> Burakov, Anatoly
> Subject: Re: [PATCH v5] enhance NUMA affinity heuristic
>
> 21/04/2023 04:34, You, KaisenX:
> > From: Thomas Monjalon
> > > 13/04/2023 02:56, You, KaisenX:
> > > > From: You, KaisenX
> > > > > From: Thomas Monjalon
> > > > > >
> > > > > > I'm not comfortable with this patch.
> > > > > >
> > > > > > First, there is no comment in the code which helps to
> > > > > > understand the logic.
> > > > > > Second, I'm afraid changing the value of the per-core variable
> > > > > > _socket_id may have an impact on some applications.
> > > > > >
> > > > Hi Thomas, I'm sorry to bother you again, but we can't think of a
> > > > better solution for now, would you please give me some suggestion,
> > > > and then I will modify it accordingly.
> > >
> > > You need to better explain the logic both in the commit message and
> > > in code comments.
> > > When it will be done, it will be easier to have a discussion with
> > > other maintainers and community experts.
> > > Thank you
> > >
> > Thank you for your reply, I'll explain my patch in more detail next.
> >
> > When a DPDK application is started on only one numa node,
>
> What do you mean by started on only one node?

When the DPDK application is started with the startup parameter "-l 40-59"
(this range is on the same node as the system cpu processor), memory is
allocated only for this node when the process is initialized.

> > memory is allocated for only one socket.
> > When interrupt threads use memory, memory may not be found on the
> > socket where the interrupt thread is currently located,
>
> Why interrupt thread is on a different socket?

The above only allocates memory on node1, but the interrupt thread is created
on node0. Interrupt threads are created by rte_ctrl_thread_create().
rte_ctrl_thread_create() does NOT run on the main lcore; it can run on any
core except data plane cores. So the interrupt thread can run on any core.

> > and memory has to be reallocated on the hugepage, this operation can
> > lead to performance degradation.
> >
> > So my modification is in the function malloc_get_numa_socket to make
> > sure that the first socket with memory can be returned.
> >
> > If you can accept my explanation and modification, I will send the V6
> > version to improve the commit message and code comments.
> >
> > > > > Thank you for your reply.
> > > > > First, about comments, I can submit a new patch to add comments
> > > > > to help understand.
> > > > > Second, if you do not change the value of the per-core variable
> > > > > _socket_id, /lib/eal/common/malloc_heap.c
> > > > > malloc_get_numa_socket(void)
> > > > > {
> > > > >     const struct internal_config *conf = eal_get_internal_configuration();
> > > > >     unsigned int socket_id = rte_socket_id(); // The return value of "rte_socket_id()" is 1
> > > > >     unsigned int idx;
> > > > >
> > > > >     if (socket_id != (unsigned int)SOCKET_ID_ANY)
> > > > >         return socket_id; // so return here
> > > > >
> > > > > This will cause return here, This function returns the socket_id
> > > > > of unallocated memory.
> > > > >
> > > > > If you have a better solution, I can modify it.
> > >
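The behaviour being discussed (prefer the calling thread's socket, but fall back to the first socket that actually has memory) can be sketched without EAL as below. This is not the patch itself: `pick_alloc_socket`, `SOCKET_ID_ANY_SKETCH` and the per-socket memory array are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SOCKET_ID_ANY_SKETCH (-1) /* local stand-in for DPDK's SOCKET_ID_ANY */

/*
 * Pick a socket to allocate from.
 *   thread_socket: socket the calling thread runs on, or SOCKET_ID_ANY_SKETCH
 *   socket_mem:    bytes of memory reserved per socket (hypothetical input)
 *   nb_sockets:    number of entries in socket_mem
 */
static int
pick_alloc_socket(int thread_socket, const uint64_t *socket_mem, size_t nb_sockets)
{
	size_t idx;

	assert(socket_mem != NULL);

	/* Prefer the thread's own socket, but only if it actually has memory. */
	if (thread_socket != SOCKET_ID_ANY_SKETCH &&
	    (size_t)thread_socket < nb_sockets && socket_mem[thread_socket] != 0)
		return thread_socket;

	/* Otherwise fall back to the first socket that has memory. */
	for (idx = 0; idx < nb_sockets; idx++) {
		if (socket_mem[idx] != 0)
			return (int)idx;
	}

	/* No socket has memory; let the caller decide what to do. */
	return SOCKET_ID_ANY_SKETCH;
}
```

In the scenario from this thread (memory only on node1, interrupt thread on node0), `pick_alloc_socket(0, mem, 2)` with `mem = {0, 1024}` selects socket 1 instead of returning a socket without memory.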