Added user guide for the Marvell cnxk ML driver for the Marvell OCTEON cnxk SoC family. Added details about device initialization, debug options and runtime device arguments supported by the driver.
Signed-off-by: Srikanth Yalavarthi <syalavar...@marvell.com>
---
 MAINTAINERS                 |   1 +
 doc/guides/index.rst        |   1 +
 doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
 doc/guides/mldevs/index.rst |  14 +++
 4 files changed, 254 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e9d6dc946..65153948d2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1442,6 +1442,7 @@ M: Srikanth Yalavarthi <syalavar...@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
 F: drivers/ml/cnxk/
+F: doc/guides/mldevs/cnxk.rst

 Packet processing

diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 5eb5bd9c9a..0bd729530a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -26,6 +26,7 @@ DPDK documentation
    eventdevs/index
    rawdevs/index
    mempool/index
+   mldevs/index
    platform/index
    contributing/index
    rel_notes/index

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
new file mode 100644
index 0000000000..da40336299
--- /dev/null
+++ b/doc/guides/mldevs/cnxk.rst
@@ -0,0 +1,238 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2022 Marvell.
+
+Marvell cnxk Machine Learning Poll Mode Driver
+==============================================
+
+The cnxk ML poll mode driver provides support for offloading Machine
+Learning inference operations to the Machine Learning accelerator units
+on the **Marvell OCTEON cnxk** SoC family.
+
+The cnxk ML PMD code is organized into multiple files, all with file names
+starting with cn10k, and provides support for CN106XX and CN106XXS.
+
+More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_.
+
+Supported OCTEON cnxk SoCs
+--------------------------
+
+- CN106XX
+- CN106XXS
+
+Features
+--------
+
+The OCTEON cnxk ML PMD provides support for the following set of operations:
+
+Slow-path device and ML model handling:
+
+* ``Device probing, configuration and close``
+* ``Device start / stop``
+* ``Model loading and unloading``
+* ``Model start / stop``
+* ``Data quantization and dequantization``
+
+Fast-path Inference:
+
+* ``Inference execution``
+* ``Error handling``
+
+
+Installation
+------------
+
+The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
+or cross-compiled on an x86 platform.
+
+Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
+application.
+
+
+Initialization
+--------------
+
+``CN10K Initialization``
+
+List the ML PF devices available on the cn10k platform:
+
+.. code-block:: console
+
+   lspci -d:a092
+
+``a092`` is the ML device PF ID. You should see output similar to:
+
+.. code-block:: console
+
+   0000:00:10.0 System peripheral: Cavium, Inc. Device a092
+
+Bind the ML PF device to the vfio-pci driver:
+
+.. code-block:: console
+
+   cd <dpdk directory>
+   ./usertools/dpdk-devbind.py -u 0000:00:10.0
+   ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
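+
+The binding can be confirmed with the devbind status listing. The check below
+is optional and the exact output may differ from system to system:
+
+.. code-block:: console
+
+   ./usertools/dpdk-devbind.py -s | grep a092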
+
+Runtime Config Options
+----------------------
+
+- ``Firmware file path`` (default ``/lib/firmware/mlip-fw.bin``)
+
+  Path to the firmware binary to be loaded during device configuration.
+  The ``fw_path`` ``devargs`` parameter can be used by the user to load
+  ML firmware from a custom path.
+
+  For example::
+
+   -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
+
+  With the above configuration, the driver loads the firmware from the path
+  "/home/user/ml_fw.bin".
+
+
+- ``Enable DPE warnings`` (default ``1``)
+
+  ML firmware can be configured during load to handle the DPE errors reported
+  by the ML inference engine. When enabled, the firmware masks the DPE non-fatal
+  hardware errors as warnings. The ``enable_dpe_warnings`` ``devargs`` parameter
+  is used for this configuration.
+
+  For example::
+
+   -a 0000:00:10.0,enable_dpe_warnings=0
+
+  With the above configuration, DPE non-fatal errors reported by the hardware
+  are treated as errors.
+
+
+- ``Model data caching`` (default ``1``)
+
+  Enable caching model data on ML ACC cores. Enabling this option executes a
+  dummy inference request in synchronous mode during the model start stage.
+  Caching of model data improves the inference throughput / latency for the
+  model. The ``cache_model_data`` ``devargs`` parameter is used to enable
+  data caching.
+
+  For example::
+
+   -a 0000:00:10.0,cache_model_data=0
+
+  With the above configuration, model data caching is disabled.
+
+
+- ``OCM allocation mode`` (default ``lowest``)
+
+  Option to specify the method used to allocate OCM memory for a model during
+  model start. Two modes are supported by the driver. The ``ocm_alloc_mode``
+  ``devargs`` parameter is used to select the OCM allocation mode.
+
+  ``lowest`` - Allocate OCM for the model from the first available free slot.
+  The search for a free slot starts from the lowest tile ID and lowest page ID.
+
+  ``largest`` - Allocate OCM for the model from the slot with the largest
+  amount of free space.
+
+  For example::
+
+   -a 0000:00:10.0,ocm_alloc_mode=lowest
+
+  With the above configuration, OCM allocation for the model is done from the
+  first available free slot, i.e. from the lowest possible tile ID.
+
+
+- ``Enable hardware queue lock`` (default ``0``)
+
+  Option to select the job request enqueue function used to queue requests to
+  the hardware queue. The ``hw_queue_lock`` ``devargs`` parameter is used to
+  select the enqueue function.
+
+  ``0`` - Disable (default). Use the lock-free version of the hardware enqueue
+  function for job queuing in the enqueue burst operation. To avoid race
+  conditions when queuing requests to hardware, disabling ``hw_queue_lock``
+  restricts the number of queue-pairs supported by the cnxk driver to 1.
+
+  ``1`` - Enable. Use the spin-lock version of the hardware enqueue function
+  for job queuing. Enabling the spin-lock version removes the restriction on
+  the number of queue-pairs that can be supported by the driver.
+
+  For example::
+
+   -a 0000:00:10.0,hw_queue_lock=1
+
+  With the above configuration, the spin-lock version of the hardware enqueue
+  function is used in the fast path enqueue burst operation.
+
+
+- ``Polling memory location`` (default ``ddr``)
+
+  The ML cnxk driver provides the option to select the memory location used
+  for polling for inference request completion. The driver supports using
+  either the DDR address space (``ddr``) or ML registers (``register``) as
+  the polling location. The ``poll_mem`` ``devargs`` parameter is used to
+  specify the poll location.
+
+  For example::
+
+   -a 0000:00:10.0,poll_mem="register"
+
+  With the above configuration, the ML cnxk driver uses ML registers for
+  polling in fast path requests.
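+
+Multiple runtime options can be passed together by appending them, comma
+separated, to a single ``-a`` (allowlist) EAL option. The sketch below is
+only an illustration; the application binary and the option values are
+placeholders to be replaced as needed:
+
+.. code-block:: console
+
+   <dpdk-application> -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin",enable_dpe_warnings=0,hw_queue_lock=1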
+
+Debugging Options
+-----------------
+
+.. _table_octeon_cnxk_ml_debug_options:
+
+.. table:: OCTEON cnxk ML PMD debug options
+
+   +---+-----------+-------------------------------+
+   | # | Component | EAL log command               |
+   +===+===========+===============================+
+   | 1 | ML        | --log-level='pmd\.ml\.cnxk,8' |
+   +---+-----------+-------------------------------+
+
+
+Extended stats
+--------------
+
+The Marvell cnxk ML PMD supports reporting inference latencies through extended
+stats. The PMD supports the six extended stats types listed below for each
+model. The total number of extended stats is 6 x the number of models loaded.
+
+.. _table_octeon_cnxk_ml_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD xstats names
+
+   +---+----------------+--------------------------+
+   | # | Type           | Description              |
+   +===+================+==========================+
+   | 1 | Avg-HW-Latency | Average hardware latency |
+   +---+----------------+--------------------------+
+   | 2 | Min-HW-Latency | Minimum hardware latency |
+   +---+----------------+--------------------------+
+   | 3 | Max-HW-Latency | Maximum hardware latency |
+   +---+----------------+--------------------------+
+   | 4 | Avg-FW-Latency | Average firmware latency |
+   +---+----------------+--------------------------+
+   | 5 | Min-FW-Latency | Minimum firmware latency |
+   +---+----------------+--------------------------+
+   | 6 | Max-FW-Latency | Maximum firmware latency |
+   +---+----------------+--------------------------+
+
+Latency values reported by the PMD through xstats are in units of either
+cycles or nanoseconds. The unit is determined during DPDK initialization and
+depends on the availability of SCLK. Latencies are reported in nanoseconds
+when SCLK is available and in cycles otherwise. The application needs to
+initialize at least one RVU for the clock to be available.
+
+xstats names are dynamically generated by the PMD and have the format
+``Model-<model_id>-<Type>-<units>``.
+
+For example::
+
+   Model-1-Avg-FW-Latency-ns
+
+The above xstat name reports the average firmware latency in nanoseconds for
+the model with model ID 1.
+
+The number of xstats made available by the PMD changes dynamically. The number
+increases when a model is loaded and decreases when a model is unloaded. The
+application needs to update its xstats map after a model is loaded or unloaded.

diff --git a/doc/guides/mldevs/index.rst b/doc/guides/mldevs/index.rst
new file mode 100644
index 0000000000..f201e54175
--- /dev/null
+++ b/doc/guides/mldevs/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2022 Marvell.
+
+Machine Learning Device Driver
+==============================
+
+The following is a list of ML device PMDs, which can be used from an
+application through the ML device API.
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   cnxk
--
2.17.1