RE: [PATCH v1 5/6] power: add eventdev support for power management

2023-10-18 Thread Tummala, Sivaprasad
[AMD Official Use Only - General]

Hi Jerin,

> -Original Message-
> From: Jerin Jacob 
> Sent: Tuesday, October 17, 2023 8:53 AM
> To: Tummala, Sivaprasad 
> Cc: harry.van.haa...@intel.com; anatoly.bura...@intel.com; dev@dpdk.org; 
> Yigit,
> Ferruh ; david.h...@intel.com
> Subject: Re: [PATCH v1 5/6] power: add eventdev support for power management
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On Tue, Oct 17, 2023 at 2:27 AM Sivaprasad Tummala
>  wrote:
> >
> > Add eventdev support to enable power saving when no events are
> > arriving. It is based on counting the number of empty polls and, when
> > the number reaches a certain threshold, entering an
> > architecture-defined optimized power state that will wait either until
> > a TSC timestamp expires or until events arrive.
> >
> > This API mandates a core-to-single-port mapping (i.e. one core polling
> > multiple ports of event device is not supported). This should be ok as
> > the general use case will have one CPU core using one port to
> > enqueue/dequeue events from an eventdev.
> >
> > This design is using Eventdev PMD Dequeue callbacks.
> >
> > 1. MWAITX/MONITORX:
> >
> >When a certain threshold of empty polls is reached, the core will go
> >into a power optimized sleep while waiting on an address of next RX
> >descriptor to be written to.
> >
> > 2. Pause instruction
> >
> >This method uses the pause instruction to avoid busy polling.
> >
> > Signed-off-by: Sivaprasad Tummala 
>
>
> Hi Siva,
>
> It does not look like it is aligned with the previous discussion.
>
> I spent a couple of minutes drafting the semantics. Please treat it as a reference.
>
> # IMO, only the following public SLOW PATH eventdev API is required (just
> sharing the concept):
>
> enum rte_event_pmgmt_modes {
>     /** Default power management scheme */
>     RTE_EVENT_POWER_MGMT_TYPE_DEFAULT = 0,
>     /** Use power-optimized monitoring to wait for incoming traffic */
>     RTE_EVENT_POWER_MGMT_TYPE_F_CPU_MONITOR = RTE_BIT(0),
>     /** Use power-optimized sleep to avoid busy polling */
>     RTE_EVENT_POWER_MGMT_TYPE_F_CPU_PAUSE = RTE_BIT(1),
>     /** HW based power management scheme found in ARM64 machines, where
>      *  the core goes to a sleep state until an event is available on dequeue */
>     RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE = RTE_BIT(2),
> };
>
> int rte_event_port_pmgmt_type_supported_get(uint8_t dev_id,
>     enum rte_event_pmgmt_modes *mode_flags);
> /** Device must be in stop state */
> int rte_event_port_pmgmt_enable(uint8_t dev_id, uint8_t port_id,
>     enum rte_event_pmgmt_modes mode);
> int rte_event_port_pmgmt_disable(uint8_t dev_id, uint8_t port_id);
>
> # It should be self-contained. No need to add it to rte_power, as that is
> CPU-only power mgmt (see RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE
> above).
>
> # Add lib/eventdev/eventdev_pmd_pmgmt.c or so, and have the CPU-based power
> management helper functions there so that all SW PMDs can reuse them.
> example:
> eventdev_pmd_pmgmt_handle_monitor(uint8_t dev_id, uint8_t port_id, struct
> rte_event ev[], uint16_t nb_events);
> eventdev_pmd_pmgmt_handle_pause(uint8_t dev_id, uint8_t port_id, struct
> rte_event ev[], uint16_t nb_events);
>
>
> # In rte_event_dev_start(), fix up dev->dequeue_burst if CPU-based power
> management is applicable and selected.
> i.e. the new dev->dequeue_burst is the existing PMD's dev->dequeue_burst +
> eventdev_pmd_pmgmt_handle_... (based on the power management mode selected)

Thanks for the clarification. I will incorporate the changes in the next
version of the patch (to support power management on the event port).
Given the time constraints, I will defer the power management support on the
event port to the next release. However, to avoid ABI breakage, I will split
the patchset and push the patches that add callback support in this release,
so we don't have to wait for the next stable release to get these changes
integrated.

Please let me know your thoughts.

Thanks & Regards,
Sivaprasad
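For readers skimming the archive, the draft slow-path API quoted above can be modeled as a small state machine. The following Python sketch is purely illustrative — the class, flag values, and errno-style return codes are assumptions drawn from Jerin's draft, not real DPDK API:

```python
import enum

class PmgmtMode(enum.IntFlag):
    """Illustrative model of the draft rte_event_pmgmt_modes flags."""
    CPU_MONITOR = 1        # power-optimized monitor/wait on a memory address
    CPU_PAUSE = 2          # pause-instruction based back-off
    HW_WFE_ON_DEQUEUE = 4  # HW sleep until an event is available (ARM64)

class EventDev:
    """Toy event device tracking per-port power-management modes."""

    def __init__(self, supported):
        self.supported = supported
        self.started = False
        self.port_mode = {}

    def pmgmt_type_supported_get(self):
        # rte_event_port_pmgmt_type_supported_get() analogue
        return self.supported

    def port_pmgmt_enable(self, port_id, mode):
        # rte_event_port_pmgmt_enable() analogue
        if self.started:
            return -16   # device must be in stop state (-EBUSY)
        if mode == 0 or (mode & self.supported) != mode:
            return -95   # mode not supported by this PMD (-EOPNOTSUPP)
        self.port_mode[port_id] = mode
        return 0

    def port_pmgmt_disable(self, port_id):
        # rte_event_port_pmgmt_disable() analogue
        if self.started:
            return -16
        self.port_mode.pop(port_id, None)
        return 0
```

The key semantic points from the draft are captured here: the mode is a capability bitmask queried up front, and enable/disable are only legal while the device is stopped.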


Re: [PATCH v1 5/6] power: add eventdev support for power management

2023-10-18 Thread Jerin Jacob
On Wed, Oct 18, 2023 at 12:38 PM Tummala, Sivaprasad
 wrote:
>
> [AMD Official Use Only - General]
>
> Hi Jerin,
>
> > [full quote of the previous message trimmed; see the complete text above]
>
> Thanks for the clarification. I will incorporate the changes in the next
> version of the patch (to support power management on the event port).
> Given the time constraints, I will defer the power management support on the
> event port to the next release. However, to avoid ABI breakage, I will split
> the patchset and push the patches that add callback support in this release,
> so we don't have to wait for the next stable release to get these changes
> integrated.

If you follow this scheme, a public callback API is not needed.

# In rte_event_dev_start(), fix up dev->dequeue_burst if CPU-based power
management is applicable and selected.
i.e. the new dev->dequeue_burst is the existing PMD's dev->dequeue_burst +
eventdev_pmd_pmgmt_handle_... (based on the power management mode selected)
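The dequeue-fixup scheme above amounts to decorating the PMD's dequeue with an empty-poll counter. A minimal Python model of that control flow follows — the threshold value and the sleep primitive are placeholders, not DPDK code:

```python
EMPTY_POLL_THRESHOLD = 4  # placeholder; the real value would be tunable

def make_pmgmt_dequeue(pmd_dequeue, sleep_fn):
    """Wrap a PMD dequeue with empty-poll counting.

    After EMPTY_POLL_THRESHOLD consecutive empty polls, invoke the
    power-saving primitive (monitor/pause/WFE) before polling again.
    """
    state = {"empty_polls": 0, "sleeps": 0}

    def dequeue_burst(nb_events):
        events = pmd_dequeue(nb_events)
        if events:
            state["empty_polls"] = 0
        else:
            state["empty_polls"] += 1
            if state["empty_polls"] >= EMPTY_POLL_THRESHOLD:
                sleep_fn()  # stand-in for MONITORX/PAUSE/WFE
                state["sleeps"] += 1
                state["empty_polls"] = 0
        return events

    dequeue_burst.state = state  # expose counters for inspection
    return dequeue_burst
```

In the real design the wrapper would be installed in rte_event_dev_start(), and sleep_fn would map to the monitor, pause, or WFE handler depending on the selected mode.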


>
> Please let me know your thoughts.
>
> Thanks & Regards,
> Sivaprasad


Re: [PATCH v1] common/cnxk: fix flow add in age flow list

2023-10-18 Thread Jerin Jacob
On Tue, Oct 17, 2023 at 4:48 PM Ankur Dwivedi  wrote:
>
> While adding a flow to npc_flow_list, the flow can be added before the
> current flow iterator. The function returns after adding this flow,
> which prevents the flow from being added to the age flow list correctly.
> This patch moves the age-flow-list addition before the npc_flow_list add
> to prevent the error. Also, the flow is now added to or deleted from the
> age flow list only if the flow has an age action.
>
> Fixes: 357f5ebc8a24 ("common/cnxk: support flow aging")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Ankur Dwivedi 

Updated the git commit as follows and applied to
dpdk-next-net-mrvl/for-next-net. Thanks


common/cnxk: fix age flow list update

While adding a flow to npc_flow_list, the flow can be added before the
current flow iterator. The function returns after adding this flow,
which prevents the flow from being added to the age flow list correctly.
This patch moves the age-flow-list addition before the npc_flow_list add
to prevent the error. Also, the flow is now added to or deleted from the
age flow list only if the flow has an age action.

Fixes: 357f5ebc8a24 ("common/cnxk: support flow aging")
Cc: sta...@dpdk.org

Signed-off-by: Ankur Dwivedi 
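Abstractly, the bug class being fixed here is bookkeeping placed after an insert loop that can return early. A Python sketch of the before/after structure — hypothetical names, not the actual cnxk code:

```python
def flow_has_age_action(flow):
    """Demo predicate: assume every flow has an age action."""
    return True

def add_flow_buggy(flow, flow_list, age_list):
    """Age-list add after the insert loop: skipped on early return."""
    for i, cur in enumerate(flow_list):
        if flow < cur:
            flow_list.insert(i, flow)
            return  # early return: the age_list update below never runs
    flow_list.append(flow)
    if flow_has_age_action(flow):
        age_list.append(flow)

def add_flow_fixed(flow, flow_list, age_list):
    """Fix: update the age list before inserting into the flow list."""
    if flow_has_age_action(flow):
        age_list.append(flow)
    for i, cur in enumerate(flow_list):
        if flow < cur:
            flow_list.insert(i, flow)
            return
    flow_list.append(flow)
```

Whenever the new flow sorts before an existing entry, the buggy variant inserts it and returns without ever reaching the age-list update; moving the update ahead of the loop makes it unconditional.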


RE: [EXT] Re: [PATCH v2 1/1] usertools/rss: add CNXK RSS key

2023-10-18 Thread Sunil Kumar Kori
> -Original Message-
> From: Robin Jarry 
> Sent: Tuesday, October 17, 2023 5:47 PM
> To: Sunil Kumar Kori 
> Cc: dev@dpdk.org; Thomas Monjalon ; Jerin Jacob
> 
> Subject: [EXT] Re: [PATCH v2 1/1] usertools/rss: add CNXK RSS key
> 
> On Oct 09, 2023 at 18:36:
> > From: Sunil Kumar Kori 
> >
> > This patch adds an RSS key for CNXK platforms. The CNXK platform uses a
> > 48-byte key for hash calculations.
> >
> > The patch also updates the help messages to provide range
> > information for the supported NICs/platforms.
> >
> > Also, CNXK uses a reta size of 64, so to get the correct offset to
> > retrieve the queue index, the user must pass the reta_size option as
> > 64, i.e. -t 64.
> 
> I think we should add some driver abstraction that contains the required key
> length and default reta size, instead of requiring the user to guess the
> correct values. Is that something you could do?
> 
Okay, but in either case, i.e. the -t option or a driver abstraction, the user
must know the reta size and key size before configuring. So I am not sure how
adding a driver abstraction will help solve this issue unless/until it is
documented somewhere.

So for the current release, I am planning to go with this version as it is,
because we are close.
Later on we can think about it and add the required support.
Please provide input on it.
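For background on why the reta size must be known up front: an RSS hash indexes a redirection table (RETA) of reta_size entries holding queue ids, so the script needs the device's actual table size to compute the same index the NIC does. A simplified Python model of the general scheme (not the exact cnxk hardware behaviour):

```python
def default_reta(reta_size, nb_queues):
    """Default RETA: queues assigned round-robin across the table."""
    return [i % nb_queues for i in range(reta_size)]

def rss_queue(packet_hash, reta):
    """Look up the destination queue for a packet hash via the RETA."""
    return reta[packet_hash % len(reta)]
```

With a 64-entry table the NIC reduces the hash modulo 64 before the lookup; if the tool assumes a different table size (or a non-default table layout), the computed queue can diverge from what the hardware actually selects.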

> >
> > Examples:
> > $ ./dpdk-rss-flows.py -k cnxk 8 28.0.0.0/24 40.0.0.0/24 -t 64
> > SRC_IP     DST_IP      QUEUE
> > 28.0.0.1   40.0.0.1    7
> > 28.0.0.1   40.0.0.2    2
> > 28.0.0.1   40.0.0.3    4
> > 28.0.0.1   40.0.0.7    1
> > 28.0.0.1   40.0.0.8    3
> > 28.0.0.1   40.0.0.9    5
> > 28.0.0.1   40.0.0.10   0
> > 28.0.0.1   40.0.0.11   6
> >
> > Signed-off-by: Sunil Kumar Kori 
> > ---
> > v1..v2:
> >  - Fix checkpatch errors.
> 
> Hi Sunil,
> 
> >  usertools/dpdk-rss-flows.py | 17 +++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/usertools/dpdk-rss-flows.py b/usertools/dpdk-rss-flows.py
> > index 73821eb471..b6edd7a2e0 100755
> > --- a/usertools/dpdk-rss-flows.py
> > +++ b/usertools/dpdk-rss-flows.py
> > @@ -188,11 +188,24 @@ def balanced_traffic(
> >  0x81, 0x15, 0x03, 0x66,
> >  )
> >  )
> > +# rss_key_default, see drivers/net/cnxk/cnxk_flow.c
> 
> Are you referring to roc_nix_rss_key_default_fill in
> drivers/common/cnxk/roc_nix_rss.c?
> 
Yes, that is the correct file name. I will fix it in the next version.

> > +# Marvell's cnxk NICs take 48 bytes keys
> > +RSS_KEY_CNXK = bytes(
> > +(
> > +0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
> > +0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
> > +0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
> > +0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
> > +0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
> > +0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
> > +)
> > +)
> >  # fmt: on
> >  DEFAULT_DRIVER_KEYS = {
> >  "intel": RSS_KEY_INTEL,
> >  "mlx": RSS_KEY_MLX,
> >  "i40e": RSS_KEY_I40E,
> > +"cnxk": RSS_KEY_CNXK,
> >  }
> >
> >
> > @@ -202,7 +215,7 @@ def rss_key(value):
> >  try:
> >  key = binascii.unhexlify(value)
> >  if len(key) not in (40, 52):
> > -raise argparse.ArgumentTypeError("The key must be 40 or 52 bytes long")
> > +raise argparse.ArgumentTypeError("The key must be 40 to 52 bytes long")
> 
> You are not changing the length test, so passing a 48 bytes key will
> trigger an error.
> 
Ack. Will fix in next version.

> >  return key
> >  except (TypeError, ValueError) as e:
> >  raise argparse.ArgumentTypeError(str(e)) from e
> > @@ -299,7 +312,7 @@ def parse_args():
> >  default=RSS_KEY_INTEL,
> >  type=rss_key,
> >  help="""
> > -The random 40-bytes key used to compute the RSS hash. This option
> > +The random 40 to 52 bytes key used to compute the RSS hash. This option
> >  supports either a well-known name or the hex value of the key
> >  (well-known names: "intel", "mlx", default: "intel").
> >  """,
> > --
> > 2.25.1
> 
> 
> 
> --
> Robin Jarry
> Principal Software Engineer
> Red Hat, Telco/NFV



RE: [PATCH v2] dma/cnxk: offload source buffer free

2023-10-18 Thread Vamsi Krishna Attunuru



> -Original Message-
> From: Amit Prakash Shukla 
> Sent: Wednesday, October 18, 2023 12:24 AM
> To: Vamsi Krishna Attunuru 
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran ;
> fengcheng...@huawei.com; kevin.la...@intel.com;
> bruce.richard...@intel.com; conor.wa...@intel.com; g.si...@nxp.com;
> sachin.sax...@oss.nxp.com; hemant.agra...@nxp.com;
> cheng1.ji...@intel.com; Nithin Kumar Dabilpuram
> ; Anoob Joseph ;
> m...@smartsharesystems.com; Amit Prakash Shukla
> 
> Subject: [PATCH v2] dma/cnxk: offload source buffer free
> 
> Added support in driver, to offload source buffer free to hardware on
> completion of DMA transfer.
> 
> Signed-off-by: Amit Prakash Shukla 
> ---
> v2:
> - Patch rebased.
> 
> v1:
> - Driver implementation from RFC.
> 

Acked-by: Vamsi Attunuru 

>  drivers/dma/cnxk/cnxk_dmadev.c| 48
> +++
>  drivers/dma/cnxk/cnxk_dmadev_fp.c |  8 +++---
>  2 files changed, 46 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma/cnxk/cnxk_dmadev.c
> b/drivers/dma/cnxk/cnxk_dmadev.c index 26680edfde..1e7f49792c 100644
> --- a/drivers/dma/cnxk/cnxk_dmadev.c
> +++ b/drivers/dma/cnxk/cnxk_dmadev.c
> @@ -16,7 +16,8 @@ cnxk_dmadev_info_get(const struct rte_dma_dev
> *dev, struct rte_dma_info *dev_inf
>   dev_info->nb_vchans = dpivf->num_vchans;
>   dev_info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM |
> RTE_DMA_CAPA_MEM_TO_DEV |
>RTE_DMA_CAPA_DEV_TO_MEM |
> RTE_DMA_CAPA_DEV_TO_DEV |
> -  RTE_DMA_CAPA_OPS_COPY |
> RTE_DMA_CAPA_OPS_COPY_SG;
> +  RTE_DMA_CAPA_OPS_COPY |
> RTE_DMA_CAPA_OPS_COPY_SG |
> +  RTE_DMA_CAPA_M2D_AUTO_FREE;
>   dev_info->max_desc = CNXK_DPI_MAX_DESC;
>   dev_info->min_desc = CNXK_DPI_MIN_DESC;
>   dev_info->max_sges = CNXK_DPI_MAX_POINTER; @@ -115,9
> +116,26 @@ cnxk_dmadev_configure(struct rte_dma_dev *dev, const struct
> rte_dma_conf *conf,
>   return 0;
>  }
> 
> -static void
> +static int
> +dmadev_src_buf_aura_get(struct rte_mempool *sb_mp, const char
> +*mp_ops_name) {
> + struct rte_mempool_ops *ops;
> +
> + if (sb_mp == NULL)
> + return 0;
> +
> + ops = rte_mempool_get_ops(sb_mp->ops_index);
> + if (strcmp(ops->name, mp_ops_name) != 0)
> + return -EINVAL;
> +
> + return roc_npa_aura_handle_to_aura(sb_mp->pool_id);
> +}
> +
> +static int
>  cn9k_dmadev_setup_hdr(union cnxk_dpi_instr_cmd *header, const struct
> rte_dma_vchan_conf *conf)  {
> + int aura;
> +
>   header->cn9k.pt = DPI_HDR_PT_ZBW_CA;
> 
>   switch (conf->direction) {
> @@ -140,6 +158,11 @@ cn9k_dmadev_setup_hdr(union
> cnxk_dpi_instr_cmd *header, const struct rte_dma_vch
>   header->cn9k.func = conf->dst_port.pcie.pfid << 12;
>   header->cn9k.func |= conf->dst_port.pcie.vfid;
>   }
> + aura = dmadev_src_buf_aura_get(conf-
> >auto_free.m2d.pool, "cn9k_mempool_ops");
> + if (aura < 0)
> + return aura;
> + header->cn9k.aura = aura;
> + header->cn9k.ii = 1;
>   break;
>   case RTE_DMA_DIR_MEM_TO_MEM:
>   header->cn9k.xtype = DPI_XTYPE_INTERNAL_ONLY; @@ -
> 153,11 +176,15 @@ cn9k_dmadev_setup_hdr(union cnxk_dpi_instr_cmd
> *header, const struct rte_dma_vch
>   header->cn9k.fport = conf->dst_port.pcie.coreid;
>   header->cn9k.pvfe = 0;
>   };
> +
> + return 0;
>  }
> 
> -static void
> +static int
>  cn10k_dmadev_setup_hdr(union cnxk_dpi_instr_cmd *header, const struct
> rte_dma_vchan_conf *conf)  {
> + int aura;
> +
>   header->cn10k.pt = DPI_HDR_PT_ZBW_CA;
> 
>   switch (conf->direction) {
> @@ -180,6 +207,10 @@ cn10k_dmadev_setup_hdr(union
> cnxk_dpi_instr_cmd *header, const struct rte_dma_vc
>   header->cn10k.func = conf->dst_port.pcie.pfid <<
> 12;
>   header->cn10k.func |= conf->dst_port.pcie.vfid;
>   }
> + aura = dmadev_src_buf_aura_get(conf-
> >auto_free.m2d.pool, "cn10k_mempool_ops");
> + if (aura < 0)
> + return aura;
> + header->cn10k.aura = aura;
>   break;
>   case RTE_DMA_DIR_MEM_TO_MEM:
>   header->cn10k.xtype = DPI_XTYPE_INTERNAL_ONLY; @@ -
> 193,6 +224,8 @@ cn10k_dmadev_setup_hdr(union cnxk_dpi_instr_cmd
> *header, const struct rte_dma_vc
>   header->cn10k.fport = conf->dst_port.pcie.coreid;
>   header->cn10k.pvfe = 0;
>   };
> +
> + return 0;
>  }
> 
>  static int
> @@ -204,16 +237,19 @@ cnxk_dmadev_vchan_setup(struct rte_dma_dev
> *dev, uint16_t vchan,
>   union cnxk_dpi_instr_cmd *header;
>   uint16_t max_desc;
>   uint32_t size;
> - int i;
> + int i, ret;
> 
>   RTE_SET_USED(conf_sz);
> 
>   header = (union cnxk_dpi_instr_cmd *)&dpi_conf->cmd.u;
> 
>   if (dpivf->is_cn10k)
> - cn10k_dm

[PATCH v3 1/1] usertools/rss: add CNXK RSS key

2023-10-18 Thread skori
From: Sunil Kumar Kori 

This patch adds an RSS key for CNXK platforms. The CNXK platform uses a
48-byte key for hash calculations.

The patch also updates the help messages to provide range
information for the supported NICs/platforms.

Also, CNXK uses a reta size of 64, so to get the correct offset to
retrieve the queue index, the user must pass the reta_size option as
64, i.e. -t 64.

Examples:
$ ./dpdk-rss-flows.py -k cnxk 8 28.0.0.0/24 40.0.0.0/24 -t 64
SRC_IP     DST_IP      QUEUE
28.0.0.1   40.0.0.1    7
28.0.0.1   40.0.0.2    2
28.0.0.1   40.0.0.3    4
28.0.0.1   40.0.0.7    1
28.0.0.1   40.0.0.8    3
28.0.0.1   40.0.0.9    5
28.0.0.1   40.0.0.10   0
28.0.0.1   40.0.0.11   6

Signed-off-by: Sunil Kumar Kori 
Acked-by: Jerin Jacob 
---
v2..v3:
 - Fix key size range check.
 - Fix default key size file name.

v1..v2:
 - Fix checkpatch errors.

 usertools/dpdk-rss-flows.py | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/usertools/dpdk-rss-flows.py b/usertools/dpdk-rss-flows.py
index 73821eb471..937f9e1927 100755
--- a/usertools/dpdk-rss-flows.py
+++ b/usertools/dpdk-rss-flows.py
@@ -188,11 +188,24 @@ def balanced_traffic(
 0x81, 0x15, 0x03, 0x66,
 )
 )
+# default_key, see drivers/common/cnxk/roc_nix_rss.c
+# Marvell's cnxk NICs take 48 bytes keys
+RSS_KEY_CNXK = bytes(
+(
+0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
+0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
+0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
+0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
+0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
+0xfe, 0xed, 0x0b, 0xad, 0xfe, 0xed, 0x0b, 0xad,
+)
+)
 # fmt: on
 DEFAULT_DRIVER_KEYS = {
 "intel": RSS_KEY_INTEL,
 "mlx": RSS_KEY_MLX,
 "i40e": RSS_KEY_I40E,
+"cnxk": RSS_KEY_CNXK,
 }
 
 
@@ -201,8 +214,8 @@ def rss_key(value):
 return DEFAULT_DRIVER_KEYS[value]
 try:
 key = binascii.unhexlify(value)
-        if len(key) not in (40, 52):
-            raise argparse.ArgumentTypeError("The key must be 40 or 52 bytes long")
+        if len(key) not in range(40, 53):
+            raise argparse.ArgumentTypeError("The key must be 40 to 52 bytes long")
 return key
 except (TypeError, ValueError) as e:
 raise argparse.ArgumentTypeError(str(e)) from e
@@ -299,7 +312,7 @@ def parse_args():
 default=RSS_KEY_INTEL,
 type=rss_key,
 help="""
-The random 40-bytes key used to compute the RSS hash. This option
+The random 40 to 52 bytes key used to compute the RSS hash. This option
 supports either a well-known name or the hex value of the key
 (well-known names: "intel", "mlx", default: "intel").
 """,
-- 
2.25.1



[PATCH v4 0/6] Enhance the bond framework to support offload

2023-10-18 Thread Chaoyong He
This patch series tries to enhance the bond framework to better support the
offload feature:
* Add a new API so that a member port can access some information of the
  bond port to which it belongs.
* Add a new API to query whether the bond port was created by the
  member port hardware.
* Add two command line arguments to control whether the member port
  notification and dedicated queue features are enabled.
* Add logic to support adding ports which share the same PCI address to a
  bond port.
* Also modify the testpmd application to test the new APIs and logic
  added by this patch series.

---
v2:
* Fix compile error on github-robot by removing the redundancy function
  declaration in the header file.
v3:
* Use the hole in the structure for the new added flag data field.
v4:
* Drop two commits not necessary for this series.
* Modify some logic as the review comments from reviewers.
---

Long Wu (6):
  ethdev: add member notification for bonding port
  ethdev: add API to get hardware creation of bonding port
  net/bonding: add bonding port arguments
  net/bonding: support add port by data name
  net/bonding: support checking valid bonding port ID
  net/bonding: add commands for bonding port notification

 .../link_bonding_poll_mode_drv_lib.rst|  19 +++
 drivers/net/bonding/bonding_testpmd.c | 128 ++
 drivers/net/bonding/eth_bond_private.h|  11 ++
 drivers/net/bonding/rte_eth_bond.h|  88 
 drivers/net/bonding/rte_eth_bond_api.c| 121 +
 drivers/net/bonding/rte_eth_bond_args.c   |  47 +++
 drivers/net/bonding/rte_eth_bond_pmd.c|  93 -
 drivers/net/bonding/version.map   |   5 +
 lib/ethdev/ethdev_driver.h|  38 ++
 9 files changed, 546 insertions(+), 4 deletions(-)

-- 
2.39.1



[PATCH v4 1/6] ethdev: add member notification for bonding port

2023-10-18 Thread Chaoyong He
From: Long Wu 

The bonding PMD does not let member ports know the bonding port's
information, such as how many member ports the bonding port has,
what mode the bonding port is in, and so on.

Add a notification interface for the bonding port to let a member
port know that it has been added to a bonding port and what the bonding
port's configuration is. This way the member ports have a chance to
achieve bond flow offload or other private bonding functions.

Signed-off-by: Long Wu 
Reviewed-by: James Hershaw 
Reviewed-by: Chaoyong He 
---
 drivers/net/bonding/eth_bond_private.h |  1 +
 drivers/net/bonding/rte_eth_bond.h | 46 
 drivers/net/bonding/rte_eth_bond_api.c | 72 ++
 drivers/net/bonding/rte_eth_bond_pmd.c | 32 ++--
 drivers/net/bonding/version.map|  3 ++
 lib/ethdev/ethdev_driver.h | 18 +++
 6 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/eth_bond_private.h 
b/drivers/net/bonding/eth_bond_private.h
index e688894210..f69e85c199 100644
--- a/drivers/net/bonding/eth_bond_private.h
+++ b/drivers/net/bonding/eth_bond_private.h
@@ -180,6 +180,7 @@ struct bond_dev_private {
uint8_t member_update_idx;
 
bool kvargs_processing_is_done;
+   bool notify_member; /**< Enable member notification of bonding port. */
 
uint32_t candidate_max_rx_pktlen;
uint32_t max_rx_pktlen;
diff --git a/drivers/net/bonding/rte_eth_bond.h 
b/drivers/net/bonding/rte_eth_bond.h
index f10165f2c6..f6c773615c 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -351,6 +351,52 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t 
bonding_port_id,
 int
 rte_eth_bond_link_up_prop_delay_get(uint16_t bonding_port_id);
 
+/**
+ * Set the flag of whether bonding port notifies member ports.
+ *
+ * @param bonding_port_id
+ *   Port ID of bonding device.
+ * @param notify
+ *   Flag of whether bonding port notifies member ports.
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_notify_member_flag_set(uint16_t bonding_port_id, bool notify);
+
+/**
+ * Get the flag of whether bonding port notifies member ports.
+ *
+ * @param bonding_port_id
+ *   Port ID of bonding device.
+ * @param notify
+ *   Flag of whether bonding port notifies member ports.
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_notify_member_flag_get(uint16_t bonding_port_id, bool *notify);
+
+/**
+ * Notify the member ports of bonding port's information.
+ *
+ * This interface is called in the following functions:
+ * - bond_ethdev_lsc_event_callback()
+ * - bond_ethdev_configure()
+ *
+ * @param bonding_port_id
+ *   Port ID of bonding device.
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_notify_members(uint16_t bonding_port_id);
 
 #ifdef __cplusplus
 }
diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index 99e496556a..239f86ee92 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -627,6 +627,17 @@ __eth_bond_member_add_lock_free(uint16_t bonding_port_id, 
uint16_t member_port_i
 
member_vlan_filter_set(bonding_port_id, member_port_id);
 
+   if (internals->notify_member &&
+   *member_eth_dev->dev_ops->bond_notify_member != NULL) {
+   ret = 
member_eth_dev->dev_ops->bond_notify_member(member_eth_dev,
+   bonding_eth_dev);
+   if (ret < 0) {
+   RTE_BOND_LOG(ERR, "Add member (port %u) notify failed!",
+   member_port_id);
+   return -1;
+   }
+   }
+
return 0;
 
 }
@@ -733,6 +744,10 @@ __eth_bond_member_remove_lock_free(uint16_t 
bonding_port_id,
member_eth_dev = &rte_eth_devices[member_port_id];
member_remove(internals, member_eth_dev);
member_eth_dev->data->dev_flags &= (~RTE_ETH_DEV_BONDING_MEMBER);
+   if (internals->notify_member &&
+   *member_eth_dev->dev_ops->bond_notify_member != NULL)
+   member_eth_dev->dev_ops->bond_notify_member(member_eth_dev,
+   bonding_eth_dev);
 
/*  first member in the active list will be the primary by default,
 *  otherwise use first device in list */
@@ -1098,3 +1113,60 @@ rte_eth_bond_link_up_prop_delay_get(uint16_t 
bonding_port_id)
 
return internals->link_up_delay_ms;
 }
+
+int
+rte_eth_bond_notify_member_flag_set(uint16_t bonding_port_id, bool notify)
+{
+   struct bond_dev_private *internals;
+
+   if (valid_bonding_port_id(bonding_port_id) != 0)
+   return -EINVAL;
+
+   internals = rte_eth_devices[bonding_port_id].data->dev_private;
+
+   internals->notify_member = notify;
+
+   return 0;
+}
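The dispatch logic in `__eth_bond_member_add_lock_free()` above — notify only when the bonding port's flag is set and the member PMD actually implements the callback, and roll the add back on failure — can be modeled as follows (illustrative Python with hypothetical names, not the bonding PMD code):

```python
class MemberPort:
    """Toy member port; notify_cb mirrors dev_ops->bond_notify_member."""

    def __init__(self, name, notify_cb=None):
        self.name = name
        self.notify_cb = notify_cb  # None when the PMD has no callback

class BondPort:
    """Toy bonding port with the notify_member flag."""

    def __init__(self, notify_member=False):
        self.notify_member = notify_member
        self.members = []

    def member_add(self, member):
        self.members.append(member)
        # Notify only if the feature is enabled AND the member PMD
        # implements the callback; undo the add if notification fails.
        if self.notify_member and member.notify_cb is not None:
            if member.notify_cb(member, self) < 0:
                self.members.remove(member)
                return -1
        return 0
```

Members without a callback, or bonds with the flag off, are added silently — mirroring the NULL-check plus flag test in the C code.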

[PATCH v4 2/6] ethdev: add API to get hardware creation of bonding port

2023-10-18 Thread Chaoyong He
From: Long Wu 

After the bonding port notification, the member port hardware may create
the bonding port. We want to get the result of that creation, so add this
API to query it.

Signed-off-by: Long Wu 
Reviewed-by: James Hershaw 
Reviewed-by: Chaoyong He 
---
 drivers/net/bonding/rte_eth_bond.h | 15 ++
 drivers/net/bonding/rte_eth_bond_api.c | 28 ++
 drivers/net/bonding/version.map|  1 +
 lib/ethdev/ethdev_driver.h | 20 ++
 4 files changed, 64 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond.h 
b/drivers/net/bonding/rte_eth_bond.h
index f6c773615c..987269b323 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -398,6 +398,21 @@ __rte_experimental
 int
 rte_eth_bond_notify_members(uint16_t bonding_port_id);
 
+/**
+ * Get the status of specified bonding port created by member port hardware.
+ *
+ * @param bonding_port_id
+ *   Port ID of bonding device.
+ * @param member_port_id
+ *   Port ID of member device.
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_hw_create_get(uint16_t bonding_port_id, uint16_t member_port_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index 239f86ee92..317c3c1542 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -1170,3 +1170,31 @@ rte_eth_bond_notify_members(uint16_t bonding_port_id)
 
return 0;
 }
+
+int
+rte_eth_bond_hw_create_get(uint16_t bonding_port_id, uint16_t member_port_id)
+{
+   uint32_t i;
+   struct rte_eth_dev *bonding_dev;
+   struct rte_eth_dev *member_dev;
+   struct bond_dev_private *internals;
+
+   if (valid_bonding_port_id(bonding_port_id) != 0)
+   return -EINVAL;
+
+   bonding_dev = &rte_eth_devices[bonding_port_id];
+   internals = bonding_dev->data->dev_private;
+   for (i = 0; i < internals->member_count; i++) {
+   if (internals->members[i].port_id == member_port_id)
+   break;
+   }
+
+   if (i == internals->member_count)
+   return -EINVAL;
+
+   member_dev = &rte_eth_devices[member_port_id];
+   if (*member_dev->dev_ops->bond_hw_create_get == NULL)
+   return -ENOTSUP;
+
+   return member_dev->dev_ops->bond_hw_create_get(member_dev, bonding_dev);
+}
diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index 3bd5e8ad11..3cfff51269 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -32,6 +32,7 @@ EXPERIMENTAL {
global:
rte_eth_bond_8023ad_member_info;
rte_eth_bond_active_members_get;
+   rte_eth_bond_hw_create_get;
rte_eth_bond_member_add;
rte_eth_bond_member_remove;
rte_eth_bond_members_get;
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index f626f971e5..18ff5db969 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1231,6 +1231,21 @@ typedef int (*eth_map_aggr_tx_affinity_t)(struct 
rte_eth_dev *dev, uint16_t tx_q
 typedef int (*eth_bond_notify_member)(struct rte_eth_dev *dev,
  struct rte_eth_dev *bonding_dev);
 
+/**
+ * @internal
+ * Get the status of specified bonding port created by member port hardware.
+ *
+ * @param dev
+ *   Member port (ethdev) handle.
+ * @param bonding_dev
+ *   Bonding port (ethdev) handle.
+ *
+ * @return
+ *   Negative on error, 0 on success.
+ */
+typedef int (*eth_bond_hw_create_get)(struct rte_eth_dev *dev,
+ struct rte_eth_dev *bonding_dev);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet 
driver.
  */
@@ -1473,6 +1488,11 @@ struct eth_dev_ops {
 
/** Notify the member port of bonding port information */
eth_bond_notify_member bond_notify_member;
+   /**
+* Get the status of whether bonding port is successfully created by
+* the member port hardware.
+*/
+   eth_bond_hw_create_get bond_hw_create_get;
 };
 
 /**
-- 
2.39.1



[PATCH v4 3/6] net/bonding: add bonding port arguments

2023-10-18 Thread Chaoyong He
From: Long Wu 

Include the following new arguments for bonding ports:
- "notify_member" to enable/disable member notification.
- "dedicated_queue" to enable/disable dedicated queue.

Add these two arguments to the initial argument processing.

Signed-off-by: Long Wu 
Reviewed-by: James Hershaw 
Reviewed-by: Chaoyong He 
---
 drivers/net/bonding/eth_bond_private.h  | 10 
 drivers/net/bonding/rte_eth_bond.h  | 14 ++
 drivers/net/bonding/rte_eth_bond_api.c  | 14 ++
 drivers/net/bonding/rte_eth_bond_args.c | 44 ++
 drivers/net/bonding/rte_eth_bond_pmd.c  | 61 -
 5 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/eth_bond_private.h 
b/drivers/net/bonding/eth_bond_private.h
index f69e85c199..f9603a0f6b 100644
--- a/drivers/net/bonding/eth_bond_private.h
+++ b/drivers/net/bonding/eth_bond_private.h
@@ -28,6 +28,8 @@
 #define PMD_BOND_LSC_POLL_PERIOD_KVARG ("lsc_poll_period_ms")
 #define PMD_BOND_LINK_UP_PROP_DELAY_KVARG  ("up_delay")
 #define PMD_BOND_LINK_DOWN_PROP_DELAY_KVARG("down_delay")
+#define PMD_BOND_NOTIFY_MEMBER_KVARG   ("notify_member")
+#define PMD_BOND_DEDICATED_QUEUE_KVARG ("dedicated_queue")
 
 #define PMD_BOND_XMIT_POLICY_LAYER2_KVARG  ("l2")
 #define PMD_BOND_XMIT_POLICY_LAYER23_KVARG ("l23")
@@ -319,6 +321,14 @@ int
 bond_ethdev_parse_time_ms_kvarg(const char *key,
const char *value, void *extra_args);
 
+int
+bond_ethdev_parse_notify_member_kvarg(const char *key __rte_unused,
+   const char *value, void *extra_args);
+
+int
+bond_ethdev_parse_dedicated_queue_kvarg(const char *key __rte_unused,
+   const char *value, void *extra_args);
+
 void
 bond_tlb_disable(struct bond_dev_private *internals);
 
diff --git a/drivers/net/bonding/rte_eth_bond.h 
b/drivers/net/bonding/rte_eth_bond.h
index 987269b323..936ab8c3a0 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -351,6 +351,20 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t 
bonding_port_id,
 int
 rte_eth_bond_link_up_prop_delay_get(uint16_t bonding_port_id);
 
+/**
+ * Set the flag of whether the bonding device enables the dedicated queue.
+ *
+ * @param bonding_port_id
+ *   Port ID of bonding device.
+ * @param queue_flag
+ *   The flag to enable the bonding dedicated queue.
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+int
+rte_eth_bond_dedicated_queue_flag_set(uint16_t bonding_port_id, bool 
queue_flag);
+
 /**
  * Set the flag of whether bonding port notifies member ports.
  *
diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index 317c3c1542..656ddd35a7 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -1114,6 +1114,20 @@ rte_eth_bond_link_up_prop_delay_get(uint16_t 
bonding_port_id)
return internals->link_up_delay_ms;
 }
 
+int
+rte_eth_bond_dedicated_queue_flag_set(uint16_t bonding_port_id, bool 
queue_flag)
+{
+   struct bond_dev_private *internals;
+
+   if (valid_bonding_port_id(bonding_port_id) != 0)
+   return -1;
+
+   internals = rte_eth_devices[bonding_port_id].data->dev_private;
+   internals->mode4.dedicated_queues.enabled = queue_flag;
+
+   return 0;
+}
+
 int
 rte_eth_bond_notify_member_flag_set(uint16_t bonding_port_id, bool notify)
 {
diff --git a/drivers/net/bonding/rte_eth_bond_args.c 
b/drivers/net/bonding/rte_eth_bond_args.c
index bdec5d61d4..8a3e4656ef 100644
--- a/drivers/net/bonding/rte_eth_bond_args.c
+++ b/drivers/net/bonding/rte_eth_bond_args.c
@@ -20,6 +20,8 @@ const char *pmd_bond_init_valid_arguments[] = {
PMD_BOND_MAC_ADDR_KVARG,
PMD_BOND_AGG_MODE_KVARG,
RTE_DEVARGS_KEY_DRIVER,
+   PMD_BOND_NOTIFY_MEMBER_KVARG,
+   PMD_BOND_DEDICATED_QUEUE_KVARG,
NULL
 };
 
@@ -297,3 +299,45 @@ bond_ethdev_parse_time_ms_kvarg(const char *key 
__rte_unused,
 
return 0;
 }
+
+int
+bond_ethdev_parse_notify_member_kvarg(const char *key __rte_unused,
+   const char *value, void *extra_args)
+{
+   bool *notify_member;
+
+   if (value == NULL || extra_args == NULL)
+   return -1;
+
+   notify_member = extra_args;
+
+   if (strcmp("enable", value) == 0)
+   *notify_member = true;
+   else if (strcmp("disable", value) == 0)
+   *notify_member = false;
+   else
+   return -1;
+
+   return 0;
+}
+
+int
+bond_ethdev_parse_dedicated_queue_kvarg(const char *key __rte_unused,
+   const char *value, void *extra_args)
+{
+   bool *dedicated_queue;
+
+   if (value == NULL || extra_args == NULL)
+   return -1;
+
+   dedicated_queue = extra_args;
+
+   if (strcmp("enable", value) == 0)
+   *dedicated_queue = true;
+   else if (strcmp("disable", value) == 0)
+   *dedicated_queue = false;
+   else
+   

[PATCH v4 4/6] net/bonding: support add port by data name

2023-10-18 Thread Chaoyong He
From: Long Wu 

Several ports may share the same PCI address, as nfp representors do,
so we cannot add such ports to a bonding port through the "--vdev"
argument in dpdk-testpmd. However, each port's data name is unique, so
we add an option to look up member ports by data name as well.

With this feature, a bonding port whose members are ports of this type
can be created with "--vdev" on the dpdk-testpmd command line.

For example:
dpdk-testpmd -l 2-10 -s 0x8 -a ca:00.0,representor=[0-2]
--vdev 'net_bonding0,member=flower_repr_p0,member=flower_repr_p1,
mode=4,socket_id=1,xmit_policy=l34' -- -i

Note:
1. "ca:00.0" is nfp 4000 card.
2. "flower_repr_p0" and "flower_repr_p1" are nfp phy representor's
data name.

Signed-off-by: Long Wu 
Reviewed-by: James Hershaw 
Reviewed-by: Chaoyong He 
---
 drivers/net/bonding/rte_eth_bond_args.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond_args.c 
b/drivers/net/bonding/rte_eth_bond_args.c
index 8a3e4656ef..b320eb3038 100644
--- a/drivers/net/bonding/rte_eth_bond_args.c
+++ b/drivers/net/bonding/rte_eth_bond_args.c
@@ -70,6 +70,9 @@ find_port_id_by_dev_name(const char *name)
 
if (strcmp(rte_eth_devices[i].device->name, name) == 0)
return i;
+
+   if (strcmp(rte_eth_devices[i].data->name, name) == 0)
+   return i;
}
return -1;
 }
-- 
2.39.1



[PATCH v4 5/6] net/bonding: support checking valid bonding port ID

2023-10-18 Thread Chaoyong He
From: Long Wu 

Add an API to check whether a port ID is a bonding port ID.

Signed-off-by: Long Wu 
Reviewed-by: James Hershaw 
Reviewed-by: Chaoyong He 
---
 drivers/net/bonding/rte_eth_bond.h | 13 +
 drivers/net/bonding/rte_eth_bond_api.c |  7 +++
 drivers/net/bonding/version.map|  1 +
 3 files changed, 21 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond.h 
b/drivers/net/bonding/rte_eth_bond.h
index 936ab8c3a0..02ddb496bb 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -427,6 +427,19 @@ __rte_experimental
 int
 rte_eth_bond_hw_create_get(uint16_t bonding_port_id, uint16_t member_port_id);
 
+/**
+ * Check whether the bonding port ID is valid.
+ *
+ * @param port_id
+ *   Port ID of bonding device.
+ *
+ * @return
+ *   true if the port is a bonding device, false otherwise.
+ */
+__rte_experimental
+bool
+rte_eth_bond_is_valid_port(uint16_t port_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index 656ddd35a7..07efc7a7e5 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -1212,3 +1212,10 @@ rte_eth_bond_hw_create_get(uint16_t bonding_port_id, 
uint16_t member_port_id)
 
return member_dev->dev_ops->bond_hw_create_get(member_dev, bonding_dev);
 }
+
+
+bool
+rte_eth_bond_is_valid_port(uint16_t port_id)
+{
+   return (valid_bonding_port_id(port_id) == 0);
+}
diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index 3cfff51269..97ef24dcdb 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -33,6 +33,7 @@ EXPERIMENTAL {
rte_eth_bond_8023ad_member_info;
rte_eth_bond_active_members_get;
rte_eth_bond_hw_create_get;
+   rte_eth_bond_is_valid_port;
rte_eth_bond_member_add;
rte_eth_bond_member_remove;
rte_eth_bond_members_get;
-- 
2.39.1



[PATCH v4 6/6] net/bonding: add commands for bonding port notification

2023-10-18 Thread Chaoyong He
From: Long Wu 

Add some commands to support bonding port notification in
dpdk-testpmd.

1. Enable the notification with the command:
"set bonding notify_member (port_id) (enable|disable)"

2. If the member port hardware tries to create the bonding port after
the notification, get its status with the command:
"get bonding member hardware create (member_port_id) (bonding_port_id)"

Signed-off-by: Long Wu 
Reviewed-by: James Hershaw 
Reviewed-by: Chaoyong He 
---
 .../link_bonding_poll_mode_drv_lib.rst|  19 +++
 drivers/net/bonding/bonding_testpmd.c | 128 ++
 2 files changed, 147 insertions(+)

diff --git a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst 
b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
index 60717a3587..9f6443ebd8 100644
--- a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
+++ b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
@@ -637,3 +637,22 @@ in balance mode with a transmission policy of layer 2+3::
 Members (3): [1 3 4]
 Active Members (3): [1 3 4]
 Primary: [3]
+
+set bonding notify_member
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Set the notify_member flag of the bonding port::
+
+   testpmd> set bonding notify_member (port_id) (enable|disable)
+
+This command only sets the notification flag.
+When it is enabled, the bonding PMD notifies member ports whenever some of
+its configuration changes, so that member ports can perform private actions
+such as hardware bonding creation.
+
+get bonding member hardware create
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Get the status of the member port hardware creating the bonding port::
+
+   testpmd> get bonding member hardware create (member_port_id) 
(bonding_port_id)
diff --git a/drivers/net/bonding/bonding_testpmd.c 
b/drivers/net/bonding/bonding_testpmd.c
index 8fcd6cadd0..da7d9cc58f 100644
--- a/drivers/net/bonding/bonding_testpmd.c
+++ b/drivers/net/bonding/bonding_testpmd.c
@@ -692,6 +692,124 @@ static cmdline_parse_inst_t 
cmd_set_bonding_agg_mode_policy = {
}
 };
 
+struct cmd_set_bonding_notify_member_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t bonding;
+   cmdline_fixed_string_t notify_member;
+   uint16_t port_num;
+   cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_bonding_notify_member_parsed(void *parsed_result,
+   __rte_unused struct cmdline *cl, __rte_unused void *data)
+{
+   struct cmd_set_bonding_notify_member_result *res = parsed_result;
+   bool notify_member = false;
+
+   if (strcmp(res->notify_member, "enable") == 0)
+   notify_member = true;
+   else if (strcmp(res->notify_member, "disable") == 0)
+   notify_member = false;
+
+   rte_eth_bond_notify_member_flag_set(res->port_num, notify_member);
+}
+
+static cmdline_parse_token_string_t cmd_set_bonding_notify_member_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_notify_member_result,
+   set, "set");
+static cmdline_parse_token_string_t cmd_set_bonding_notify_member_bonding =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_notify_member_result,
+   bonding, "bonding");
+static cmdline_parse_token_string_t cmd_set_bonding_notify_member =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_notify_member_result,
+   notify_member, "notify_member");
+static cmdline_parse_token_num_t cmd_set_bonding_notify_member_portnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_notify_member_result,
+   port_num, RTE_UINT16);
+static cmdline_parse_token_string_t cmd_set_bonding_notify_member_mode_string =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_notify_member_result,
+   mode, "enable#disable");
+
+static cmdline_parse_inst_t cmd_set_bonding_notify_member_ports = {
+   .f = cmd_set_bonding_notify_member_parsed,
+   .data = NULL,
+   .help_str = "set bonding notify_member (port_id) (enable|disable)",
+   .tokens = {
+   (void *)&cmd_set_bonding_notify_member_set,
+   (void *)&cmd_set_bonding_notify_member_bonding,
+   (void *)&cmd_set_bonding_notify_member,
+   (void *)&cmd_set_bonding_notify_member_portnum,
+   (void *)&cmd_set_bonding_notify_member_mode_string,
+   NULL
+   }
+};
+
+struct cmd_get_bonding_member_hw_create_result {
+   cmdline_fixed_string_t get;
+   cmdline_fixed_string_t bonding;
+   cmdline_fixed_string_t member;
+   cmdline_fixed_string_t hardware;
+   cmdline_fixed_string_t create;
+   uint16_t member_port_id;
+   uint16_t bonding_port_id;
+};
+
+static void
+cmd_get_bonding_member_hw_create_parsed(void *parsed_result,
+   __rte_unused struct cmdline *cl, __rte_unused void *data)
+{
+   struct cmd_get_bonding_member_hw_create_result *res = parsed_result;
+   int ret;
+
+   ret = rte_eth_bond_hw_create_get(res->bonding_port_id, 
res->member_port_id);
+  

[PATCH v5 0/3] rewrite fastpath routines

2023-10-18 Thread Vamsi Attunuru
This series adds new fastpath routines for cn10k & cn9k endpoint
devices and supports the 32B Tx descriptor format, which improves
performance.

V5 changes:
- Series rebased

v4 changes:
- Use rte_atomic_xxx instead of __atomic_xxx built-ins

v2 & v3 changes:
- Fixed CI

Shijith Thotton (1):
  net/octeon_ep: support 32B IQ descriptor size

Vamsi Attunuru (2):
  net/octeon_ep: clean up receive routine
  net/octeon_ep: add new fastpath routines

 drivers/net/octeon_ep/cnxk_ep_rx.c| 310 ++
 drivers/net/octeon_ep/cnxk_ep_tx.c| 210 +
 drivers/net/octeon_ep/cnxk_ep_vf.c|  12 +-
 drivers/net/octeon_ep/cnxk_ep_vf.h|  13 ++
 drivers/net/octeon_ep/meson.build |   2 +
 drivers/net/octeon_ep/otx2_ep_vf.c|  11 +-
 drivers/net/octeon_ep/otx_ep_common.h | 127 ++-
 drivers/net/octeon_ep/otx_ep_ethdev.c |  69 +-
 drivers/net/octeon_ep/otx_ep_rxtx.c   | 257 +++--
 drivers/net/octeon_ep/otx_ep_rxtx.h   |  38 +++-
 drivers/net/octeon_ep/otx_ep_vf.c |   8 +
 11 files changed, 805 insertions(+), 252 deletions(-)
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx.c
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_tx.c

-- 
2.25.1



[PATCH v5 1/3] net/octeon_ep: support 32B IQ descriptor size

2023-10-18 Thread Vamsi Attunuru
From: Shijith Thotton 

Update the input queue setup to consider the descriptor size from the
driver configuration. The default instruction size for otx2 and cnxk
devices has been updated to 32 bytes.

Signed-off-by: Shijith Thotton 
---
 drivers/net/octeon_ep/cnxk_ep_vf.c| 10 +-
 drivers/net/octeon_ep/otx2_ep_vf.c| 10 +-
 drivers/net/octeon_ep/otx_ep_common.h |  4 
 drivers/net/octeon_ep/otx_ep_vf.c |  8 
 4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/net/octeon_ep/cnxk_ep_vf.c 
b/drivers/net/octeon_ep/cnxk_ep_vf.c
index 92c2d2ca5c..7b3669fe0c 100644
--- a/drivers/net/octeon_ep/cnxk_ep_vf.c
+++ b/drivers/net/octeon_ep/cnxk_ep_vf.c
@@ -106,6 +106,14 @@ cnxk_ep_vf_setup_iq_regs(struct otx_ep_device *otx_ep, 
uint32_t iq_no)
return -EIO;
}
 
+   /* Configure input queue instruction size. */
+   if (otx_ep->conf->iq.instr_type == OTX_EP_32BYTE_INSTR)
+   reg_val &= ~(CNXK_EP_R_IN_CTL_IS_64B);
+   else
+   reg_val |= CNXK_EP_R_IN_CTL_IS_64B;
+   oct_ep_write64(reg_val, otx_ep->hw_addr + CNXK_EP_R_IN_CONTROL(iq_no));
+   iq->desc_size = otx_ep->conf->iq.instr_type;
+
/* Write the start of the input queue's ring and its size  */
oct_ep_write64(iq->base_addr_dma, otx_ep->hw_addr + 
CNXK_EP_R_IN_INSTR_BADDR(iq_no));
oct_ep_write64(iq->nb_desc, otx_ep->hw_addr + 
CNXK_EP_R_IN_INSTR_RSIZE(iq_no));
@@ -354,7 +362,7 @@ static const struct otx_ep_config default_cnxk_ep_conf = {
/* IQ attributes */
.iq= {
.max_iqs   = OTX_EP_CFG_IO_QUEUES,
-   .instr_type= OTX_EP_64BYTE_INSTR,
+   .instr_type= OTX_EP_32BYTE_INSTR,
.pending_list_size = (OTX_EP_MAX_IQ_DESCRIPTORS *
  OTX_EP_CFG_IO_QUEUES),
},
diff --git a/drivers/net/octeon_ep/otx2_ep_vf.c 
b/drivers/net/octeon_ep/otx2_ep_vf.c
index ced3a415a5..f72b8d25d7 100644
--- a/drivers/net/octeon_ep/otx2_ep_vf.c
+++ b/drivers/net/octeon_ep/otx2_ep_vf.c
@@ -256,6 +256,14 @@ otx2_vf_setup_iq_regs(struct otx_ep_device *otx_ep, 
uint32_t iq_no)
return -EIO;
}
 
+   /* Configure input queue instruction size. */
+   if (otx_ep->conf->iq.instr_type == OTX_EP_32BYTE_INSTR)
+   reg_val &= ~(SDP_VF_R_IN_CTL_IS_64B);
+   else
+   reg_val |= SDP_VF_R_IN_CTL_IS_64B;
+   oct_ep_write64(reg_val, otx_ep->hw_addr + SDP_VF_R_IN_CONTROL(iq_no));
+   iq->desc_size = otx_ep->conf->iq.instr_type;
+
/* Write the start of the input queue's ring and its size  */
oct_ep_write64(iq->base_addr_dma, otx_ep->hw_addr + 
SDP_VF_R_IN_INSTR_BADDR(iq_no));
oct_ep_write64(iq->nb_desc, otx_ep->hw_addr + 
SDP_VF_R_IN_INSTR_RSIZE(iq_no));
@@ -500,7 +508,7 @@ static const struct otx_ep_config default_otx2_ep_conf = {
/* IQ attributes */
.iq= {
.max_iqs   = OTX_EP_CFG_IO_QUEUES,
-   .instr_type= OTX_EP_64BYTE_INSTR,
+   .instr_type= OTX_EP_32BYTE_INSTR,
.pending_list_size = (OTX_EP_MAX_IQ_DESCRIPTORS *
  OTX_EP_CFG_IO_QUEUES),
},
diff --git a/drivers/net/octeon_ep/otx_ep_common.h 
b/drivers/net/octeon_ep/otx_ep_common.h
index c150cbe619..90e059cad0 100644
--- a/drivers/net/octeon_ep/otx_ep_common.h
+++ b/drivers/net/octeon_ep/otx_ep_common.h
@@ -11,6 +11,7 @@
 
 #define OTX_EP_MAX_RINGS_PER_VF(8)
 #define OTX_EP_CFG_IO_QUEUESOTX_EP_MAX_RINGS_PER_VF
+#define OTX_EP_32BYTE_INSTR (32)
 #define OTX_EP_64BYTE_INSTR (64)
 /*
  * Backpressure for SDP is configured on Octeon, and the minimum queue sizes
@@ -215,6 +216,9 @@ struct otx_ep_instr_queue {
/* Number of  descriptors in this ring. */
uint32_t nb_desc;
 
+   /* Size of the descriptor. */
+   uint8_t desc_size;
+
/* Input ring index, where the driver should write the next packet */
uint32_t host_write_index;
 
diff --git a/drivers/net/octeon_ep/otx_ep_vf.c 
b/drivers/net/octeon_ep/otx_ep_vf.c
index 4f3538146b..236b7a874c 100644
--- a/drivers/net/octeon_ep/otx_ep_vf.c
+++ b/drivers/net/octeon_ep/otx_ep_vf.c
@@ -120,6 +120,14 @@ otx_ep_setup_iq_regs(struct otx_ep_device *otx_ep, 
uint32_t iq_no)
return -EIO;
}
 
+   /* Configure input queue instruction size. */
+   if (iq->desc_size == OTX_EP_32BYTE_INSTR)
+   reg_val &= ~(OTX_EP_R_IN_CTL_IS_64B);
+   else
+   reg_val |= OTX_EP_R_IN_CTL_IS_64B;
+   oct_ep_write64(reg_val, otx_ep->hw_addr + OTX_EP_R_IN_CONTROL(iq_no));
+   iq->desc_size = otx_ep->conf->iq.instr_type;
+
/* Write the start of the input queue's ring and its size  */
otx_ep_write64(iq->base_addr_dma, otx_ep->hw_addr,
   OTX_EP_R_IN_INSTR_

[PATCH v5 2/3] net/octeon_ep: clean up receive routine

2023-10-18 Thread Vamsi Attunuru
Improve the Rx routine and the packet count update routines; the
packet count update routines need to drain in-flight ISM memory
updates while decrementing the packet count register.

Signed-off-by: Vamsi Attunuru 
---
 drivers/net/octeon_ep/otx_ep_rxtx.c | 164 
 1 file changed, 70 insertions(+), 94 deletions(-)

diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.c 
b/drivers/net/octeon_ep/otx_ep_rxtx.c
index b37fc8109f..2654e13e18 100644
--- a/drivers/net/octeon_ep/otx_ep_rxtx.c
+++ b/drivers/net/octeon_ep/otx_ep_rxtx.c
@@ -442,7 +442,15 @@ otx_vf_update_read_index(struct otx_ep_instr_queue *iq)
 * when count above halfway to saturation.
 */
rte_write32(val, iq->inst_cnt_reg);
-   *iq->inst_cnt_ism = 0;
+   rte_mb();
+
+   rte_write64(OTX2_SDP_REQUEST_ISM, iq->inst_cnt_reg);
+   while (rte_atomic_load_explicit(iq->inst_cnt_ism, 
rte_memory_order_relaxed) >=
+  val) {
+   rte_write64(OTX2_SDP_REQUEST_ISM, iq->inst_cnt_reg);
+   rte_mb();
+   }
+
iq->inst_cnt_ism_prev = 0;
}
rte_write64(OTX2_SDP_REQUEST_ISM, iq->inst_cnt_reg);
@@ -567,9 +575,7 @@ prepare_xmit_gather_list(struct otx_ep_instr_queue *iq, 
struct rte_mbuf *m, uint
 
finfo = &iq->req_list[iq->host_write_index].finfo;
*dptr = rte_mem_virt2iova(finfo->g.sg);
-   ih->s.tlen = pkt_len + ih->s.fsz;
-   ih->s.gsz = frags;
-   ih->s.gather = 1;
+   ih->u64 |= ((1ULL << 62) | ((uint64_t)frags << 48) | (pkt_len + 
ih->s.fsz));
 
while (frags--) {
finfo->g.sg[(j >> 2)].ptr[(j & mask)] = rte_mbuf_data_iova(m);
@@ -752,36 +758,26 @@ otx2_ep_xmit_pkts(void *tx_queue, struct rte_mbuf **pkts, 
uint16_t nb_pkts)
 static uint32_t
 otx_ep_droq_refill(struct otx_ep_droq *droq)
 {
-   struct otx_ep_droq_desc *desc_ring;
+   struct otx_ep_droq_desc *desc_ring = droq->desc_ring;
struct otx_ep_droq_info *info;
struct rte_mbuf *buf = NULL;
uint32_t desc_refilled = 0;
 
-   desc_ring = droq->desc_ring;
-
while (droq->refill_count && (desc_refilled < droq->nb_desc)) {
-   /* If a valid buffer exists (happens if there is no dispatch),
-* reuse the buffer, else allocate.
-*/
-   if (droq->recv_buf_list[droq->refill_idx] != NULL)
-   break;
-
buf = rte_pktmbuf_alloc(droq->mpool);
/* If a buffer could not be allocated, no point in
 * continuing
 */
-   if (buf == NULL) {
+   if (unlikely(!buf)) {
droq->stats.rx_alloc_failure++;
break;
}
info = rte_pktmbuf_mtod(buf, struct otx_ep_droq_info *);
-   memset(info, 0, sizeof(*info));
+   info->length = 0;
 
droq->recv_buf_list[droq->refill_idx] = buf;
desc_ring[droq->refill_idx].buffer_ptr =
rte_mbuf_data_iova_default(buf);
-
-
droq->refill_idx = otx_ep_incr_index(droq->refill_idx, 1,
droq->nb_desc);
 
@@ -793,21 +789,18 @@ otx_ep_droq_refill(struct otx_ep_droq *droq)
 }
 
 static struct rte_mbuf *
-otx_ep_droq_read_packet(struct otx_ep_device *otx_ep,
-   struct otx_ep_droq *droq, int next_fetch)
+otx_ep_droq_read_packet(struct otx_ep_device *otx_ep, struct otx_ep_droq 
*droq, int next_fetch)
 {
volatile struct otx_ep_droq_info *info;
-   struct rte_mbuf *droq_pkt2 = NULL;
-   struct rte_mbuf *droq_pkt = NULL;
-   struct rte_net_hdr_lens hdr_lens;
-   struct otx_ep_droq_info *info2;
+   struct rte_mbuf *mbuf_next = NULL;
+   struct rte_mbuf *mbuf = NULL;
uint64_t total_pkt_len;
uint32_t pkt_len = 0;
int next_idx;
 
-   droq_pkt  = droq->recv_buf_list[droq->read_idx];
-   droq_pkt2  = droq->recv_buf_list[droq->read_idx];
-   info = rte_pktmbuf_mtod(droq_pkt, struct otx_ep_droq_info *);
+   mbuf = droq->recv_buf_list[droq->read_idx];
+   info = rte_pktmbuf_mtod(mbuf, struct otx_ep_droq_info *);
+
/* make sure info is available */
rte_rmb();
if (unlikely(!info->length)) {
@@ -828,32 +821,25 @@ otx_ep_droq_read_packet(struct otx_ep_device *otx_ep,
assert(0);
}
}
+
if (next_fetch) {
next_idx = otx_ep_incr_index(droq->read_idx, 1, droq->nb_desc);
-   droq_pkt2  = droq->recv_buf_list[next_idx];
-   info2 = rte_pktmbuf_mtod(droq_pkt2, struct otx_ep_droq_info *);
-   rte_prefetch_non_temporal((const void *)info2);
+   mbuf_next = droq->recv_buf_list[next_idx];
+   rte_prefetch0(rte_pktmbuf_mtod(mbuf_ne

[PATCH v5 3/3] net/octeon_ep: add new fastpath routines

2023-10-18 Thread Vamsi Attunuru
Add new fastpath routines for cn10k & cn9k endpoint devices and
assign them based on the offload flags.

The patch also includes miscellaneous changes to improve performance
and code readability.

Signed-off-by: Vamsi Attunuru 
---
 drivers/net/octeon_ep/cnxk_ep_rx.c| 310 ++
 drivers/net/octeon_ep/cnxk_ep_tx.c| 210 +
 drivers/net/octeon_ep/cnxk_ep_vf.c|   2 +
 drivers/net/octeon_ep/cnxk_ep_vf.h|  13 ++
 drivers/net/octeon_ep/meson.build |   2 +
 drivers/net/octeon_ep/otx2_ep_vf.c|   1 +
 drivers/net/octeon_ep/otx_ep_common.h | 125 ++-
 drivers/net/octeon_ep/otx_ep_ethdev.c |  69 +-
 drivers/net/octeon_ep/otx_ep_rxtx.c   |  93 +---
 drivers/net/octeon_ep/otx_ep_rxtx.h   |  38 +++-
 10 files changed, 706 insertions(+), 157 deletions(-)

diff --git a/drivers/net/octeon_ep/cnxk_ep_rx.c 
b/drivers/net/octeon_ep/cnxk_ep_rx.c
new file mode 100644
index 00..22bf3ce7a7
--- /dev/null
+++ b/drivers/net/octeon_ep/cnxk_ep_rx.c
@@ -0,0 +1,310 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include "otx_ep_common.h"
+#include "otx2_ep_vf.h"
+#include "otx_ep_rxtx.h"
+
+static inline int
+cnxk_ep_rx_refill_mbuf(struct otx_ep_droq *droq, uint32_t count)
+{
+   struct otx_ep_droq_desc *desc_ring = droq->desc_ring;
+   struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
+   uint32_t refill_idx = droq->refill_idx;
+   struct rte_mbuf *buf;
+   uint32_t i;
+   int rc;
+
+   rc = rte_pktmbuf_alloc_bulk(droq->mpool, &recv_buf_list[refill_idx], 
count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return rc;
+   }
+
+   for (i = 0; i < count; i++) {
+   buf = recv_buf_list[refill_idx];
+   desc_ring[refill_idx].buffer_ptr = 
rte_mbuf_data_iova_default(buf);
+   refill_idx++;
+   }
+
+   droq->refill_idx = otx_ep_incr_index(droq->refill_idx, count, 
droq->nb_desc);
+   droq->refill_count -= count;
+
+   return 0;
+}
+
+static inline void
+cnxk_ep_rx_refill(struct otx_ep_droq *droq)
+{
+   uint32_t desc_refilled = 0, count;
+   uint32_t nb_desc = droq->nb_desc;
+   uint32_t refill_idx = droq->refill_idx;
+   int rc;
+
+   if (unlikely(droq->read_idx == refill_idx))
+   return;
+
+   if (refill_idx < droq->read_idx) {
+   count = droq->read_idx - refill_idx;
+   rc = cnxk_ep_rx_refill_mbuf(droq, count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return;
+   }
+   desc_refilled = count;
+   } else {
+   count = nb_desc - refill_idx;
+   rc = cnxk_ep_rx_refill_mbuf(droq, count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return;
+   }
+
+   desc_refilled = count;
+   count = droq->read_idx;
+   rc = cnxk_ep_rx_refill_mbuf(droq, count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return;
+   }
+   desc_refilled += count;
+   }
+
+   /* Flush the droq descriptor data to memory to be sure
+* that when we update the credits the data in memory is
+* accurate.
+*/
+   rte_io_wmb();
+   rte_write32(desc_refilled, droq->pkts_credit_reg);
+}
+
+static inline uint32_t
+cnxk_ep_check_rx_pkts(struct otx_ep_droq *droq)
+{
+   uint32_t new_pkts;
+   uint32_t val;
+
+   /* Batch subtractions from the HW counter to reduce PCIe traffic
+* This adds an extra local variable, but almost halves the
+* number of PCIe writes.
+*/
+   val = rte_atomic_load_explicit(droq->pkts_sent_ism, 
rte_memory_order_relaxed);
+   new_pkts = val - droq->pkts_sent_ism_prev;
+   droq->pkts_sent_ism_prev = val;
+
+   if (val > (uint32_t)(1 << 31)) {
+   /* Only subtract the packet count in the HW counter
+* when count above halfway to saturation.
+*/
+   rte_write64((uint64_t)val, droq->pkts_sent_reg);
+   rte_mb();
+
+   rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
+   while (rte_atomic_load_explicit(droq->pkts_sent_ism, 
rte_memory_order_relaxed) >=
+  val) {
+   rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
+   rte_mb();
+   }
+
+   droq->pkts_sent_ism_prev = 0;
+   }
+   rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
+   droq->pkts_pending += new_pkts;
+
+   return new_pkts;
+}
+
+static inline int16_t __rte_hot
+cnxk_ep_rx_pkts_to_process(struct otx_ep_droq *droq, uint16_t nb_pkts)
+{
+   if (droq->pkts_

[PATCH] bitops: mark new symbols as stable

2023-10-18 Thread David Marchand
Calling an experimental symbol from an inline helper triggers a warning
when such code is not compiled with experimental API.
This can be seen when rte_bitops.h gets (indirectly) included in OVS
builds.

On the other hand, rte_clz32, rte_clz64, rte_ctz32, rte_ctz64,
rte_popcount32, rte_popcount64 are inline helpers for abstracting common
bit counting functions. This part of the API is unlikely to change.

Mark those symbols as stable.

Fixes: 18898c4d06f9 ("eal: use abstracted bit count functions")

Signed-off-by: David Marchand 
---
Copying Techboard for info, as this goes against the usual policy of
marking new API as experimental.

---
 lib/eal/include/rte_bitops.h | 48 
 1 file changed, 48 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 6b8ae8d3ac..174d25216d 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -280,9 +280,6 @@ rte_bit_relaxed_test_and_clear64(unsigned int nr, volatile 
uint64_t *addr)
 #ifdef RTE_TOOLCHAIN_MSVC
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of leading 0-bits in v.
  *
  * @param v
@@ -290,7 +287,6 @@ rte_bit_relaxed_test_and_clear64(unsigned int nr, volatile 
uint64_t *addr)
  * @return
  *   The count of leading zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_clz32(uint32_t v)
 {
@@ -302,9 +298,6 @@ rte_clz32(uint32_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of leading 0-bits in v.
  *
  * @param v
@@ -312,7 +305,6 @@ rte_clz32(uint32_t v)
  * @return
  *   The count of leading zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_clz64(uint64_t v)
 {
@@ -324,9 +316,6 @@ rte_clz64(uint64_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of trailing 0-bits in v.
  *
  * @param v
@@ -334,7 +323,6 @@ rte_clz64(uint64_t v)
  * @return
  *   The count of trailing zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_ctz32(uint32_t v)
 {
@@ -346,9 +334,6 @@ rte_ctz32(uint32_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of trailing 0-bits in v.
  *
  * @param v
@@ -356,7 +341,6 @@ rte_ctz32(uint32_t v)
  * @return
  *   The count of trailing zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_ctz64(uint64_t v)
 {
@@ -368,9 +352,6 @@ rte_ctz64(uint64_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of 1-bits in v.
  *
  * @param v
@@ -378,7 +359,6 @@ rte_ctz64(uint64_t v)
  * @return
  *   The count of 1-bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_popcount32(uint32_t v)
 {
@@ -386,9 +366,6 @@ rte_popcount32(uint32_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of 1-bits in v.
  *
  * @param v
@@ -396,7 +373,6 @@ rte_popcount32(uint32_t v)
  * @return
  *   The count of 1-bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_popcount64(uint64_t v)
 {
@@ -406,9 +382,6 @@ rte_popcount64(uint64_t v)
 #else
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of leading 0-bits in v.
  *
  * @param v
@@ -416,7 +389,6 @@ rte_popcount64(uint64_t v)
  * @return
  *   The count of leading zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_clz32(uint32_t v)
 {
@@ -424,9 +396,6 @@ rte_clz32(uint32_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of leading 0-bits in v.
  *
  * @param v
@@ -434,7 +403,6 @@ rte_clz32(uint32_t v)
  * @return
  *   The count of leading zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_clz64(uint64_t v)
 {
@@ -442,9 +410,6 @@ rte_clz64(uint64_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of trailing 0-bits in v.
  *
  * @param v
@@ -452,7 +417,6 @@ rte_clz64(uint64_t v)
  * @return
  *   The count of trailing zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_ctz32(uint32_t v)
 {
@@ -460,9 +424,6 @@ rte_ctz32(uint32_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the count of trailing 0-bits in v.
  *
  * @param v
@@ -470,7 +431,6 @@ rte_ctz32(uint32_t v)
  * @return
  *   The count of trailing zero bits.
  */
-__rte_experimental
 static inline unsigned int
 rte_ctz64(uint64_t v)
 {
@@ -478,9 +438,6 @@ rte_ctz64(uint64_t v)
 }
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
- *
  * Get the co

Re: [EXT] Re: [PATCH v2 1/1] usertools/rss: add CNXK RSS key

2023-10-18 Thread Thomas Monjalon
18/10/2023 09:26, Sunil Kumar Kori:
> From: Robin Jarry 
> > From: Sunil Kumar Kori 
> > >
> > > This patch adds RSS key for CNXK platforms. CNXK platform uses
> > > 48 bytes long key for hash calculations.
> > >
> > > The same patch also updates help messages to provide range
> > > information for supported NICs/platforms.
> > >
> > > Also, CNXK uses a RETA size of 64, so to get the correct offset to
> > > retrieve the queue index, the user must pass the reta_size option as
> > > 64, i.e. -t 64.
> > 
> > I think we should add some driver abstraction that contains the required key
> > length and default reta size. Instead of requiring the user to guess the 
> > correct
> > values. Is that something you could do?
> > 
> Okay but in either case i.e. -t option or driver abstraction, user must know 
> the reta size and key size before configuring.
> So I am not sure how adding driver abstraction will help to solve this
> issue unless/until it is documented somewhere.

You can start with an option to get the size printed, depending on driver name.

> So for current release, I am planning to go this version as it is because we 
> are close.
> Later on we can think of it and add required support. 
> Please provide input on it.

Please provide a more user friendly experience in this release.





Re: [PATCH v2] dma/cnxk: offload source buffer free

2023-10-18 Thread Jerin Jacob
On Wed, Oct 18, 2023 at 1:17 PM Vamsi Krishna Attunuru
 wrote:
>
>
>
> > -Original Message-
> > From: Amit Prakash Shukla 
> > Sent: Wednesday, October 18, 2023 12:24 AM
> > To: Vamsi Krishna Attunuru 
> > Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran ;
> > fengcheng...@huawei.com; kevin.la...@intel.com;
> > bruce.richard...@intel.com; conor.wa...@intel.com; g.si...@nxp.com;
> > sachin.sax...@oss.nxp.com; hemant.agra...@nxp.com;
> > cheng1.ji...@intel.com; Nithin Kumar Dabilpuram
> > ; Anoob Joseph ;
> > m...@smartsharesystems.com; Amit Prakash Shukla
> > 
> > Subject: [PATCH v2] dma/cnxk: offload source buffer free
> >
> > Added support in driver, to offload source buffer free to hardware on
> > completion of DMA transfer.
> >
> > Signed-off-by: Amit Prakash Shukla 
> > ---
> > v2:
> > - Patch rebased.
> >
> > v1:
> > - Driver implementation from RFC.
> >
>
> Acked-by: Vamsi Attunuru 

Updated release notes as follows

[for-next-net]dell[dpdk-next-net-mrvl] $ git diff
diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..d7f4484558 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -238,6 +238,10 @@ New Features
 to get the remaining ticks to expire for a given event timer.
   * Added link profiles support, up to two link profiles are supported.

+* **Updated Marvell cnxk dmadev driver.**
+
+  * Added support for source buffer auto free for memory to device DMA.
+
 * **Added dispatcher library.**


Updated the git commit as follows and applied to
dpdk-next-net-mrvl/for-next-net. Thanks

dma/cnxk: support source buffer auto free

Added support to offload source buffer free to hardware
on completion of DMA transfer.

Signed-off-by: Amit Prakash Shukla 
Acked-by: Vamsi Attunuru 


[PATCH] ml/cnxk: don't export internal headers

2023-10-18 Thread David Marchand
driver_sdk_headers is used to expose headers that may be used by
external drivers.
Don't export ml/cnxk internal headers.

Fixes: fe83ffd9ec2e ("ml/cnxk: add skeleton")

Signed-off-by: David Marchand 
---
 drivers/ml/cnxk/meson.build | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 94fa4283b1..5bf17d8ae3 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,13 +7,6 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
 subdir_done()
 endif
 
-driver_sdk_headers = files(
-'cn10k_ml_dev.h',
-'cn10k_ml_ops.h',
-'cn10k_ml_model.h',
-'cn10k_ml_ocm.h',
-)
-
 sources = files(
 'cn10k_ml_dev.c',
 'cn10k_ml_ops.c',
-- 
2.41.0



Re: [EXT] Re: [PATCH v2 1/1] usertools/rss: add CNXK RSS key

2023-10-18 Thread Robin Jarry

Thomas Monjalon, Oct 18, 2023 at 11:14:
> 18/10/2023 09:26, Sunil Kumar Kori:
> > From: Robin Jarry 
> > > From: Sunil Kumar Kori 
> > > >
> > > > This patch adds RSS key for CNXK platforms. CNXK platform uses
> > > > 48 bytes long key for hash calculations.
> > > >
> > > > For the same patch also updates help messages to provide range
> > > > information for supported NICs/platforms.
> > > >
> > > > Also CNXK uses reta size as 64 so to get correct offset to retrieve
> > > > queue index, user must pass reta_size option as 64 i.e. -t 64.
> > >
> > > I think we should add some driver abstraction that contains the required
> > > key length and default reta size. Instead of requiring the user to guess
> > > the correct values. Is that something you could do?
> > >
> > Okay but in either case i.e. -t option or driver abstraction, user must
> > know the reta size and key size before configuring.
> > So I am not sure how adding driver abstraction will help to solve this
> > issue unless/until it is documented somewhere.
>
> You can start with an option to get the size printed, depending on driver name.
>
> > So for current release, I am planning to go this version as it is because
> > we are close.
> > Later on we can think of it and add required support.
> > Please provide input on it.
>
> Please provide a more user friendly experience in this release.


I could have a shot at it since it may involve some refactoring. Also, 
existing supported drivers will benefit from it. This does not seem like 
it is directly related to CNXK.
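A driver abstraction of the kind discussed above could be sketched for the Python RSS tool as a per-driver table keyed on driver name. Hypothetical sketch: only the cnxk values (48-byte key, reta size 64) come from this thread; the fallback numbers below are placeholders.

```python
# Hypothetical per-driver table for the usertools/rss script: look up the RSS
# key length and reta size from the driver name instead of making the user
# guess them. Only the cnxk entry (48-byte key, reta size 64) is taken from
# this thread; the fallback values are placeholders.
DRIVER_INFO = {
    "cnxk": {"key_size": 48, "reta_size": 64},
}
FALLBACK = {"key_size": 40, "reta_size": 128}

def driver_params(driver):
    """Return (key_size, reta_size) for a driver, with placeholder defaults."""
    info = DRIVER_INFO.get(driver, FALLBACK)
    return info["key_size"], info["reta_size"]
```

With such a table, the tool could also simply print the sizes for a given driver name, as suggested above.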




[PATCH] hash: fix build with GFNI

2023-10-18 Thread David Marchand
As an external header, rte_thash_x86_gfni.h should be self sufficient.

Fixes: 3d4e27fd7ff0 ("use abstracted bit count functions")

Signed-off-by: David Marchand 
---
 lib/hash/rte_thash_x86_gfni.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/hash/rte_thash_x86_gfni.h b/lib/hash/rte_thash_x86_gfni.h
index fbec16dde0..b7c5a4ba7d 100644
--- a/lib/hash/rte_thash_x86_gfni.h
+++ b/lib/hash/rte_thash_x86_gfni.h
@@ -12,6 +12,7 @@
  * using Galois Fields New Instructions.
  */
 
+#include 
 #include 
 #include 
 
-- 
2.41.0



RE: [EXT] [PATCH] ml/cnxk: don't export internal headers

2023-10-18 Thread Srikanth Yalavarthi
> -Original Message-
> From: David Marchand 
> Sent: 18 October 2023 14:46
> To: dev@dpdk.org
> Cc: Jerin Jacob Kollanukkaran ; tho...@monjalon.net;
> Srikanth Yalavarthi ; Prince Takkar
> 
> Subject: [EXT] [PATCH] ml/cnxk: don't export internal headers
> 
> driver_sdk_headers is used to expose headers that may be used by external
> drivers.
> Don't export ml/cnxk internal headers.
> 
> Fixes: fe83ffd9ec2e ("ml/cnxk: add skeleton")
> 
> Signed-off-by: David Marchand 

Acked-by: Srikanth Yalavarthi 


Re: [PATCH] event/dsw: fix missing device pointer

2023-10-18 Thread Bruce Richardson
On Wed, Oct 18, 2023 at 10:48:14AM +0530, Jerin Jacob wrote:
> On Tue, Oct 17, 2023 at 10:21 PM Jerin Jacob  wrote:
> >
> > On Tue, Oct 17, 2023 at 9:45 PM Bruce Richardson
> >  wrote:
> > >
> > > On Tue, Oct 17, 2023 at 09:34:04PM +0530, Jerin Jacob wrote:
> > > > On Tue, Oct 17, 2023 at 9:32 PM Bruce Richardson
> > > >  wrote:
> > > > >
> > > > > After calling rte_event_dev_info_get() the ".dev" field of the info
> > > > > structure should have a pointer to the underlying device, allowing the
> > > > > user to e.g. get the device name using rte_dev_name(info.dev).
> > > > >
> > > > > The distributed software eventdev info structure did not return a
> > > > > correct device pointer, though, instead returning NULL, which caused
> > > > > crashes getting "rte_dev_name". Initializing the dev pointer inside 
> > > > > the
> > > > > "eventdev" struct in the device probe function fixes this by ensuring 
> > > > > we
> > > > > have a valid pointer to return in info_get calls.
> > > > >
> > > > > Fixes: 46a186b1f0c5 ("event/dsw: add device registration and build 
> > > > > system")
> > > > > Cc: mattias.ronnb...@ericsson.com
> > > > >
> > > > > Signed-off-by: Bruce Richardson 
> > > >
> > > > Is this issue for all "vdev" devices? if so, Please check for
> > > > drivers/event/skeleton too.
> > > >
> > > Yes, good point, looks like event/skeleton also returns NULL for the 
> > > device
> > > pointer.
> > >
> > > I'll do up a v3 with the extra patch in it.
> >
> > Looks like there are more vdev devices. Can we have a common PMD function or
> > extend rte_event_pmd_vdev_init or so.
> 
> 
> @Richardson, Bruce I will be on vacation from Friday, So would like to
> give PR for rc2 before that.
> 
> Adding helper function in rc2 may be risky, Could you fix all vdev
> mentioned below.
> Helper work, I think, we can take in next release.
> 
Yes, I was going to reply with some similar sentiment. I think it would be
risky to try and do a proper solution in a hurry. I will attempt to fix all
vdevs for rc2.

/Bruce


RE: [EXT] Re: [PATCH v2 1/1] usertools/rss: add CNXK RSS key

2023-10-18 Thread Sunil Kumar Kori
> -Original Message-
> From: Robin Jarry 
> Sent: Wednesday, October 18, 2023 2:48 PM
> To: Thomas Monjalon ; Sunil Kumar Kori
> 
> Cc: dev@dpdk.org; Jerin Jacob 
> Subject: Re: [EXT] Re: [PATCH v2 1/1] usertools/rss: add CNXK RSS key
> 
> Thomas Monjalon, Oct 18, 2023 at 11:14:
> > 18/10/2023 09:26, Sunil Kumar Kori:
> > > From: Robin Jarry 
> > > > From: Sunil Kumar Kori 
> > > > >
> > > > > This patch adds RSS key for CNXK platforms. CNXK platform uses
> > > > > 48 bytes long key for hash calculations.
> > > > >
> > > > > For the same patch also updates help messages to provide range
> > > > > information for supported NICs/platforms.
> > > > >
> > > > > Also CNXK uses reta size as 64 so to get correct offset to
> > > > > retrieve queue index, user must pass reta_size option as 64 i.e. -t 
> > > > > 64.
> > > >
> > > > I think we should add some driver abstraction that contains the
> > > > required key length and default reta size. Instead of requiring
> > > > the user to guess the correct values. Is that something you could do?
> > > >
> > > Okay but in either case i.e. -t option or driver abstraction, user must
> > > know the reta size and key size before configuring.
> > > So I am not sure how adding driver abstraction will help to solve this
> > > issue unless/until it is documented somewhere.
> >
> > You can start with an option to get the size printed, depending on driver
> > name.
> >
> > > So for current release, I am planning to go this version as it is
> > > because we are close.
> > > Later on we can think of it and add required support.
> > > Please provide input on it.
> >
> > Please provide a more user friendly experience in this release.
> 
> I could have a shot at it since it may involve some refactoring. Also, 
> existing
> supported drivers will benefit from it. This does not seem like it is directly
> related to CNXK.
Sure, Thanks. 


Re: [PATCH v4] bus/pci: fix legacy device IO port map in secondary process

2023-10-18 Thread David Marchand
On Mon, Oct 9, 2023 at 5:06 AM Ma, WenwuX  wrote:
> > From a pci bus API pov, nothing prevents a driver from mixing memory
> > mapped with vfio and ioport resources (iow, calls to
> > rte_pci_map_resource() and rte_pci_ioport_map()).
> > IOW, it may not be the case with the net/virtio driver but, in theory,
> > rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
> > rte_pci_map_resource() call.
> >
> > In a similar manner, from the API pov,
> > rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple bars.
> >
> > In summary, nothing in this patch checks that vfio has been configured 
> > already
> > and I think we need a refcount to handle those situations.
> >
> We call rte_vfio_setup_device just to get device info; we can call
> rte_vfio_release_device as soon as pci_vfio_fill_regions is done.
> This avoids reference-counting operations; do you think it works?

Afaics, rte_vfio_setup_device should not be called if a call to
rte_pci_map_device for this device was successful (rte_pci_map_device
itself calls rte_vfio_setup_device).
And as a consequence, calling rte_vfio_release_device cannot be done
unconditionally either.


-- 
David Marchand



Re: [PATCH v9 12/12] app/graph: support l3fwd use case

2023-10-18 Thread Jerin Jacob
On Wed, Oct 18, 2023 at 12:05 PM  wrote:
>
> From: Rakesh Kudurumalla 
>
> Adds an l3fwd use case. It contains a dedicated l3fwd.cli file
> mentioning commands to configure the required resources.
>
> Once application successfully parses the l3fwd.cli then a graph is
> created having below nodes:
>  - ethdev_rx -> pkt_cls
>
>  - pkt_cls -> ip4_lookup
>  - pkt_cls -> ip6_lookup
>  - pkt_cls -> pkt_drop
>
>  - ip4_lookup -> ip4_rewrite
>  - ip4_lookup -> pkt_drop
>
>  - ip6_lookup -> ip6_rewrite
>  - ip6_lookup -> pkt_drop
>
>  - ip4_rewrite -> ethdev_tx
>  - ip4_rewrite -> pkt_drop
>
>  - ip6_rewrite -> ethdev_tx
>  - ip6_rewrite -> pkt_drop
>
>  - ethdev_tx -> pkt_drop
>
> Signed-off-by: Sunil Kumar Kori 
> Signed-off-by: Rakesh Kudurumalla 

> diff --git a/doc/guides/tools/graph.rst b/doc/guides/tools/graph.rst
> index ed0fdfffe1..fc62e53ae1 100644
> --- a/doc/guides/tools/graph.rst
> +++ b/doc/guides/tools/graph.rst
> @@ -12,6 +12,10 @@ Based on the input file, application creates a graph to 
> cater the use case.
>
>  Also this application framework can be used by other graph based 
> applications.
>
> +Supported Use cases
> +---
> + * l3fwd

Please add commands, a .cli file and a pcap generation recipe to test this
with just the ring PMD + a pcap file (i.e. without HW), for others to easily
use and verify it.


Re: [PATCH 0/4] add telemetry commands for TM capabilities

2023-10-18 Thread fengchengwen
Series-acked-by: Chengwen Feng 

On 2023/10/18 9:39, Jie Hai wrote:
> This patch adds telemetry commands for TM capabilities and makes some
> bugfixes for the hns3 driver.
> 
> Jie Hai (4):
>   net/hns3: fix a typo
>   ethdev: add telemetry command for TM capabilities
>   ethdev: add telemetry command for TM level capabilities
>   ethdev: add telemetry command for TM node capabilities
> 
>  drivers/net/hns3/hns3_tm.c|   4 +-
>  lib/ethdev/rte_ethdev_telemetry.c | 380 ++
>  2 files changed, 382 insertions(+), 2 deletions(-)
> 


Re: [PATCH v1 1/2] baseband/acc: support ACC100 deRM corner case SDK

2023-10-18 Thread David Marchand
On Tue, Oct 10, 2023 at 7:55 PM Hernan Vargas  wrote:
>
> Implement de-ratematch pre-processing for ACC100 SW corner cases.
> Some specific 5GUL FEC corner cases may cause unintended back pressure
> and in some cases a potential stability issue on the ACC100.
> The PMD can detect such code block configuration and issue an info
> message to the user.
>
> Signed-off-by: Hernan Vargas 
> ---
>  drivers/baseband/acc/meson.build  | 23 ++-
>  drivers/baseband/acc/rte_acc100_pmd.c | 59 +--
>  2 files changed, 77 insertions(+), 5 deletions(-)
>
> > diff --git a/drivers/baseband/acc/meson.build b/drivers/baseband/acc/meson.build
> index 27a654b50153..84f4fea635ef 100644
> --- a/drivers/baseband/acc/meson.build
> +++ b/drivers/baseband/acc/meson.build
> @@ -1,7 +1,28 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2020 Intel Corporation
>
> -deps += ['bus_pci']
...
> +deps += ['bbdev', 'bus_pci']

This part is likely a rebase damage.
See: b7b8de26f34d ("drivers: add dependencies for some classes")


-- 
David Marchand



Re: [PATCH v1] bus/pci: get PCI address from rte_device

2023-10-18 Thread David Marchand
On Wed, May 31, 2023 at 11:52 AM David Marchand
 wrote:
>
> (I reformatted the mail a bit)
>
> On Wed, May 31, 2023 at 10:51 AM Elena Agostini  wrote:
> > > On Wed, May 31, 2023 at 10:44 AM Elena Agostini eagost...@nvidia.com 
> > > wrote:
> > > > > On Tue, May 30, 2023 at 1:48 PM eagost...@nvidia.com wrote:
> > > > > > From: Elena Agostini eagost...@nvidia.com
> > > > > > In DPDK 22.11 pci bus related structure have been hidden internally
> > > > > > so the application doesn't have a direct access to those info 
> > > > > > anymore.
> > > > > > This patch introduces a get function to retrieve a PCI address
> > > > > > from an rte_device handler.
> > > > > > Signed-off-by: Elena Agostini eagost...@nvidia.com
>
> > > > > I would prefer we don't add specific bus API when there is an 
> > > > > alternative.
> > > > > The PCI address is already reported as a string in the generic device
> > > > > object name.
> > > > > Would that be enough for your usecase?
>
> > > > No as I need to parse anyway the PCI address string in the form of 
> > > > domain/bus/devid/function.
>
> > > I am curious. Can you explain why you would need such information?
>
> > Use-case is the Aerial 5G where two processes have to exchange info
> > about PCI devices sending messages according to some specific format.
>
> It seems strange that different processes need to exchange this bus
> level information.
> For dataplane, having a simpler metadata (like a portid maybe?) is
> better than a domain/bus/devid/function quartet.
> For controlplane, having an abstraction or a human readable string is
> probably better too.
>
> In any case, for what you request here, the application can parse the
> generic device name into a rte_pci_addr via rte_pci_addr_parse().
> Is it not enough?

No reply for some time now, marking this patch as rejected.


-- 
David Marchand
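For illustration, the parsing that rte_pci_addr_parse() performs on a generic device name can be sketched as a standalone helper. Hypothetical code, not the DPDK implementation.

```python
# Standalone sketch of parsing a PCI address string ("domain:bus:devid.function",
# e.g. "0000:3b:00.1") into its four numeric components, i.e. what an
# application gets from rte_pci_addr_parse() on a generic device name.
# Hypothetical helper, not the DPDK implementation.
def parse_pci_addr(name):
    dev_part, _, func = name.partition(".")
    fields = dev_part.split(":")
    if len(fields) == 2:  # short BDF form "bus:devid", domain defaults to 0
        fields.insert(0, "0")
    domain, bus, devid = (int(f, 16) for f in fields)
    return domain, bus, devid, int(func, 16)
```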



Re: [PATCH] ci: test manuals generation in GHA

2023-10-18 Thread David Marchand
On Tue, Oct 17, 2023 at 2:29 PM Aaron Conole  wrote:
> David Marchand  writes:
> > Add missing package so manuals are generated as part of the docs check.
> >
> > Signed-off-by: David Marchand 
> Reviewed-by: Aaron Conole 

Applied, thanks.


-- 
David Marchand



RE: [PATCH v3] mbuf: add ESP packet type

2023-10-18 Thread Alexander Kozyrev
> As per IPSEC ESP RFC 4303, for both tunnel mode or transport mode,
> next proto 50, so we cannot identify a packet is for tunnel mode or
> transport mode by just packet parsing.
> Am I missing something ?
You are absolutely correct, the only way to tell the difference is
to parse the next_proto field in the ESP header itself.
But this field is encrypted, according to RFC, and not really available for 
parsing. 

> Currently there is already a PTYPE `RTE_PTYPE_TUNNEL_ESP` being used
> by all drivers / ipsec-secgw to indicate ESP packet. So why is this
> needed ?
The idea was to add the possibility to distinguish packets in these two modes.
But you are right, it doesn't seem achievable without decrypting the packet 
first.

> There is also a documentation issue with `RTE_PTYPE_TUNNEL_ESP` where
> it indicates next-proto of 51 but it should have been 50.
> next-proto of 51 is for IPSEC AH.
Yes, documentation is incorrect there.

Thanks for bringing this up, Nithin, I think we can live with 
RTE_PTYPE_TUNNEL_ESP.
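To illustrate the point being agreed on here: per RFC 4303, only the SPI and sequence number of an ESP packet are in cleartext, while the Next Header byte that would reveal tunnel vs transport mode sits in the encrypted trailer. A minimal sketch:

```python
import struct

# Per RFC 4303, the only cleartext fields at the start of an ESP payload are
# the 32-bit SPI and the 32-bit sequence number. The Next Header byte that
# distinguishes tunnel mode (protected payload is a full IP packet) from
# transport mode sits in the *encrypted* trailer, so packet parsing alone
# cannot tell the two modes apart.
def parse_esp_cleartext(payload):
    spi, seq = struct.unpack_from("!II", payload, 0)
    return spi, seq

# Example ESP payload: SPI 0x1234, sequence 7, followed by ciphertext.
esp = struct.pack("!II", 0x1234, 7) + b"\x00" * 16
```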


[PATCH v6 0/3] rewrite fastpath routines

2023-10-18 Thread Vamsi Attunuru
This series adds new fastpath routines for cn10k & cn9k endpoint
devices and supports the 32B Tx descriptor format, which improves
performance.

V6 changes:
- Use __atomic_xxx built-ins to fix CI build

V5 changes:
- Series rebased

v4 changes:
- Use rte_atomic_xxx instead of __atomic_xxx built-ins

v2 & v3 changes:
- Fixed CI

Shijith Thotton (1):
  net/octeon_ep: support 32B IQ descriptor size

Vamsi Attunuru (2):
  net/octeon_ep: clean up receive routine
  net/octeon_ep: add new fastpath routines

 drivers/net/octeon_ep/cnxk_ep_rx.c| 309 ++
 drivers/net/octeon_ep/cnxk_ep_tx.c| 209 +
 drivers/net/octeon_ep/cnxk_ep_vf.c|  12 +-
 drivers/net/octeon_ep/cnxk_ep_vf.h|  13 ++
 drivers/net/octeon_ep/meson.build |   2 +
 drivers/net/octeon_ep/otx2_ep_vf.c|  11 +-
 drivers/net/octeon_ep/otx_ep_common.h | 127 ++-
 drivers/net/octeon_ep/otx_ep_ethdev.c |  69 +-
 drivers/net/octeon_ep/otx_ep_rxtx.c   | 255 +++--
 drivers/net/octeon_ep/otx_ep_rxtx.h   |  38 +++-
 drivers/net/octeon_ep/otx_ep_vf.c |   8 +
 11 files changed, 801 insertions(+), 252 deletions(-)
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx.c
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_tx.c

-- 
2.25.1



[PATCH v6 1/3] net/octeon_ep: support 32B IQ descriptor size

2023-10-18 Thread Vamsi Attunuru
From: Shijith Thotton 

Update input queue setup to consider descriptor size in driver conf.
The default instruction size for otx2 and cnxk devices has been updated
to 32 bytes.

Signed-off-by: Shijith Thotton 
---
 drivers/net/octeon_ep/cnxk_ep_vf.c| 10 +-
 drivers/net/octeon_ep/otx2_ep_vf.c| 10 +-
 drivers/net/octeon_ep/otx_ep_common.h |  4 
 drivers/net/octeon_ep/otx_ep_vf.c |  8 
 4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/net/octeon_ep/cnxk_ep_vf.c b/drivers/net/octeon_ep/cnxk_ep_vf.c
index 92c2d2ca5c..7b3669fe0c 100644
--- a/drivers/net/octeon_ep/cnxk_ep_vf.c
+++ b/drivers/net/octeon_ep/cnxk_ep_vf.c
@@ -106,6 +106,14 @@ cnxk_ep_vf_setup_iq_regs(struct otx_ep_device *otx_ep, uint32_t iq_no)
return -EIO;
}
 
+   /* Configure input queue instruction size. */
+   if (otx_ep->conf->iq.instr_type == OTX_EP_32BYTE_INSTR)
+   reg_val &= ~(CNXK_EP_R_IN_CTL_IS_64B);
+   else
+   reg_val |= CNXK_EP_R_IN_CTL_IS_64B;
+   oct_ep_write64(reg_val, otx_ep->hw_addr + CNXK_EP_R_IN_CONTROL(iq_no));
+   iq->desc_size = otx_ep->conf->iq.instr_type;
+
/* Write the start of the input queue's ring and its size  */
oct_ep_write64(iq->base_addr_dma, otx_ep->hw_addr + CNXK_EP_R_IN_INSTR_BADDR(iq_no));
oct_ep_write64(iq->nb_desc, otx_ep->hw_addr + CNXK_EP_R_IN_INSTR_RSIZE(iq_no));
@@ -354,7 +362,7 @@ static const struct otx_ep_config default_cnxk_ep_conf = {
/* IQ attributes */
.iq= {
.max_iqs   = OTX_EP_CFG_IO_QUEUES,
-   .instr_type= OTX_EP_64BYTE_INSTR,
+   .instr_type= OTX_EP_32BYTE_INSTR,
.pending_list_size = (OTX_EP_MAX_IQ_DESCRIPTORS *
  OTX_EP_CFG_IO_QUEUES),
},
diff --git a/drivers/net/octeon_ep/otx2_ep_vf.c b/drivers/net/octeon_ep/otx2_ep_vf.c
index ced3a415a5..f72b8d25d7 100644
--- a/drivers/net/octeon_ep/otx2_ep_vf.c
+++ b/drivers/net/octeon_ep/otx2_ep_vf.c
@@ -256,6 +256,14 @@ otx2_vf_setup_iq_regs(struct otx_ep_device *otx_ep, uint32_t iq_no)
return -EIO;
}
 
+   /* Configure input queue instruction size. */
+   if (otx_ep->conf->iq.instr_type == OTX_EP_32BYTE_INSTR)
+   reg_val &= ~(SDP_VF_R_IN_CTL_IS_64B);
+   else
+   reg_val |= SDP_VF_R_IN_CTL_IS_64B;
+   oct_ep_write64(reg_val, otx_ep->hw_addr + SDP_VF_R_IN_CONTROL(iq_no));
+   iq->desc_size = otx_ep->conf->iq.instr_type;
+
/* Write the start of the input queue's ring and its size  */
oct_ep_write64(iq->base_addr_dma, otx_ep->hw_addr + SDP_VF_R_IN_INSTR_BADDR(iq_no));
oct_ep_write64(iq->nb_desc, otx_ep->hw_addr + SDP_VF_R_IN_INSTR_RSIZE(iq_no));
@@ -500,7 +508,7 @@ static const struct otx_ep_config default_otx2_ep_conf = {
/* IQ attributes */
.iq= {
.max_iqs   = OTX_EP_CFG_IO_QUEUES,
-   .instr_type= OTX_EP_64BYTE_INSTR,
+   .instr_type= OTX_EP_32BYTE_INSTR,
.pending_list_size = (OTX_EP_MAX_IQ_DESCRIPTORS *
  OTX_EP_CFG_IO_QUEUES),
},
diff --git a/drivers/net/octeon_ep/otx_ep_common.h b/drivers/net/octeon_ep/otx_ep_common.h
index c150cbe619..90e059cad0 100644
--- a/drivers/net/octeon_ep/otx_ep_common.h
+++ b/drivers/net/octeon_ep/otx_ep_common.h
@@ -11,6 +11,7 @@
 
 #define OTX_EP_MAX_RINGS_PER_VF(8)
 #define OTX_EP_CFG_IO_QUEUESOTX_EP_MAX_RINGS_PER_VF
+#define OTX_EP_32BYTE_INSTR (32)
 #define OTX_EP_64BYTE_INSTR (64)
 /*
  * Backpressure for SDP is configured on Octeon, and the minimum queue sizes
@@ -215,6 +216,9 @@ struct otx_ep_instr_queue {
/* Number of  descriptors in this ring. */
uint32_t nb_desc;
 
+   /* Size of the descriptor. */
+   uint8_t desc_size;
+
/* Input ring index, where the driver should write the next packet */
uint32_t host_write_index;
 
diff --git a/drivers/net/octeon_ep/otx_ep_vf.c b/drivers/net/octeon_ep/otx_ep_vf.c
index 4f3538146b..236b7a874c 100644
--- a/drivers/net/octeon_ep/otx_ep_vf.c
+++ b/drivers/net/octeon_ep/otx_ep_vf.c
@@ -120,6 +120,14 @@ otx_ep_setup_iq_regs(struct otx_ep_device *otx_ep, uint32_t iq_no)
return -EIO;
}
 
+   /* Configure input queue instruction size. */
+   if (iq->desc_size == OTX_EP_32BYTE_INSTR)
+   reg_val &= ~(OTX_EP_R_IN_CTL_IS_64B);
+   else
+   reg_val |= OTX_EP_R_IN_CTL_IS_64B;
+   oct_ep_write64(reg_val, otx_ep->hw_addr + OTX_EP_R_IN_CONTROL(iq_no));
+   iq->desc_size = otx_ep->conf->iq.instr_type;
+
/* Write the start of the input queue's ring and its size  */
otx_ep_write64(iq->base_addr_dma, otx_ep->hw_addr,
   OTX_EP_R_IN_INSTR_

[PATCH v6 2/3] net/octeon_ep: clean up receive routine

2023-10-18 Thread Vamsi Attunuru
Patch improves the Rx routine and the packet count update routines;
the packet count update routines need to drain in-flight ISM
memory updates while decrementing the packet count register.

Signed-off-by: Vamsi Attunuru 
---
 drivers/net/octeon_ep/otx_ep_rxtx.c | 162 
 1 file changed, 68 insertions(+), 94 deletions(-)

diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.c b/drivers/net/octeon_ep/otx_ep_rxtx.c
index b37fc8109f..4c509a419f 100644
--- a/drivers/net/octeon_ep/otx_ep_rxtx.c
+++ b/drivers/net/octeon_ep/otx_ep_rxtx.c
@@ -442,7 +442,14 @@ otx_vf_update_read_index(struct otx_ep_instr_queue *iq)
 * when count above halfway to saturation.
 */
rte_write32(val, iq->inst_cnt_reg);
-   *iq->inst_cnt_ism = 0;
+   rte_mb();
+
+   rte_write64(OTX2_SDP_REQUEST_ISM, iq->inst_cnt_reg);
+   while (__atomic_load_n(iq->inst_cnt_ism, __ATOMIC_RELAXED) >= val) {
+   rte_write64(OTX2_SDP_REQUEST_ISM, iq->inst_cnt_reg);
+   rte_mb();
+   }
+
iq->inst_cnt_ism_prev = 0;
}
rte_write64(OTX2_SDP_REQUEST_ISM, iq->inst_cnt_reg);
@@ -567,9 +574,7 @@ prepare_xmit_gather_list(struct otx_ep_instr_queue *iq, 
struct rte_mbuf *m, uint
 
finfo = &iq->req_list[iq->host_write_index].finfo;
*dptr = rte_mem_virt2iova(finfo->g.sg);
-   ih->s.tlen = pkt_len + ih->s.fsz;
-   ih->s.gsz = frags;
-   ih->s.gather = 1;
+   ih->u64 |= ((1ULL << 62) | ((uint64_t)frags << 48) | (pkt_len + ih->s.fsz));
 
while (frags--) {
finfo->g.sg[(j >> 2)].ptr[(j & mask)] = rte_mbuf_data_iova(m);
@@ -752,36 +757,26 @@ otx2_ep_xmit_pkts(void *tx_queue, struct rte_mbuf **pkts, uint16_t nb_pkts)
 static uint32_t
 otx_ep_droq_refill(struct otx_ep_droq *droq)
 {
-   struct otx_ep_droq_desc *desc_ring;
+   struct otx_ep_droq_desc *desc_ring = droq->desc_ring;
struct otx_ep_droq_info *info;
struct rte_mbuf *buf = NULL;
uint32_t desc_refilled = 0;
 
-   desc_ring = droq->desc_ring;
-
while (droq->refill_count && (desc_refilled < droq->nb_desc)) {
-   /* If a valid buffer exists (happens if there is no dispatch),
-* reuse the buffer, else allocate.
-*/
-   if (droq->recv_buf_list[droq->refill_idx] != NULL)
-   break;
-
buf = rte_pktmbuf_alloc(droq->mpool);
/* If a buffer could not be allocated, no point in
 * continuing
 */
-   if (buf == NULL) {
+   if (unlikely(!buf)) {
droq->stats.rx_alloc_failure++;
break;
}
info = rte_pktmbuf_mtod(buf, struct otx_ep_droq_info *);
-   memset(info, 0, sizeof(*info));
+   info->length = 0;
 
droq->recv_buf_list[droq->refill_idx] = buf;
desc_ring[droq->refill_idx].buffer_ptr =
rte_mbuf_data_iova_default(buf);
-
-
droq->refill_idx = otx_ep_incr_index(droq->refill_idx, 1,
droq->nb_desc);
 
@@ -793,21 +788,18 @@ otx_ep_droq_refill(struct otx_ep_droq *droq)
 }
 
 static struct rte_mbuf *
-otx_ep_droq_read_packet(struct otx_ep_device *otx_ep,
-   struct otx_ep_droq *droq, int next_fetch)
+otx_ep_droq_read_packet(struct otx_ep_device *otx_ep, struct otx_ep_droq *droq, int next_fetch)
 {
volatile struct otx_ep_droq_info *info;
-   struct rte_mbuf *droq_pkt2 = NULL;
-   struct rte_mbuf *droq_pkt = NULL;
-   struct rte_net_hdr_lens hdr_lens;
-   struct otx_ep_droq_info *info2;
+   struct rte_mbuf *mbuf_next = NULL;
+   struct rte_mbuf *mbuf = NULL;
uint64_t total_pkt_len;
uint32_t pkt_len = 0;
int next_idx;
 
-   droq_pkt  = droq->recv_buf_list[droq->read_idx];
-   droq_pkt2  = droq->recv_buf_list[droq->read_idx];
-   info = rte_pktmbuf_mtod(droq_pkt, struct otx_ep_droq_info *);
+   mbuf = droq->recv_buf_list[droq->read_idx];
+   info = rte_pktmbuf_mtod(mbuf, struct otx_ep_droq_info *);
+
/* make sure info is available */
rte_rmb();
if (unlikely(!info->length)) {
@@ -828,32 +820,25 @@ otx_ep_droq_read_packet(struct otx_ep_device *otx_ep,
assert(0);
}
}
+
if (next_fetch) {
next_idx = otx_ep_incr_index(droq->read_idx, 1, droq->nb_desc);
-   droq_pkt2  = droq->recv_buf_list[next_idx];
-   info2 = rte_pktmbuf_mtod(droq_pkt2, struct otx_ep_droq_info *);
-   rte_prefetch_non_temporal((const void *)info2);
+   mbuf_next = droq->recv_buf_list[next_idx];
+   rte_prefetch0(rte_pktmbuf_mtod(mbuf_next, void *));
}
 
-   info->

[PATCH v6 3/3] net/octeon_ep: add new fastpath routines

2023-10-18 Thread Vamsi Attunuru
Adds new fastpath routines for cn10k & cn9k endpoint
devices and assigns the fastpath routines based on
the offload flags.

Patch also adds misc changes to improve performance
and code readability.

Signed-off-by: Vamsi Attunuru 
---
 drivers/net/octeon_ep/cnxk_ep_rx.c| 309 ++
 drivers/net/octeon_ep/cnxk_ep_tx.c| 209 +
 drivers/net/octeon_ep/cnxk_ep_vf.c|   2 +
 drivers/net/octeon_ep/cnxk_ep_vf.h|  13 ++
 drivers/net/octeon_ep/meson.build |   2 +
 drivers/net/octeon_ep/otx2_ep_vf.c|   1 +
 drivers/net/octeon_ep/otx_ep_common.h | 125 ++-
 drivers/net/octeon_ep/otx_ep_ethdev.c |  69 +-
 drivers/net/octeon_ep/otx_ep_rxtx.c   |  93 +---
 drivers/net/octeon_ep/otx_ep_rxtx.h   |  38 +++-
 10 files changed, 704 insertions(+), 157 deletions(-)

diff --git a/drivers/net/octeon_ep/cnxk_ep_rx.c b/drivers/net/octeon_ep/cnxk_ep_rx.c
new file mode 100644
index 00..74f0011283
--- /dev/null
+++ b/drivers/net/octeon_ep/cnxk_ep_rx.c
@@ -0,0 +1,309 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include "otx_ep_common.h"
+#include "otx2_ep_vf.h"
+#include "otx_ep_rxtx.h"
+
+static inline int
+cnxk_ep_rx_refill_mbuf(struct otx_ep_droq *droq, uint32_t count)
+{
+   struct otx_ep_droq_desc *desc_ring = droq->desc_ring;
+   struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
+   uint32_t refill_idx = droq->refill_idx;
+   struct rte_mbuf *buf;
+   uint32_t i;
+   int rc;
+
+   rc = rte_pktmbuf_alloc_bulk(droq->mpool, &recv_buf_list[refill_idx], count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return rc;
+   }
+
+   for (i = 0; i < count; i++) {
+   buf = recv_buf_list[refill_idx];
+   desc_ring[refill_idx].buffer_ptr = rte_mbuf_data_iova_default(buf);
+   refill_idx++;
+   }
+
+   droq->refill_idx = otx_ep_incr_index(droq->refill_idx, count, droq->nb_desc);
+   droq->refill_count -= count;
+
+   return 0;
+}
+
+static inline void
+cnxk_ep_rx_refill(struct otx_ep_droq *droq)
+{
+   uint32_t desc_refilled = 0, count;
+   uint32_t nb_desc = droq->nb_desc;
+   uint32_t refill_idx = droq->refill_idx;
+   int rc;
+
+   if (unlikely(droq->read_idx == refill_idx))
+   return;
+
+   if (refill_idx < droq->read_idx) {
+   count = droq->read_idx - refill_idx;
+   rc = cnxk_ep_rx_refill_mbuf(droq, count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return;
+   }
+   desc_refilled = count;
+   } else {
+   count = nb_desc - refill_idx;
+   rc = cnxk_ep_rx_refill_mbuf(droq, count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return;
+   }
+
+   desc_refilled = count;
+   count = droq->read_idx;
+   rc = cnxk_ep_rx_refill_mbuf(droq, count);
+   if (unlikely(rc)) {
+   droq->stats.rx_alloc_failure++;
+   return;
+   }
+   desc_refilled += count;
+   }
+
+   /* Flush the droq descriptor data to memory to be sure
+* that when we update the credits the data in memory is
+* accurate.
+*/
+   rte_io_wmb();
+   rte_write32(desc_refilled, droq->pkts_credit_reg);
+}
+
+static inline uint32_t
+cnxk_ep_check_rx_pkts(struct otx_ep_droq *droq)
+{
+   uint32_t new_pkts;
+   uint32_t val;
+
+   /* Batch subtractions from the HW counter to reduce PCIe traffic
+* This adds an extra local variable, but almost halves the
+* number of PCIe writes.
+*/
+   val = __atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED);
+   new_pkts = val - droq->pkts_sent_ism_prev;
+   droq->pkts_sent_ism_prev = val;
+
+   if (val > (uint32_t)(1 << 31)) {
+   /* Only subtract the packet count in the HW counter
+* when count above halfway to saturation.
+*/
+   rte_write64((uint64_t)val, droq->pkts_sent_reg);
+   rte_mb();
+
+   rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
+   while (__atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED) >= val) {
+   rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
+   rte_mb();
+   }
+
+   droq->pkts_sent_ism_prev = 0;
+   }
+   rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
+   droq->pkts_pending += new_pkts;
+
+   return new_pkts;
+}
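
[Editor's note] The batched counter read above depends on unsigned 32-bit wrap-around subtraction. A hypothetical Python model of that bookkeeping (names are illustrative, not driver code; the busy-wait on the write-back is omitted):

```python
MASK32 = 0xFFFFFFFF
HALFWAY = 1 << 31

def ism_new_pkts(hw_count, prev):
    """Return (new_pkts, new_prev, reset_needed) for one batched read of
    a free-running 32-bit ISM counter. The unsigned difference between
    the current value and the previously seen value stays correct even
    when the counter wraps; reset_needed models the write-back to
    pkts_sent_reg once the count passes halfway to saturation."""
    new_pkts = (hw_count - prev) & MASK32  # C unsigned subtraction
    reset_needed = hw_count > HALFWAY
    new_prev = 0 if reset_needed else hw_count
    return new_pkts, new_prev, reset_needed
```

For example, a counter that wrapped from 0xFFFFFFFB to 5 still yields a delta of 10.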
+
+static inline int16_t __rte_hot
+cnxk_ep_rx_pkts_to_process(struct otx_ep_droq *droq, uint16_t nb_pkts)
+{
+   if (droq->pkts_pending < nb_pkts)
+   cnxk_ep_check_rx_pkts(d

Re: [PATCH v4 0/7] document and simplify use of cmdline

2023-10-18 Thread David Marchand
Hello Bruce,


On Tue, Oct 17, 2023 at 7:08 PM Bruce Richardson
 wrote:
> > > sure I like it that much as a feature :-) I rather like having unique
> > > prefixes for each command. I wasn't actually aware of the testpmd "help
> > > " command at all. I will have to look into it.
> >
> > Let me propose an alternative hack.
> > I mentioned previously that we could have a better namespace /
> > discriminant for those symbols, and it seems easier than I thought:
> >
> > @@ -25,8 +25,10 @@ def process_command(tokens, cfile, comment):
> >  sys.exit(1)
> >  for t in tokens:
> >  if t.startswith('<'):
> > -break
> > -name.append(t)
> > +t_type, t_name = t[1:].split('>')
> > +name.append(t_name)
> > +else:
> > +name.append(t)
> >  name = '_'.join(name)
> >
> >  result_struct = []
> >
> > With this, any command implementation symbol has the full chain of
> > token names as a prefix which will ensure there is no conflict.
> > WDYT?
> >
> Having thought a little more about it, I still don't like having the full
> command in all cases, but I can see it being useful for cases of
> overlapping prefixes.
>
> How about making it optional - setting a flag in the typename, or in the
> parameter name to indicate that it should be included in the overall
> command name. For example, if we prefix the variable name with "_" or "__",
> it could indicate that we can choose to include this.
>
> show port n  --> void cmd_show_port_parsed(...)
> show port _n --> void cmd_show_port_n_parsed(...)
>

I think I get what you mean, and it seems acceptable.


> Prefixes on strings beyond initial tokens could just be silently stripped.

By initial tokens, do you mean fixed string tokens before a <> typed token?


-- 
David Marchand



Re: [PATCH 1/4] net/hns3: fix a typo

2023-10-18 Thread lihuisong (C)

hns3 patch can be stripped from this series.
lgtm,
Acked-by: Huisong Li 


On 2023/10/18 9:39, Jie Hai wrote:

This patch fixes a typo.

Fixes: c09c7847d892 ("net/hns3: support traffic management")
Cc: sta...@dpdk.org

Signed-off-by: Jie Hai 
---
  drivers/net/hns3/hns3_tm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hns3/hns3_tm.c b/drivers/net/hns3/hns3_tm.c
index 67402a700f46..d9691640140b 100644
--- a/drivers/net/hns3/hns3_tm.c
+++ b/drivers/net/hns3/hns3_tm.c
@@ -739,7 +739,7 @@ hns3_tm_node_type_get(struct rte_eth_dev *dev, uint32_t 
node_id,
  }
  
  static void

-hns3_tm_nonleaf_level_capsbilities_get(struct rte_eth_dev *dev,
+hns3_tm_nonleaf_level_capabilities_get(struct rte_eth_dev *dev,
   uint32_t level_id,
   struct rte_tm_level_capabilities *cap)
  {
@@ -818,7 +818,7 @@ hns3_tm_level_capabilities_get(struct rte_eth_dev *dev,
memset(cap, 0, sizeof(struct rte_tm_level_capabilities));
  
  	if (level_id != HNS3_TM_NODE_LEVEL_QUEUE)

-   hns3_tm_nonleaf_level_capsbilities_get(dev, level_id, cap);
+   hns3_tm_nonleaf_level_capabilities_get(dev, level_id, cap);
else
hns3_tm_leaf_level_capabilities_get(dev, cap);
  


Re: [PATCH v4 0/7] document and simplify use of cmdline

2023-10-18 Thread Bruce Richardson
On Wed, Oct 18, 2023 at 01:21:40PM +0200, David Marchand wrote:
> Hello Bruce,
> 
> 
> On Tue, Oct 17, 2023 at 7:08 PM Bruce Richardson
>  wrote:
> > > > sure I like it that much as a feature :-) I rather like having unique
> > > > prefixes for each command. I wasn't actually aware of the testpmd "help
> > > > " command at all. I will have to look into it.
> > >
> > > Let me propose an alternative hack.
> > > I mentioned previously that we could have a better namespace /
> > > discriminant for those symbols, and it seems easier than I thought:
> > >
> > > @@ -25,8 +25,10 @@ def process_command(tokens, cfile, comment):
> > >  sys.exit(1)
> > >  for t in tokens:
> > >  if t.startswith('<'):
> > > -break
> > > -name.append(t)
> > > +t_type, t_name = t[1:].split('>')
> > > +name.append(t_name)
> > > +else:
> > > +name.append(t)
> > >  name = '_'.join(name)
> > >
> > >  result_struct = []
> > >
> > > With this, any command implementation symbol has the full chain of
> > > token names as a prefix which will ensure there is no conflict.
> > > WDYT?
> > >
> > Having thought a little more about it, I still don't like having the full
> > command in all cases, but I can see it being useful for cases of
> > overlapping prefixes.
> >
> > How about making it optional - setting a flag in the typename, or in the
> > parameter name to indicate that it should be included in the overall
> > command name. For example, if we prefix the variable name with "_" or "__",
> > it could indicate that we can choose to include this.
> >
> > show port n  --> void cmd_show_port_parsed(...)
> > show port _n --> void cmd_show_port_n_parsed(...)
> >
> 
> I think I get what you mean, and it seems acceptable.
> 

Cool. Any suggestions for a preferred prefix to indicate inclusion in the
cmd name? "_", "__" or something else? I'm trending towards single "_" as
above.

> 
> > Prefixes on strings beyond initial tokens could just be silently stripped.
> 
> By initial tokens, do you mean fixed string tokens before a <> typed token?
>
Yes.

So:

add x --> cmd_add_parsed
add _x--> cmd_add_x_parsed
add _x _y --> cmd_add_x_y_parsed
add x _y  --> cmd_add_parsed, strip "_" off y silently
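
[Editor's note] For illustration, the opt-in rule sketched above could look like this in the generator. This is a hypothetical helper, not the actual dpdk-cmdline-gen.py code; in particular, how fixed strings after a plain variable token are treated is an assumption here.

```python
def command_symbol(tokens):
    """Build a cmd_*_parsed symbol name from command tokens: fixed-string
    tokens contribute to the name; a <type>name variable token
    contributes only when its name starts with '_' (the '_' itself is
    stripped); the first plain variable token ends name extension, and
    any later '_' prefixes are silently stripped."""
    name = []
    naming = True
    for t in tokens:
        if t.startswith('<'):
            _t_type, t_name = t[1:].split('>')
            if naming and t_name.startswith('_'):
                name.append(t_name.lstrip('_'))
            else:
                naming = False  # strip later '_' prefixes silently
        elif naming:
            name.append(t)
    return 'cmd_' + '_'.join(name) + '_parsed'
```

This reproduces the four examples above, e.g. ['add', '<UINT16>_x', '<UINT16>_y'] maps to cmd_add_x_y_parsed.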

/Bruce



Re: [PATCH] event/dsw: fix missing device pointer

2023-10-18 Thread Bruce Richardson
On Wed, Oct 18, 2023 at 10:48:14AM +0530, Jerin Jacob wrote:
> On Tue, Oct 17, 2023 at 10:21 PM Jerin Jacob  wrote:
> >
> > On Tue, Oct 17, 2023 at 9:45 PM Bruce Richardson
> >  wrote:
> > >
> > > On Tue, Oct 17, 2023 at 09:34:04PM +0530, Jerin Jacob wrote:
> > > > On Tue, Oct 17, 2023 at 9:32 PM Bruce Richardson
> > > >  wrote:
> > > > >
> > > > > After calling rte_event_dev_info_get() the ".dev" field of the info
> > > > > structure should have a pointer to the underlying device, allowing the
> > > > > user to e.g. get the device name using rte_dev_name(info.dev).
> > > > >
> > > > > The distributed software eventdev info structure did not return a
> > > > > correct device pointer, though, instead returning NULL, which caused
> > > > > crashes getting "rte_dev_name". Initializing the dev pointer inside 
> > > > > the
> > > > > "eventdev" struct in the device probe function fixes this by ensuring 
> > > > > we
> > > > > have a valid pointer to return in info_get calls.
> > > > >
> > > > > Fixes: 46a186b1f0c5 ("event/dsw: add device registration and build 
> > > > > system")
> > > > > Cc: mattias.ronnb...@ericsson.com
> > > > >
> > > > > Signed-off-by: Bruce Richardson 
> > > >
> > > > Is this issue for all "vdev" devices? if so, Please check for
> > > > drivers/event/skeleton too.
> > > >
> > > Yes, good point, looks like event/skeleton also returns NULL for the 
> > > device
> > > pointer.
> > >
> > > I'll do up a v3 with the extra patch in it.
> >
> > Looks like there are more vdev devices. Can we have a common PMD function or
> > extend rte_event_pmd_vdev_init or so.
> 
> 
> @Richardson, Bruce I will be on vacation from Friday, So would like to
> give PR for rc2 before that.
> 
> Adding helper function in rc2 may be risky, Could you fix all vdev
> mentioned below.
> Helper work, I think, we can take in next release.
> 

Having looked at it more, and considering I cannot test a number of the
drivers (dpaa*, octeon), I actually think the safest approach is to modify
the vdev_init function. It's a small change, so I think it's pretty low
risk. Patch will follow shortly.

/Bruce


[PATCH v4 1/2] event/*: set device pointer for vdev-based eventdevs

2023-10-18 Thread Bruce Richardson
The eventdevs based on vdevs, rather than on e.g. HW PCI devices, were,
as a rule, not setting the ".dev" pointer in the eventdev structure.
This caused issues as a NULL pointer was returned in calls to info_get,
triggering crashes if the pointer is passed unchecked to e.g.
rte_dev_name() to print out the name of an event device.

The most effective and future-proof fix is to not rely on the eventdev
drivers to set the pointer themselves, but to change the vdev init
function to take the vdev struct as parameter, and set the "dev" pointer
centrally on init. This allows us to fix all drivers in one go, enforced
by compiler error if the parameter is missing.

Fixes: aaa4a221da26 ("event/sw: add new software-only eventdev driver")
Fixes: 46a186b1f0c5 ("event/dsw: add device registration and build system")
Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
Fixes: 3c7f3dcfb099 ("event/opdl: add PMD main body and helper function")
Fixes: 9caac5dd1e7f ("event/dpaa: introduce PMD")
Fixes: 8a5d7a8ec74b ("event/dpaa2: initialize device")
Fixes: 34498de6000f ("event/octeontx: add octeontx eventdev driver")
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
---
 drivers/event/dpaa/dpaa_eventdev.c | 6 +++---
 drivers/event/dpaa2/dpaa2_eventdev.c   | 6 +++---
 drivers/event/dsw/dsw_evdev.c  | 2 +-
 drivers/event/octeontx/ssovf_evdev.c   | 2 +-
 drivers/event/opdl/opdl_evdev.c| 2 +-
 drivers/event/skeleton/skeleton_eventdev.c | 6 +++---
 drivers/event/sw/sw_evdev.c| 2 +-
 lib/eventdev/eventdev_pmd_vdev.h   | 3 ++-
 8 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
index f615da3813..46a9b88c73 100644
--- a/drivers/event/dpaa/dpaa_eventdev.c
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -994,14 +994,14 @@ dpaa_event_check_flags(const char *params)
 }
 
 static int
-dpaa_event_dev_create(const char *name, const char *params)
+dpaa_event_dev_create(const char *name, const char *params, struct 
rte_vdev_device *vdev)
 {
struct rte_eventdev *eventdev;
struct dpaa_eventdev *priv;
 
eventdev = rte_event_pmd_vdev_init(name,
   sizeof(struct dpaa_eventdev),
-  rte_socket_id());
+  rte_socket_id(), vdev);
if (eventdev == NULL) {
DPAA_EVENTDEV_ERR("Failed to create eventdev vdev %s", name);
goto fail;
@@ -1051,7 +1051,7 @@ dpaa_event_dev_probe(struct rte_vdev_device *vdev)
 
params = rte_vdev_device_args(vdev);
 
-   return dpaa_event_dev_create(name, params);
+   return dpaa_event_dev_create(name, params, vdev);
 }
 
 static int
diff --git a/drivers/event/dpaa2/dpaa2_eventdev.c 
b/drivers/event/dpaa2/dpaa2_eventdev.c
index ffc5550f85..dd4e64395f 100644
--- a/drivers/event/dpaa2/dpaa2_eventdev.c
+++ b/drivers/event/dpaa2/dpaa2_eventdev.c
@@ -1086,7 +1086,7 @@ dpaa2_eventdev_setup_dpci(struct dpaa2_dpci_dev *dpci_dev,
 }
 
 static int
-dpaa2_eventdev_create(const char *name)
+dpaa2_eventdev_create(const char *name, struct rte_vdev_device *vdev)
 {
struct rte_eventdev *eventdev;
struct dpaa2_eventdev *priv;
@@ -1096,7 +1096,7 @@ dpaa2_eventdev_create(const char *name)
 
eventdev = rte_event_pmd_vdev_init(name,
   sizeof(struct dpaa2_eventdev),
-  rte_socket_id());
+  rte_socket_id(), vdev);
if (eventdev == NULL) {
DPAA2_EVENTDEV_ERR("Failed to create Event device %s", name);
goto fail;
@@ -1190,7 +1190,7 @@ dpaa2_eventdev_probe(struct rte_vdev_device *vdev)
 
name = rte_vdev_device_name(vdev);
DPAA2_EVENTDEV_INFO("Initializing %s", name);
-   return dpaa2_eventdev_create(name);
+   return dpaa2_eventdev_create(name, vdev);
 }
 
 static int
diff --git a/drivers/event/dsw/dsw_evdev.c b/drivers/event/dsw/dsw_evdev.c
index 785c12f61f..1209e73a9d 100644
--- a/drivers/event/dsw/dsw_evdev.c
+++ b/drivers/event/dsw/dsw_evdev.c
@@ -435,7 +435,7 @@ dsw_probe(struct rte_vdev_device *vdev)
name = rte_vdev_device_name(vdev);
 
dev = rte_event_pmd_vdev_init(name, sizeof(struct dsw_evdev),
- rte_socket_id());
+ rte_socket_id(), vdev);
if (dev == NULL)
return -EFAULT;
 
diff --git a/drivers/event/octeontx/ssovf_evdev.c 
b/drivers/event/octeontx/ssovf_evdev.c
index 0eb9358981..a16f24e088 100644
--- a/drivers/event/octeontx/ssovf_evdev.c
+++ b/drivers/event/octeontx/ssovf_evdev.c
@@ -880,7 +880,7 @@ ssovf_vdev_probe(struct rte_vdev_device *vdev)
}
 
eventdev = rte_event_pmd_vdev_init(name, sizeof(struct ssovf_evdev),
-   rte_socket_

[PATCH v4 2/2] event/skeleton: set driver name string

2023-10-18 Thread Bruce Richardson
When calling rte_eventdev_info_get() the driver name string field should
be populated.

Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
---
 drivers/event/skeleton/skeleton_eventdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/event/skeleton/skeleton_eventdev.c 
b/drivers/event/skeleton/skeleton_eventdev.c
index 7db1efaf14..6afb5de824 100644
--- a/drivers/event/skeleton/skeleton_eventdev.c
+++ b/drivers/event/skeleton/skeleton_eventdev.c
@@ -24,6 +24,7 @@
 
 #define EVENTDEV_NAME_SKELETON_PMD event_skeleton
 /**< Skeleton event device PMD name */
+#define EVENTDEV_NAME_STRING RTE_STR(EVENTDEV_NAME_SKELETON_PMD)
 
 static uint16_t
 skeleton_eventdev_enqueue(void *port, const struct rte_event *ev)
@@ -88,6 +89,7 @@ skeleton_eventdev_info_get(struct rte_eventdev *dev,
 
RTE_SET_USED(skel);
 
+   dev_info->driver_name = EVENTDEV_NAME_STRING;
dev_info->min_dequeue_timeout_ns = 1;
dev_info->max_dequeue_timeout_ns = 1;
dev_info->dequeue_timeout_ns = 25;
-- 
2.39.2



Re: [PATCH v4] bus/pci: fix legacy device IO port map in secondary process

2023-10-18 Thread Gupta, Nipun




On 10/18/2023 3:35 PM, David Marchand wrote:

On Mon, Oct 9, 2023 at 5:06 AM Ma, WenwuX  wrote:

 From a pci bus API pov, nothing prevents a driver from mixing memory
mapped with vfio and ioport resources (iow, calls to
rte_pci_map_resource() and rte_pci_ioport_map()).
IOW, it may not be the case with the net/virtio driver but, in theory,
rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
rte_pci_map_resource() call.

In a similar manner, from the API pov,
rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple bars.

In summary, nothing in this patch checks that vfio has been configured already
and I think we need a refcount to handle those situations.


We call rte_vfio_setup_device just to get device info, we can call 
rte_vfio_release_device as soon as pci_vfio_fill_regions is done.
This avoids reference counting operations, do you think it works?


Afaics, rte_vfio_setup_device should not be called if a call to
rte_pci_map_device for this device was successful (rte_pci_map_device
itself calls rte_vfio_setup_device).
And as a consequence, calling rte_vfio_release_device cannot be done
unconditionnally neither.


Hi David,

AFAIU, rte_vfio_setup_device() is written as re-entrant and does not 
create the DMA mapping again if it is already done for the iommu group.


When this API is called again, either for a device within the same group
or for the device for which it was already called, it mainly only does
the work for device info get. Though it is not ideal to use it like
this, calling it multiple times should not have any negative impact.


As Wenwu mentioned, they need only the device info from VFIO, so a separate
API to get device info can be added in eal_vfio.c/h. The device info
portion of rte_vfio_setup_device() can be moved out to a new API, and
rte_vfio_setup_device() can call this new API too?


Thanks,
Nipun
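
[Editor's note] The refcounting David suggested earlier in the thread could look roughly like the sketch below. This is only a hypothetical Python model of the idea (class and method names are invented): setup is counted per device so that an ioport map after a resource map does not set the device up twice, and the real teardown only happens when the last user releases it.

```python
class VfioDeviceRefcount:
    """Per-device reference counting for VFIO setup/release."""

    def __init__(self):
        self._refcnt = {}

    def acquire(self, dev):
        # first user performs the real setup; later users just count
        self._refcnt[dev] = self._refcnt.get(dev, 0) + 1
        return self._refcnt[dev] == 1  # True -> caller does real setup

    def release(self, dev):
        self._refcnt[dev] -= 1
        if self._refcnt[dev] == 0:
            del self._refcnt[dev]
            return True  # last user -> caller does real teardown
        return False
```

Here a second acquire of the same device address is a no-op apart from the count, matching the re-entrant behaviour discussed above.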


Re: [PATCH v4 2/2] event/skeleton: set driver name string

2023-10-18 Thread David Marchand
On Wed, Oct 18, 2023 at 2:26 PM Bruce Richardson
 wrote:
>
> When calling rte_eventdev_info_get() the driver name string field should
> be populated.
>
> Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Bruce Richardson 

event/dpaa2 seems affected too.

Can rte_eventdev_info_get() fill this based on the driver attached to
the device, like ethdev does?

Something like untested:

diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
index 95373bbaad..37ccc0dc77 100644
--- a/lib/eventdev/rte_eventdev.c
+++ b/lib/eventdev/rte_eventdev.c
@@ -104,6 +104,7 @@ rte_event_dev_info_get(uint8_t dev_id, struct
rte_event_dev_info *dev_info)
dev_info->dequeue_timeout_ns = dev->data->dev_conf.dequeue_timeout_ns;

dev_info->dev = dev->dev;
+   dev_info->driver_name = dev->dev->driver->name;

rte_eventdev_trace_info_get(dev_id, dev_info, dev_info->dev);


-- 
David Marchand



Re: [PATCH v4 2/2] event/skeleton: set driver name string

2023-10-18 Thread Bruce Richardson
On Wed, Oct 18, 2023 at 02:44:11PM +0200, David Marchand wrote:
> On Wed, Oct 18, 2023 at 2:26 PM Bruce Richardson
>  wrote:
> >
> > When calling rte_eventdev_info_get() the driver name string field should
> > be populated.
> >
> > Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Bruce Richardson 
> 
> event/dpaa2 seems affected too.
> 

Ok, hadn't noticed that. I tested all the SW drivers I could and didn't
find any of them affected.

> Can rte_eventdev_info_get() fill this based on the driver attached to
> the device, like ethdev does?
> 
> Something like untested:
> 
> diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
> index 95373bbaad..37ccc0dc77 100644
> --- a/lib/eventdev/rte_eventdev.c
> +++ b/lib/eventdev/rte_eventdev.c
> @@ -104,6 +104,7 @@ rte_event_dev_info_get(uint8_t dev_id, struct
> rte_event_dev_info *dev_info)
> dev_info->dequeue_timeout_ns = dev->data->dev_conf.dequeue_timeout_ns;
> 
> dev_info->dev = dev->dev;
> +   dev_info->driver_name = dev->dev->driver->name;
> 
> rte_eventdev_trace_info_get(dev_id, dev_info, dev_info->dev);
> 
Ok, let me do up and test a patch.

/Bruce


[PATCH] test/crypto: move some tests to driver-tests suite

2023-10-18 Thread David Marchand
Some cryptodev driver specific tests were in the "driver-tests" suite,
while similar tests for other drivers were not.

Signed-off-by: David Marchand 
---
 app/test/test_cryptodev.c  | 20 ++--
 app/test/test_cryptodev_asym.c |  9 +++--
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index d2c4c6f8b5..970ca52f7e 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -17963,11 +17963,11 @@ test_cryptodev_dpaa_sec_raw_api(void)
return 
run_cryptodev_raw_testsuite(RTE_STR(CRYPTODEV_NAME_DPAA_SEC_PMD));
 }
 
-REGISTER_TEST_COMMAND(cryptodev_cn10k_raw_api_autotest,
+REGISTER_DRIVER_TEST(cryptodev_cn10k_raw_api_autotest,
test_cryptodev_cn10k_raw_api);
-REGISTER_TEST_COMMAND(cryptodev_dpaa2_sec_raw_api_autotest,
+REGISTER_DRIVER_TEST(cryptodev_dpaa2_sec_raw_api_autotest,
test_cryptodev_dpaa2_sec_raw_api);
-REGISTER_TEST_COMMAND(cryptodev_dpaa_sec_raw_api_autotest,
+REGISTER_DRIVER_TEST(cryptodev_dpaa_sec_raw_api_autotest,
test_cryptodev_dpaa_sec_raw_api);
 REGISTER_DRIVER_TEST(cryptodev_qat_raw_api_autotest,
test_cryptodev_qat_raw_api);
@@ -17981,7 +17981,7 @@ REGISTER_DRIVER_TEST(cryptodev_openssl_autotest, 
test_cryptodev_openssl);
 REGISTER_DRIVER_TEST(cryptodev_aesni_gcm_autotest, test_cryptodev_aesni_gcm);
 REGISTER_DRIVER_TEST(cryptodev_cpu_aesni_gcm_autotest,
test_cryptodev_cpu_aesni_gcm);
-REGISTER_TEST_COMMAND(cryptodev_mlx5_autotest, test_cryptodev_mlx5);
+REGISTER_DRIVER_TEST(cryptodev_mlx5_autotest, test_cryptodev_mlx5);
 REGISTER_DRIVER_TEST(cryptodev_null_autotest, test_cryptodev_null);
 REGISTER_DRIVER_TEST(cryptodev_sw_snow3g_autotest, test_cryptodev_sw_snow3g);
 REGISTER_DRIVER_TEST(cryptodev_sw_kasumi_autotest, test_cryptodev_sw_kasumi);
@@ -17990,12 +17990,12 @@ REGISTER_DRIVER_TEST(cryptodev_sw_armv8_autotest, 
test_cryptodev_armv8);
 REGISTER_DRIVER_TEST(cryptodev_sw_mvsam_autotest, test_cryptodev_mrvl);
 REGISTER_DRIVER_TEST(cryptodev_dpaa2_sec_autotest, test_cryptodev_dpaa2_sec);
 REGISTER_DRIVER_TEST(cryptodev_dpaa_sec_autotest, test_cryptodev_dpaa_sec);
-REGISTER_TEST_COMMAND(cryptodev_ccp_autotest, test_cryptodev_ccp);
+REGISTER_DRIVER_TEST(cryptodev_ccp_autotest, test_cryptodev_ccp);
 REGISTER_DRIVER_TEST(cryptodev_uadk_autotest, test_cryptodev_uadk);
-REGISTER_TEST_COMMAND(cryptodev_virtio_autotest, test_cryptodev_virtio);
-REGISTER_TEST_COMMAND(cryptodev_octeontx_autotest, test_cryptodev_octeontx);
-REGISTER_TEST_COMMAND(cryptodev_caam_jr_autotest, test_cryptodev_caam_jr);
-REGISTER_TEST_COMMAND(cryptodev_nitrox_autotest, test_cryptodev_nitrox);
-REGISTER_TEST_COMMAND(cryptodev_bcmfs_autotest, test_cryptodev_bcmfs);
+REGISTER_DRIVER_TEST(cryptodev_virtio_autotest, test_cryptodev_virtio);
+REGISTER_DRIVER_TEST(cryptodev_octeontx_autotest, test_cryptodev_octeontx);
+REGISTER_DRIVER_TEST(cryptodev_caam_jr_autotest, test_cryptodev_caam_jr);
+REGISTER_DRIVER_TEST(cryptodev_nitrox_autotest, test_cryptodev_nitrox);
+REGISTER_DRIVER_TEST(cryptodev_bcmfs_autotest, test_cryptodev_bcmfs);
 REGISTER_DRIVER_TEST(cryptodev_cn9k_autotest, test_cryptodev_cn9k);
 REGISTER_DRIVER_TEST(cryptodev_cn10k_autotest, test_cryptodev_cn10k);
diff --git a/app/test/test_cryptodev_asym.c b/app/test/test_cryptodev_asym.c
index 94bb091df3..db3180bdcb 100644
--- a/app/test/test_cryptodev_asym.c
+++ b/app/test/test_cryptodev_asym.c
@@ -2875,10 +2875,7 @@ test_cryptodev_cn10k_asym(void)
 }
 
 REGISTER_DRIVER_TEST(cryptodev_openssl_asym_autotest, 
test_cryptodev_openssl_asym);
-
 REGISTER_DRIVER_TEST(cryptodev_qat_asym_autotest, test_cryptodev_qat_asym);
-
-REGISTER_TEST_COMMAND(cryptodev_octeontx_asym_autotest,
- test_cryptodev_octeontx_asym);
-REGISTER_TEST_COMMAND(cryptodev_cn9k_asym_autotest, test_cryptodev_cn9k_asym);
-REGISTER_TEST_COMMAND(cryptodev_cn10k_asym_autotest, 
test_cryptodev_cn10k_asym);
+REGISTER_DRIVER_TEST(cryptodev_octeontx_asym_autotest, 
test_cryptodev_octeontx_asym);
+REGISTER_DRIVER_TEST(cryptodev_cn9k_asym_autotest, test_cryptodev_cn9k_asym);
+REGISTER_DRIVER_TEST(cryptodev_cn10k_asym_autotest, test_cryptodev_cn10k_asym);
-- 
2.41.0



Re: [PATCH v4 1/2] event/*: set device pointer for vdev-based eventdevs

2023-10-18 Thread David Marchand
On Wed, Oct 18, 2023 at 2:26 PM Bruce Richardson
 wrote:
>
> The eventdevs based on vdevs, rather than on e.g. HW PCI devices, were,
> as a rule, not setting the ".dev" pointer in the eventdev structure.
> This caused issues as a NULL pointer was returned in calls to info_get,
> triggering crashes if the pointer is passed unchecked to e.g.
> rte_dev_name() to print out the name of an event device.
>
> The most effective and future-proof fix is to not rely on the eventdev
> drivers to set the pointer themselves, but to change the vdev init
> function to take the vdev struct as parameter, and set the "dev" pointer
> centrally on init. This allows us to fix all drivers in one go, enforced
> by compiler error if the parameter is missing.
>
> Fixes: aaa4a221da26 ("event/sw: add new software-only eventdev driver")
> Fixes: 46a186b1f0c5 ("event/dsw: add device registration and build system")
> Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
> Fixes: 3c7f3dcfb099 ("event/opdl: add PMD main body and helper function")
> Fixes: 9caac5dd1e7f ("event/dpaa: introduce PMD")
> Fixes: 8a5d7a8ec74b ("event/dpaa2: initialize device")
> Fixes: 34498de6000f ("event/octeontx: add octeontx eventdev driver")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Bruce Richardson 

Acked-by: David Marchand 


-- 
David Marchand



Re: [PATCH] test/crypto: move some tests to driver-tests suite

2023-10-18 Thread Bruce Richardson
On Wed, Oct 18, 2023 at 02:55:42PM +0200, David Marchand wrote:
> Some cryptodev driver specific tests were in the "driver-tests" suite,
> while similar tests for other drivers were not.
> 
> Signed-off-by: David Marchand 
> ---

Spotted that inconsistency myself but never got around to fixing it.
Thanks,

Acked-by: Bruce Richardson 


[PATCH v5 1/2] eventdev: fix device pointer for vdev-based eventdevs

2023-10-18 Thread Bruce Richardson
The eventdevs based on vdevs, rather than on e.g. HW PCI devices, were,
as a rule, not setting the ".dev" pointer in the eventdev structure.
This caused issues as a NULL pointer was returned in calls to info_get,
triggering crashes if the pointer is passed unchecked to e.g.
rte_dev_name() to print out the name of an event device.

The most effective and future-proof fix is to not rely on the eventdev
drivers to set the pointer themselves, but to change the vdev init
function to take the vdev struct as parameter, and set the "dev" pointer
centrally on init. This allows us to fix all drivers in one go, enforced
by compiler error if the parameter is missing.

Fixes: aaa4a221da26 ("event/sw: add new software-only eventdev driver")
Fixes: 46a186b1f0c5 ("event/dsw: add device registration and build system")
Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
Fixes: 3c7f3dcfb099 ("event/opdl: add PMD main body and helper function")
Fixes: 9caac5dd1e7f ("event/dpaa: introduce PMD")
Fixes: 8a5d7a8ec74b ("event/dpaa2: initialize device")
Fixes: 34498de6000f ("event/octeontx: add octeontx eventdev driver")
Cc: sta...@dpdk.org

Signed-off-by: Bruce Richardson 
Acked-by: David Marchand 
---
 drivers/event/dpaa/dpaa_eventdev.c | 6 +++---
 drivers/event/dpaa2/dpaa2_eventdev.c   | 6 +++---
 drivers/event/dsw/dsw_evdev.c  | 2 +-
 drivers/event/octeontx/ssovf_evdev.c   | 2 +-
 drivers/event/opdl/opdl_evdev.c| 2 +-
 drivers/event/skeleton/skeleton_eventdev.c | 6 +++---
 drivers/event/sw/sw_evdev.c| 2 +-
 lib/eventdev/eventdev_pmd_vdev.h   | 3 ++-
 8 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
index f615da3813..46a9b88c73 100644
--- a/drivers/event/dpaa/dpaa_eventdev.c
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -994,14 +994,14 @@ dpaa_event_check_flags(const char *params)
 }
 
 static int
-dpaa_event_dev_create(const char *name, const char *params)
+dpaa_event_dev_create(const char *name, const char *params, struct 
rte_vdev_device *vdev)
 {
struct rte_eventdev *eventdev;
struct dpaa_eventdev *priv;
 
eventdev = rte_event_pmd_vdev_init(name,
   sizeof(struct dpaa_eventdev),
-  rte_socket_id());
+  rte_socket_id(), vdev);
if (eventdev == NULL) {
DPAA_EVENTDEV_ERR("Failed to create eventdev vdev %s", name);
goto fail;
@@ -1051,7 +1051,7 @@ dpaa_event_dev_probe(struct rte_vdev_device *vdev)
 
params = rte_vdev_device_args(vdev);
 
-   return dpaa_event_dev_create(name, params);
+   return dpaa_event_dev_create(name, params, vdev);
 }
 
 static int
diff --git a/drivers/event/dpaa2/dpaa2_eventdev.c 
b/drivers/event/dpaa2/dpaa2_eventdev.c
index ffc5550f85..dd4e64395f 100644
--- a/drivers/event/dpaa2/dpaa2_eventdev.c
+++ b/drivers/event/dpaa2/dpaa2_eventdev.c
@@ -1086,7 +1086,7 @@ dpaa2_eventdev_setup_dpci(struct dpaa2_dpci_dev *dpci_dev,
 }
 
 static int
-dpaa2_eventdev_create(const char *name)
+dpaa2_eventdev_create(const char *name, struct rte_vdev_device *vdev)
 {
struct rte_eventdev *eventdev;
struct dpaa2_eventdev *priv;
@@ -1096,7 +1096,7 @@ dpaa2_eventdev_create(const char *name)
 
eventdev = rte_event_pmd_vdev_init(name,
   sizeof(struct dpaa2_eventdev),
-  rte_socket_id());
+  rte_socket_id(), vdev);
if (eventdev == NULL) {
DPAA2_EVENTDEV_ERR("Failed to create Event device %s", name);
goto fail;
@@ -1190,7 +1190,7 @@ dpaa2_eventdev_probe(struct rte_vdev_device *vdev)
 
name = rte_vdev_device_name(vdev);
DPAA2_EVENTDEV_INFO("Initializing %s", name);
-   return dpaa2_eventdev_create(name);
+   return dpaa2_eventdev_create(name, vdev);
 }
 
 static int
diff --git a/drivers/event/dsw/dsw_evdev.c b/drivers/event/dsw/dsw_evdev.c
index 785c12f61f..1209e73a9d 100644
--- a/drivers/event/dsw/dsw_evdev.c
+++ b/drivers/event/dsw/dsw_evdev.c
@@ -435,7 +435,7 @@ dsw_probe(struct rte_vdev_device *vdev)
name = rte_vdev_device_name(vdev);
 
dev = rte_event_pmd_vdev_init(name, sizeof(struct dsw_evdev),
- rte_socket_id());
+ rte_socket_id(), vdev);
if (dev == NULL)
return -EFAULT;
 
diff --git a/drivers/event/octeontx/ssovf_evdev.c 
b/drivers/event/octeontx/ssovf_evdev.c
index 0eb9358981..a16f24e088 100644
--- a/drivers/event/octeontx/ssovf_evdev.c
+++ b/drivers/event/octeontx/ssovf_evdev.c
@@ -880,7 +880,7 @@ ssovf_vdev_probe(struct rte_vdev_device *vdev)
}
 
eventdev = rte_event_pmd_vdev_init(name, sizeof(struct ssovf_evdev),
-

RE: [PATCH v3 0/3] fix test-pipeline issues

2023-10-18 Thread Dumitrescu, Cristian



> -Original Message-
> From: Feifei Wang 
> Sent: Tuesday, September 12, 2023 7:39 AM
> Cc: dev@dpdk.org; n...@arm.com; Feifei Wang 
> Subject: [PATCH v3 0/3] fix test-pipeline issues
> 
> For test-pipeline application, there are some problems with the normal
> operation of the program and security issues. These patches can fix
> these issues.
> 
> v3: fix SIGINT handling issue and add dev close operation
> 
> Feifei Wang (3):
>   app/test-pipeline: relax RSS hash requirement
>   app/test-pipeline: fix SIGINT handling issue
>   app/test-pipeline: add dev close operation
> 
>  app/test-pipeline/init.c  |  22 -
>  app/test-pipeline/main.c  |  33 +++
>  app/test-pipeline/main.h  |   2 +
>  app/test-pipeline/pipeline_acl.c  |   6 +-
>  app/test-pipeline/pipeline_hash.c | 110 ++---
>  app/test-pipeline/pipeline_lpm.c  |   6 +-
>  app/test-pipeline/pipeline_lpm_ipv6.c |   6 +-
>  app/test-pipeline/pipeline_stub.c |   6 +-
>  app/test-pipeline/runtime.c   | 132 ++
>  9 files changed, 198 insertions(+), 125 deletions(-)
> 
> --
> 2.25.1

Series-acked-by: Cristian Dumitrescu 



[PATCH v5 2/2] eventdev: fix missing driver names in info struct

2023-10-18 Thread Bruce Richardson
Rather than relying on the individual drivers to always populate the
driver name field in the info structure - something missed by some
drivers - we can do so in the eventdev rte_event_dev_info_get() function.
This fixes the issue for any driver which omitted the field.

Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
Fixes: 0ce3ce7c275c ("event/dpaa2: add configuration functions")
Cc: sta...@dpdk.org

Suggested-by: David Marchand 
Signed-off-by: Bruce Richardson 
---
 lib/eventdev/rte_eventdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
index 95373bbaad..0ca32d6721 100644
--- a/lib/eventdev/rte_eventdev.c
+++ b/lib/eventdev/rte_eventdev.c
@@ -104,6 +104,8 @@ rte_event_dev_info_get(uint8_t dev_id, struct 
rte_event_dev_info *dev_info)
dev_info->dequeue_timeout_ns = dev->data->dev_conf.dequeue_timeout_ns;
 
dev_info->dev = dev->dev;
+   if (dev->dev != NULL && dev->dev->driver != NULL)
+   dev_info->driver_name = dev->dev->driver->name;
 
rte_eventdev_trace_info_get(dev_id, dev_info, dev_info->dev);
 
-- 
2.39.2



Re: [PATCH v4 2/2] event/skeleton: set driver name string

2023-10-18 Thread Bruce Richardson
On Wed, Oct 18, 2023 at 01:52:08PM +0100, Bruce Richardson wrote:
> On Wed, Oct 18, 2023 at 02:44:11PM +0200, David Marchand wrote:
> > On Wed, Oct 18, 2023 at 2:26 PM Bruce Richardson
> >  wrote:
> > >
> > > When calling rte_eventdev_info_get() the driver name string field should
> > > be populated.
> > >
> > > Fixes: 929da5e6 ("event/skeleton: add skeleton eventdev driver")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Bruce Richardson 
> > 
> > event/dpaa2 seems affected too.
> > 
> 
> Ok, hadn't noticed that. I tested all the SW drivers I could and didn't
> find any of them affected.
> 
> > Can rte_eventdev_info_get() fill this based on the driver attached to
> > the device, like ethdev does?
> > 
> > Something like untested:
> > 
> > diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
> > index 95373bbaad..37ccc0dc77 100644
> > --- a/lib/eventdev/rte_eventdev.c
> > +++ b/lib/eventdev/rte_eventdev.c
> > @@ -104,6 +104,7 @@ rte_event_dev_info_get(uint8_t dev_id, struct
> > rte_event_dev_info *dev_info)
> > dev_info->dequeue_timeout_ns = 
> > dev->data->dev_conf.dequeue_timeout_ns;
> > 
> > dev_info->dev = dev->dev;
> > +   dev_info->driver_name = dev->dev->driver->name;
> > 
> > rte_eventdev_trace_info_get(dev_id, dev_info, dev_info->dev);
> > 
> Ok, let me do up and test a patch.
> 
V5 sent with this fix instead. I just added some additional conditionals
around it, in case we end up with NULL pointers in that chain
(hopefully not after the previous patch, but better to check).

Testing with skeleton, sw, dsw and opdl drivers showed no issues with
reporting the driver name.

/Bruce


Re: [PATCH v4] bus/pci: fix legacy device IO port map in secondary process

2023-10-18 Thread David Marchand
On Wed, Oct 18, 2023 at 2:39 PM Gupta, Nipun  wrote:
> On 10/18/2023 3:35 PM, David Marchand wrote:
> > On Mon, Oct 9, 2023 at 5:06 AM Ma, WenwuX  wrote:
> >>>  From a pci bus API pov, nothing prevents a driver from mixing memory
> >>> mapped with vfio and ioport resources (iow, calls to
> >>> rte_pci_map_resource() and rte_pci_ioport_map()).
> >>> IOW, it may not be the case with the net/virtio driver but, in theory,
> >>> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called after a
> >>> rte_pci_map_resource() call.
> >>>
> >>> In a similar manner, from the API pov,
> >>> rte_pci_ioport_map()/pci_vfio_ioport_map() may be called for multiple 
> >>> bars.
> >>>
> >>> In summary, nothing in this patch checks that vfio has been configured 
> >>> already
> >>> and I think we need a refcount to handle those situations.
> >>>
> >> We call rte_vfio_setup_device() just to get the device info, so we can 
> >> call rte_vfio_release_device() as soon as pci_vfio_fill_regions() is done.
> >> This avoids reference counting operations; do you think it works?
> >
> > Afaics, rte_vfio_setup_device should not be called if a call to
> > rte_pci_map_device for this device was successful (rte_pci_map_device
> > itself calls rte_vfio_setup_device).
> > And as a consequence, calling rte_vfio_release_device cannot be done
> > unconditionally either.
>
> Hi David,
>
> AFAIU, rte_vfio_setup_device() is written as re-entrant and does not
> create the DMA mapping again if it is already done for the iommu group.
>
> When this API is called again, either for a device within the same group
> or for the device for which it was already called, it mainly only does
> the work for device info get. Though not the best way to use it, calling
> it multiple times should not have any negative impact.

Even if rte_vfio_setup_device() is reentrant, there is still the
question when to call rte_vfio_release_device().


>
> As Wenwu mentioned, they need only the device info from VFIO, so a separate
> API to get device info can be added in eal_vfio.c/h. The device info
> portion of rte_vfio_setup_device() can be moved out to a new API, and
> rte_vfio_setup_device() can call this new API too?

Ok, I think I understand your suggestion.

Do we have a reference to the vfio device fd stored somewhere in the
pci device object?
I don't think it is the case, but if the pci layer keeps a reference
to it (it would be populated/reset during
rte_pci_map_device/rte_pci_unmap_device), then the ioport code can
call the VFIO_DEVICE_GET_INFO ioctl() similarly to what is done for
irq msi info, and there is no need for a new EAL API.

For the case when this device fd is not available (no previous call to
rte_pci_map_device()), then the ioport code can call
rte_vfio_setup_device() / rte_vfio_release_device().

Is this what you have in mind?
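The reuse-or-setup flow discussed above can be sketched as follows. This is an
illustrative model only: the structure and function names (`fake_pci_dev`,
`ioport_get_device_fd`, the stub vfio calls) are invented stand-ins for the
real rte_pci_device fields and rte_vfio_setup_device() /
rte_vfio_release_device(), used here just to show the control flow, not the
actual DPDK API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state: a vfio device fd cached by a prior
 * rte_pci_map_device() call, or -1 when the device was never mapped. */
struct fake_pci_dev {
	int vfio_dev_fd;
	bool setup_called; /* tracks whether we had to set up vfio ourselves */
};

/* Stand-ins for rte_vfio_setup_device() / rte_vfio_release_device(). */
static int
fake_vfio_setup(struct fake_pci_dev *dev)
{
	dev->setup_called = true;
	return 100; /* pretend device fd */
}

static void
fake_vfio_release(struct fake_pci_dev *dev)
{
	dev->setup_called = false;
}

/* The proposed flow: reuse the cached fd when the device was already
 * mapped; otherwise set up vfio temporarily (caller must release it). */
static int
ioport_get_device_fd(struct fake_pci_dev *dev)
{
	if (dev->vfio_dev_fd >= 0)
		return dev->vfio_dev_fd;
	return fake_vfio_setup(dev);
}
```

With this split, the ioport code never double-initializes vfio for an
already-mapped device, and the temporary setup path pairs naturally with a
release once the device info has been read.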


-- 
David Marchand



[PATCH v6 00/34] Implementation of revised ml/cnxk driver

2023-10-18 Thread Srikanth Yalavarthi
This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions within a TVM model.
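The hybrid execution mode can be pictured with a small sketch. Everything here
is invented for illustration (the layer struct, the pretend kernels); it only
shows the idea of dispatching each region of the model to the accelerator or
the CPU while honoring the flow dependency between regions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: each layer (region) of a TVM model runs either on
 * the ML accelerator or on a CPU core. */
enum layer_target { LAYER_ON_ACCEL, LAYER_ON_CPU };

struct layer {
	enum layer_target target;
	int (*run)(int input); /* output of one layer feeds the next */
};

/* Pretend kernels for the two execution targets. */
static int
accel_kernel(int x)
{
	return x * 2;
}

static int
cpu_kernel(int x)
{
	return x + 1;
}

/* Execute the layers in order, honoring the flow dependencies. */
static int
model_run(const struct layer *layers, size_t nb_layers, int input)
{
	size_t i;

	for (i = 0; i < nb_layers; i++)
		input = layers[i].run(input);

	return input;
}
```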

v6:
  - Added depends info for series. This series depends on patch-132887
  - Fix merge conflicts with dpdk-23.11-rc1
  - Fix issues with ml/cnxk driver release notes
  - Added build dependency information for dlpack headers

v5:
  - Fix build failures for individual patches in the series
  - Finished build testing with devtools/test-meson-builds.sh script

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst |  131 +-
 doc/guides/rel_notes/release_23_11.rst |3 +
 drivers/ml/cnxk/cn10k_ml_dev.c |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c   |  401 ++--
 drivers/ml/cnxk/cn10k_ml_model.h   |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 2828 
 drivers/ml/cnxk/cn10k_ml_ops.h |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c  |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h  |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c   |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h   |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c|   94 +
 drivers/ml/cnxk/cnxk_ml_model.h|  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c  | 1690 ++
 drivers/ml/cnxk/cnxk_ml_ops.h  |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c|   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h|   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h   |  152 ++
 drivers/ml/cnxk/meson.build|   73 +
 drivers/ml/cnxk/mvtvm_ml_dev.c |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c   |  392 
 drivers/ml/cnxk/mvtvm_ml_model.h   |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c |  652 ++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c   |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h   |   36 +
 30 files changed, 6186 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 

[PATCH v6 01/34] ml/cnxk: drop support for register polling

2023-10-18 Thread Srikanth Yalavarthi
Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for using registers for polling is removed;
DDR addresses will be used for polling instead.

Signed-off-by: Srikanth Yalavarthi 
---
Depends-on: patch-132887 ("ml/cnxk: don't export internal headers")

 doc/guides/mldevs/cnxk.rst |  16 -
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +--
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 -
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
- -a :00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML 
registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA  "cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK "hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM   "poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE "ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT   "/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT  1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT 1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT   "ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT 16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 CN10K_ML_DEV_CACHE_MODEL_DATA,
 CN10K_ML_OCM_ALLOC_MODE,
 CN10K_ML_DEV_HW_QUEUE_LOCK,
-CN10K_ML_FW_POLL_MEM,
 CN10K_ML_OCM_PAGE_SIZE,
 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
bool hw_queue_lock_set = false;
bool ocm_page_size_set = false;
char *ocm_alloc_mode = NULL;
-   bool poll_mem_set = false;
bool fw_path_set = false;
-   char *poll_mem = NULL;
char *fw_path = NULL;
int ret = 0;
bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
hw_queue_lock_set = true;
}
 
-   if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-   ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, 
&parse_string_arg,
-&poll_mem);
-   if (ret < 0) {
-   plt_err("Error processing arguments, key = %s\n", 
CN10K_ML_FW_POLL_MEM);
-   ret = -EINVAL;
-   goto exit;
-   }
-   poll_mem_set = true;
-   }
-
if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, 
&parse_integer_arg,
 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
}
plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, 
mldev->hw_queue_lock);
 
-   if (!poll_mem_set) {
-   mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-   } else {
-   if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, 
"register") == 0))) {
-   plt_err("Invalid argument, %s = %s\n", 
CN10K_ML_FW_POLL_MEM, poll_mem);
-   ret = -EINVAL;
-   goto exit;
-   }
-   mldev->fw.poll_mem = poll_mem;
-   }
-   plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
if (!ocm_page_size_set) {
mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
if (fw->report_dpe_war

[PATCH v6 02/34] ml/cnxk: add generic cnxk device structure

2023-10-18 Thread Srikanth Yalavarthi
Introduce generic cnxk device structure. This structure is
a top level device structure for the driver, which would
encapsulate the target / platform specific device structure.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +--
 drivers/ml/cnxk/cnxk_ml_dev.c|  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h|  58 
 drivers/ml/cnxk/meson.build  |   1 +
 9 files changed, 562 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include 
 #include 
 
-#include 
-
 #include 
 
-#include "cn10k_ml_dev.h"
+#include 
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH   "fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void 
*extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char 
*value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev 
*mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev 
*cn10k_mldev)
 {
bool enable_dpe_warnings_set = false;
bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
 
if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
ret = rte_kvargs_process(kvlist, 
CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-&parse_integer_arg, 
&mldev->fw.enable_dpe_warnings);
+&parse_integer_arg, 
&cn10k_mldev->fw.enable_dpe_warnings);
if (ret < 0) {
plt_err("Error processing arguments, key = %s\n",
CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
 
if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
ret = rte_kvargs_process(kvlist, 
CN10K_ML_FW_REPORT_DPE_WARNINGS,
-&parse_integer_arg, 
&mldev->fw.report_dpe_warnings);
+&parse_integer_arg, 
&cn10k_mldev->fw.report_dpe_warnings);
if (ret < 0) {
plt_err("Error processing arguments, key = %s\n",
CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
 
if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, 
&parse_integer_arg,
-&mldev->cache_model_data);
+&cn10k_mldev->cache_model_data);
if (ret < 0) {
plt_err("Error processing arguments, key = %s\n",
CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
 
if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, 
&parse_integer_arg,
-&mldev->hw_queue_lock);
+&cn10k_mldev->hw_queue_lock);
if (ret < 0) {
plt_err("Error processing arguments, key = %s\n",
CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, 
struct cn10k_ml_dev *mlde
 
if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, 
&parse_integer_arg,
-&mldev->ocm_page_size);
+&cn10k_mldev->ocm_page_size);
if (ret < 0) {
 

[PATCH v6 05/34] ml/cnxk: add generic cnxk xstats structures

2023-10-18 Thread Srikanth Yalavarthi
Introduced generic xstats structures and renamed cn10k
xstats enumerations with cnxk prefix.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++-
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++
 4 files changed, 209 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-   /* Number of models loaded */
-   nb_models_loaded,
-
-   /* Number of models unloaded */
-   nb_models_unloaded,
-
-   /* Number of models started */
-   nb_models_started,
-
-   /* Number of models stopped */
-   nb_models_stopped,
-
-   /* Average inference hardware latency */
-   avg_hw_latency,
-
-   /* Minimum hardware latency */
-   min_hw_latency,
-
-   /* Maximum hardware latency */
-   max_hw_latency,
-
-   /* Average firmware latency */
-   avg_fw_latency,
-
-   /* Minimum firmware latency */
-   min_fw_latency,
-
-   /* Maximum firmware latency */
-   max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-   /* Device function */
-   CN10K_ML_XSTATS_FN_DEVICE,
-
-   /* Model function */
-   CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t 
obj_idx,
-  enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-   /* Name-ID map */
-   struct rte_ml_dev_xstats_map map;
-
-   /* xstats mode, device or model */
-   enum rte_ml_dev_xstats_mode mode;
-
-   /* Type of xstats */
-   enum cn10k_ml_xstats_type type;
-
-   /* xstats function */
-   enum cn10k_ml_xstats_fn_type fn_id;
-
-   /* Object ID, model ID for model stat type */
-   uint16_t obj_idx;
-
-   /* Allowed to reset the stat */
-   uint8_t reset_allowed;
-
-   /* An offset to be taken away to emulate resets */
-   uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-   /* Pointer to xstats entries */
-   struct cn10k_ml_xstats_entry *entries;
-
-   /* Store num stats and offset of the stats for each model */
-   uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-   uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-   uint16_t count_mode_device;
-   uint16_t count_mode_model;
-   uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
struct cn10k_ml_ocm ocm;
 
/* Extended stats data */
-   struct cn10k_ml_xstats xstats;
+   struct cnxk_ml_xstats xstats;
 
/* Enable / disable model data caching */
int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
/* Total hardware latency, sum of all inferences */
uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
struct cnxk_ml_req *req;
 
/* Layer: Stats for burst ops */
-   struct cn10k_ml_layer_stats *burst_stats;
+   struct cn10k_ml_layer_xstats *burst_xstats;
 
/* Layer: Stats for sync ops */
-   struct cn10k_ml_layer_stats *sync_stats;
+   struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index caee09829b..42a4389bbe 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev 
*cn10k_mldev, struct cnxk_ml
req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-   char name[32];
-   enum cn10k_ml_xstats_type type;
-   uint8_t reset_allowed;
-};
-

[PATCH v6 06/34] ml/cnxk: rename cnxk ops function pointers struct

2023-10-18 Thread Srikanth Yalavarthi
Renamed cn10k ML ops structure with cnxk prefix.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +-
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_de
goto pmd_destroy;
}
 
-   dev->dev_ops = &cn10k_ml_ops;
+   dev->dev_ops = &cnxk_ml_ops;
} else {
plt_err("CN10K ML Ops are not supported on secondary process");
dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 42a4389bbe..66b38fc1eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct 
cnxk_ml_qp *qp)
return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t 
model_id)
return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct 
rte_ml_dev_info *dev_info)
return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config 
*conf)
 {
struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const 
struct rte_ml_dev_config *c
return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
  const struct rte_ml_dev_qp_conf *qp_conf, int 
socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, 
uint16_t queue_pair_id,
return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct 
rte_ml_dev_stats *stats)
return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum 
rte_ml_dev_xstats_mode mode,
  int32_t model_id, struct rte_ml_dev_xstats_map 
*xstats_map,
  uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, 
enum rte_ml_dev_xstats_mod
return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, 
uint16_t *stat_id,
uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, 
const char *name, uint16
return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode 
mode, int32_t model_id,
const uint16_t stat_ids[], uint64_t values[], uint16_t 
nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum 
rte_ml_dev_xstats_mode mode
return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode 
mode,
  int32_t model_id, const uint16_t stat_ids[], uint16_t 
nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum 
rte_ml_dev_xstats_mode mo
return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
return 0;
 }
 
-static i

[PATCH v6 03/34] ml/cnxk: add generic model and layer structures

2023-10-18 Thread Srikanth Yalavarthi
Introduce generic cnxk model and layer structure. These
structures would enable supporting models with multiple
layers. A model is a collection of multiple independent
layers with flow dependencies between the layers.
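The model/layer relationship described above can be sketched like this. The
structures are deliberately simplified and the `*_sketch` names are invented;
they are not the actual driver structures, only the containment idea: a model
owning an ordered array of layers.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LAYERS 8

/* Simplified stand-in for a per-layer structure. */
struct cnxk_ml_layer_sketch {
	uint16_t index;   /* position of the layer within the model */
	const char *name;
};

/* A model is a collection of layers executed with flow dependencies. */
struct cnxk_ml_model_sketch {
	uint16_t nb_layers;
	struct cnxk_ml_layer_sketch layer[MAX_LAYERS];
};

static uint16_t
model_add_layer(struct cnxk_ml_model_sketch *model, const char *name)
{
	uint16_t idx = model->nb_layers++;

	model->layer[idx].index = idx;
	model->layer[idx].name = name;

	return idx;
}
```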

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 245 
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +--
 drivers/ml/cnxk/cnxk_ml_io.h |  79 +
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++
 drivers/ml/cnxk/meson.build  |   1 +
 10 files changed, 651 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
struct cn10k_ml_xstats_entry *entries;
 
/* Store num stats and offset of the stats for each model */
-   uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-   uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+   uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+   uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
uint16_t count_mode_device;
uint16_t count_mode_model;
uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d747bba151 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include 
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct 
cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, 
uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, 
uint8_t *base_dma_addr)
 {
struct cn10k_ml_model_metadata *metadata;
-   struct cn10k_ml_model_addr *addr;
+   struct cn10k_ml_layer_addr *addr;
size_t model_data_size;
uint8_t *dma_addr_load;
uint8_t *dma_addr_run;
-   uint8_t i;
-   uint8_t j;
int fpos;
 
-   metadata = &model->metadata;
-   addr = &model->addr;
+   metadata = &layer->glow.metadata;
+   addr = &layer->glow.addr;
model_data_size = metadata->init_model.file_size + 
metadata->main_model.file_size +
  metadata->finish_model.file_size + 
metadata->weights_bias.file_size;
 
@@ -361,102 +359,136 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model 
*model, uint8_t *buffer, uint8_
addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, 
metadata->weights_bias.mem_offset);
addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, 
metadata->weights_bias.mem_offset);
rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), 
metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+   struct cn10k_ml_model_metadata *metadata;
+   uint8_t i;
+   uint8_t j;
+
+   metadata = &layer->glow.metadata;
 
/* Inputs */
-   addr->total_input_sz_d = 0;
-   addr->total_input_sz_q = 0;
+   layer->info.nb_inputs = metadata->model.num_input;
+   layer->info.total_input_sz_d = 0;
+   layer->info.total_input_sz_q = 0;
for (i = 0; i < metadata->model.num_input; i++) {
if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-   addr->input[i].nb_dims = 4;
-   addr->input[i].shape[0] = metadata->input1[i].shape.w;
-   addr->input[i].shape[1] = metadata->input1[i].shape.x;
-   addr->input[i].shape[2] = metadata->input1[i].shape.y;
-   addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-   addr->input[i].nb_elements =
+   strncpy(layer->info.input[i].name, (char 
*)metadata->input1[i].input_name,
+   MRVL_ML_INPUT_NAME_LEN);
+   layer->info.input[i].dty

[PATCH v6 07/34] ml/cnxk: update device handling functions

2023-10-18 Thread Srikanth Yalavarthi
Implement CNXK wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrapper functions allocate / release common resources
for the ML driver and invoke device specific functions.
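The wrapper pattern can be sketched as below. The `*_sketch` names and field
values are invented; the point is only the split the cover text describes:
the generic cnxk wrapper does argument checks and fills common fields, then
delegates to the target-specific cn10k function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dev_info_sketch {
	int max_models;
	int max_queue_pairs;
};

/* Target-specific part: fills only the hardware-dependent fields. */
static int
cn10k_info_get_sketch(struct dev_info_sketch *info)
{
	info->max_queue_pairs = 16;
	return 0;
}

/* Generic cnxk wrapper: validation and common fields live here. */
static int
cnxk_info_get_sketch(struct dev_info_sketch *info)
{
	if (info == NULL)
		return -EINVAL;

	info->max_models = 16; /* common across targets */

	return cn10k_info_get_sketch(info);
}
```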

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 -
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 66b38fc1eb..6d8f2c8777 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int 
qp_id)
snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, 
uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info 
*dev_info)
 {
struct cn10k_ml_dev *cn10k_mldev;
-   struct cnxk_ml_dev *cnxk_mldev;
 
-   if (dev_info == NULL)
-   return -EINVAL;
-
-   cnxk_mldev = dev->data->dev_private;
cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-   memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-   dev_info->driver_name = dev->device->driver->name;
-   dev_info->max_models = ML_CNXK_MAX_MODELS;
if (cn10k_mldev->hw_queue_lock)
dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct 
rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config 
*conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct 
rte_ml_dev_config *conf)
 {
-   struct rte_ml_dev_info dev_info;
struct cn10k_ml_dev *cn10k_mldev;
-   struct cnxk_ml_dev *cnxk_mldev;
-   struct cnxk_ml_model *model;
struct cn10k_ml_ocm *ocm;
-   struct cnxk_ml_qp *qp;
-   uint16_t model_id;
-   uint32_t mz_size;
uint16_t tile_id;
-   uint16_t qp_id;
int ret;
 
-   if (dev == NULL || conf == NULL)
-   return -EINVAL;
+   RTE_SET_USED(conf);
 
-   /* Get CN10K device handle */
-   cnxk_mldev = dev->data->dev_private;
cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-   cn10k_ml_dev_info_get(dev, &dev_info);
-   if (conf->nb_models > dev_info.max_models) {
-   plt_err("Invalid device config, nb_models > %u\n", 
dev_info.max_models);
-   return -EINVAL;
-   }
-
-   if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-   plt_err("Invalid device config, nb_queue_pairs > %u\n", 
dev_info.max_queue_pairs);
-   return -EINVAL;
-   }
-
-   if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-   plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, 
nb_models = %u",
-  conf->nb_queue_pairs, conf->nb_models);
-
-   /* Load firmware */
-   ret = cn10k_ml_fw_load(cnxk_mldev);
-   if (ret != 0)
-   return ret;
-   } else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-   plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, 
nb_models = %u",
-  conf->nb_queue_pairs, conf->nb_models);
-   } else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-   plt_err("Device can't be reconfigured in started state\n");
-   return -ENOTSUP;
-   } else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-   plt_err("Device can't be reconfigured after close\n");
-   return -ENOTSUP;
-   }
-
-   /* Configure queue-pairs */
-   if (dev->data->queue_pairs == NULL) {
-   mz_size = sizeof(dev->data->queue_pairs[0]) * 
conf->nb_queue_pairs;
-   dev->data->queue_pairs =
-   rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, 
RTE_CACHE_LINE_SIZE);
-   if (dev->data->queue_pairs == NULL) {
-   dev->data->nb_queue_pairs = 0;
-   plt_err("Failed to get memory for queue_pairs, 
nb_queue_pairs %u",
-   conf->nb_queue_pairs);
-   return -ENOMEM;
-   }
-   } else { /* Re-configure */
-   void **queue_pairs;
-
-   /* Release all queue pairs as ML spec doesn't support 
queue_pair_destroy. */
-   for (qp_id = 0; qp_id < dev->data->nb_

[PATCH v6 04/34] ml/cnxk: add generic cnxk request structure

2023-10-18 Thread Srikanth Yalavarthi
Added a generic cnxk request structure. Moved common fields
from the cn10k structures to the cnxk structure. Moved
job-related structures and enumerations to the ops headers.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +--
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++
 drivers/ml/cnxk/cnxk_ml_ops.c|   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h|  63 ++
 drivers/ml/cnxk/meson.build  |   1 +
 9 files changed, 557 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include 
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH   "fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-   plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-   plt_ml_dbg("Firmware capabilities = 0x%016lx", 
fw->req->jd.fw_load.cap.u64);
-   plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-   plt_ml_dbg("core0_debug_ptr = 0x%016lx", 
fw->req->jd.fw_load.debug.core0_debug_ptr);
-   plt_ml_dbg("core1_debug_ptr = 0x%016lx", 
fw->req->jd.fw_load.debug.core1_debug_ptr);
-   plt_ml_dbg("debug_buffer_size = %u bytes", 
fw->req->jd.fw_load.debug.debug_buffer_size);
+   plt_info("ML Firmware Version = %s", 
fw->req->cn10k_req.jd.fw_load.version);
+
+   plt_ml_dbg("Firmware capabilities = 0x%016lx", 
fw->req->cn10k_req.jd.fw_load.cap.u64);
+   plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+   plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+  fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+   plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+  fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+   plt_ml_dbg("debug_buffer_size = %u bytes",
+  fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-  fw->req->jd.fw_load.debug.core0_exception_buffer);
+  fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-  fw->req->jd.fw_load.debug.core1_exception_buffer);
+  fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
plt_ml_dbg("exception_state_size = %u bytes",
-  fw->req->jd.fw_load.debug.exception_state_size);
-   plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+  fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+   plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
/* Update FW load completion structure */
-   fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-   fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-   fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, 
&fw->req->result);
-   fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-   plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+   fw->req->cn10k_req.jd.hdr.jce.w1.u64 = 
PLT_U64_CAST(&fw->req->cn10k_req.status);
+   fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+   fw->req->cn10k_req.jd.hdr.result =
+   roc_ml_addr_ap2mlip(&cn10k_mldev->roc, 
&fw->req->cn10k_req.result);
+   fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+   plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
plt_wmb();
 
/* Enqueue FW load through scratch registers */
timeout = true;
timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-   roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+   roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
plt_rmb();
do {
if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-   (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+   (plt_read64(&fw->req->cn10k_req.status) == 
ML_CNXK_POLL_JOB_FINISH)) {
timeout = false;
break;
}
} while (plt_tsc_cycles() < tim

[PATCH v6 08/34] ml/cnxk: update queue-pair handling functions

2023-10-18 Thread Srikanth Yalavarthi
Added cnxk wrapper functions to handle ML device queue-pairs.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 -
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6d8f2c8777..e3c688a55f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-   snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-   const struct rte_memzone *qp_mem;
-   char name[RTE_MEMZONE_NAMESIZE];
-   int ret;
-
-   qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, 
qp->id);
-   qp_mem = rte_memzone_lookup(name);
-   ret = rte_memzone_free(qp_mem);
-   if (ret)
-   return ret;
-
-   rte_free(qp);
-
-   return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-   struct cnxk_ml_qp *qp;
-   int ret;
-
-   qp = dev->data->queue_pairs[queue_pair_id];
-   if (qp == NULL)
-   return -EINVAL;
-
-   ret = cnxk_ml_qp_destroy(dev, qp);
-   if (ret) {
-   plt_err("Could not destroy queue pair %u", queue_pair_id);
-   return ret;
-   }
-
-   dev->data->queue_pairs[queue_pair_id] = NULL;
-
-   return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t 
nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-   const struct rte_memzone *qp_mem;
-   char name[RTE_MEMZONE_NAMESIZE];
-   struct cnxk_ml_qp *qp;
-   uint32_t len;
-   uint8_t *va;
uint64_t i;
 
-   /* Allocate queue pair */
-   qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct 
cnxk_ml_qp), ROC_ALIGN,
-   socket_id);
-   if (qp == NULL) {
-   plt_err("Could not allocate queue pair");
-   return NULL;
-   }
-
-   /* For request queue */
-   len = nb_desc * sizeof(struct cnxk_ml_req);
-   qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, 
qp_id);
-   qp_mem = rte_memzone_reserve_aligned(
-   name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | 
RTE_MEMZONE_256MB, ROC_ALIGN);
-   if (qp_mem == NULL) {
-   plt_err("Could not reserve memzone: %s", name);
-   goto qp_free;
-   }
-
-   va = qp_mem->addr;
-   memset(va, 0, len);
-
-   /* Initialize Request queue */
-   qp->id = qp_id;
-   qp->queue.reqs = (struct cnxk_ml_req *)va;
-   qp->queue.head = 0;
-   qp->queue.tail = 0;
-   qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-   qp->nb_desc = nb_desc;
-   qp->stats.enqueued_count = 0;
-   qp->stats.dequeued_count = 0;
-   qp->stats.enqueue_err_count = 0;
-   qp->stats.dequeue_err_count = 0;
+   RTE_SET_USED(cnxk_mldev);
 
/* Initialize job command */
for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t 
qp_id, uint32_t nb_desc
qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
}
-
-   return qp;
-
-qp_free:
-   rte_free(qp);
-
-   return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
- const struct rte_ml_dev_qp_conf *qp_conf, int 
socket_id)
-{
-   struct rte_ml_dev_info dev_info;
-   struct cnxk_ml_qp *qp;
-   uint32_t nb_desc;
-
-   if (queue_pair_id >= dev->data->nb_queue_pairs) {
-   plt_err("Queue-pair id = %u (>= max queue pairs supported, 
%u)\n", queue_pair_id,
-   dev->data->nb_queue_pairs);
-   return -EINVAL;
-   }
-
-   if (dev->data->queue_pairs[queue_pair_id] != NULL)
-   cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-   cnxk_ml_dev_info_get(dev, &dev_info);
-   if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-   plt_err("Could not setup queue pair for %u descriptors", 
qp_conf->nb_desc);
-   return -EINVAL;
-   }
-   plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", 
queue_pair_id,
-  qp

[PATCH v6 09/34] ml/cnxk: update model load and unload functions

2023-10-18 Thread Srikanth Yalavarthi
Implemented cnxk wrapper functions to load and unload
ML models. The wrapper functions invoke the cn10k
model load and unload functions.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_model.c | 244 -
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h|  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c| 144 ++-
 drivers/ml/cnxk/cnxk_ml_ops.h|   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 5d37e9bf8a..69a60b9b90 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, 
uint8_t *buffer, uint8_t
 {
struct cn10k_ml_model_metadata *metadata;
struct cn10k_ml_layer_addr *addr;
-   size_t model_data_size;
uint8_t *dma_addr_load;
-   uint8_t *dma_addr_run;
int fpos;
 
metadata = &layer->glow.metadata;
addr = &layer->glow.addr;
-   model_data_size = metadata->init_model.file_size + 
metadata->main_model.file_size +
- metadata->finish_model.file_size + 
metadata->weights_bias.file_size;
 
/* Base address */
addr->base_dma_addr_load = base_dma_addr;
-   addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, 
model_data_size);
 
/* Init section */
dma_addr_load = addr->base_dma_addr_load;
-   dma_addr_run = addr->base_dma_addr_run;
fpos = sizeof(struct cn10k_ml_model_metadata);
addr->init_load_addr = dma_addr_load;
-   addr->init_run_addr = dma_addr_run;
rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), 
metadata->init_model.file_size);
 
/* Main section */
dma_addr_load += metadata->init_model.file_size;
-   dma_addr_run += metadata->init_model.file_size;
fpos += metadata->init_model.file_size;
addr->main_load_addr = dma_addr_load;
-   addr->main_run_addr = dma_addr_run;
rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), 
metadata->main_model.file_size);
 
/* Finish section */
dma_addr_load += metadata->main_model.file_size;
-   dma_addr_run += metadata->main_model.file_size;
fpos += metadata->main_model.file_size;
addr->finish_load_addr = dma_addr_load;
-   addr->finish_run_addr = dma_addr_run;
rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), 
metadata->finish_model.file_size);
 
/* Weights and Bias section */
@@ -363,140 +352,146 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, 
uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+  struct cn10k_ml_model_metadata *metadata)
 {
-   struct cn10k_ml_model_metadata *metadata;
uint8_t i;
uint8_t j;
 
-   metadata = &layer->glow.metadata;
-
/* Inputs */
-   layer->info.nb_inputs = metadata->model.num_input;
-   layer->info.total_input_sz_d = 0;
-   layer->info.total_input_sz_q = 0;
+   io_info->nb_inputs = metadata->model.num_input;
+   io_info->total_input_sz_d = 0;
+   io_info->total_input_sz_q = 0;
for (i = 0; i < metadata->model.num_input; i++) {
if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-   strncpy(layer->info.input[i].name, (char 
*)metadata->input1[i].input_name,
+   strncpy(io_info->input[i].name, (char 
*)metadata->input1[i].input_name,
MRVL_ML_INPUT_NAME_LEN);
-   layer->info.input[i].dtype = 
metadata->input1[i].input_type;
-   layer->info.input[i].qtype = 
metadata->input1[i].model_input_type;
-   layer->info.input[i].nb_dims = 4;
-   layer->info.input[i].shape[0] = 
metadata->input1[i].shape.w;
-   layer->info.input[i].shape[1] = 
metadata->input1[i].shape.x;
-   layer->info.input[i].shape[2] = 
metadata->input1[i].shape.y;
-   layer->info.input[i].shape[3] = 
metadata->input1[i].shape.z;
-   layer->info.input[i].nb_elements =
+   io_info->input[i].dtype = 
metadata->input1[i].input_type;
+   io_info->input[i].qtype = 
metadata->input1[i].model_input_type;
+   io_info->input[i].nb_dims = 4;
+   io_info->input[i].shape[0] = 
metadata->input1[i].shape.w;
+   io_info->input[i].shape[1] = 
metadata->input1[i].shape.x;
+   io_info->input[i].shape[2] = 
metadata->input1[i].shape.y;
+   io_info->input[i].shape[3] = 
metadata->inpu

[PATCH v6 10/34] ml/cnxk: update model start and stop functions

2023-10-18 Thread Srikanth Yalavarthi
Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions invoke the cn10k
model start and stop functions.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 -
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int 
*end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t 
wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, 
uint16_t wb_pages,
   uint16_t scratch_pages, uint64_t *tilemask)
 {
struct cn10k_ml_dev *cn10k_mldev;
-   struct cnxk_ml_dev *cnxk_mldev;
struct cn10k_ml_ocm *ocm;
 
uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t 
num_tiles, uint16_t w
int max_slot_sz;
int page_id;
 
-   cnxk_mldev = dev->data->dev_private;
cn10k_mldev = &cnxk_mldev->cn10k_mldev;
ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, 
uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t 
layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, 
uint16_t layer_id,
   uint64_t tilemask, int wb_page_start, uint16_t 
wb_pages,
   uint16_t scratch_pages)
 {
-   struct cn10k_ml_dev *cn10k_mldev;
-   struct cnxk_ml_dev *cnxk_mldev;
struct cnxk_ml_model *model;
struct cnxk_ml_layer *layer;
struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, 
uint16_t model_id, uint16_t l
int tile_id;
int page_id;
 
-   cnxk_mldev = dev->data->dev_private;
-   cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-   ocm = &cn10k_mldev->ocm;
-   model = dev->data->models[model_id];
+   ocm = &cnxk_mldev->cn10k_mldev.ocm;
+   model = cnxk_mldev->mldev->data->models[model_id];
layer = &model->layer[layer_id];
 
/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, 
uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t 
layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, 
uint16_t layer_id)
 {
struct cnxk_ml_model *local_model;
struct cnxk_ml_layer *local_layer;
-   struct cn10k_ml_dev *cn10k_mldev;
-   struct cnxk_ml_dev *cnxk_mldev;
struct cnxk_ml_model *model;
struct cnxk_ml_layer *layer;
struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t 
model_id, uint16_t laye
uint16_t i;
uint16_t j;
 
-   cnxk_mldev = dev->data->dev_private;
-   cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-   ocm = &cn10k_mldev->ocm;
-   model = dev->data->models[model_id];
+   ocm = &cnxk_mldev->cn10k_mldev.ocm;
+   model = cnxk_mldev->mldev->data->models[model_id];
layer = &model->layer[layer_id];
 
/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t 
model_id, uint16_t laye
 
/* Get max scratch pages required, excluding the current model 
*/
scratch_resize_pages = 0;
-   for (i = 0; i < dev->data->nb_models; i++) {
-   local_model = dev->data->models[i];
+   for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+   local_model = cnxk_mldev->mldev->data->models[i];
if (local_model == NULL)
continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include 
 #include 
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, 
uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t 
num_tiles, uint16_t wb_pages,
  

[PATCH v6 11/34] ml/cnxk: update model utility functions

2023-10-18 Thread Srikanth Yalavarthi
Added cnxk wrapper functions to update model params and
fetch model info.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++-
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 --
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c677861645..c0d6216485 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, 
struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-   struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct 
cnxk_ml_model *model,
+void *buffer)
 {
-   struct cnxk_ml_model *model;
-
-   model = dev->data->models[model_id];
-
-   if (model == NULL) {
-   plt_err("Invalid model_id = %u", model_id);
-   return -EINVAL;
-   }
-
-   rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-   model_info->input_info = ((struct rte_ml_model_info 
*)model->info)->input_info;
-   model_info->output_info = ((struct rte_ml_model_info 
*)model->info)->output_info;
-
-   return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void 
*buffer)
-{
-   struct cnxk_ml_model *model;
-
-   model = dev->data->models[model_id];
+   struct cnxk_ml_layer *layer;
 
-   if (model == NULL) {
-   plt_err("Invalid model_id = %u", model_id);
-   return -EINVAL;
-   }
+   RTE_SET_USED(cnxk_mldev);
 
if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
return -1;
else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
return -EBUSY;
 
+   layer = &model->layer[0];
+
/* Update model weights & bias */
-   rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-  model->layer[0].glow.metadata.weights_bias.file_size);
+   rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+  layer->glow.metadata.weights_bias.file_size);
 
return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, 
struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-   struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, 
void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct 
cnxk_ml_model *model,
+void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b61ed45876..9ce37fcfd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -604,6 +604,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t 
model_id)
return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+  struct rte_ml_model_info *model_info)
+{
+   struct rte_ml_model_info *info;
+   struct cnxk_ml_model *model;
+
+   if ((dev == NULL) || (model_info == NULL))
+   return -EINVAL;
+
+   model = dev->data->models[model_id];
+   if (model == NULL) {
+   plt_err("Invalid model_id = %u", model_id);
+   return -EINVAL;
+   }
+
+   info = (struct rte_ml_model_info *)model->info;
+   rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+   model_info->input_info = info->input_info;
+   model_info->output_info = info->output_info;
+
+   return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void 
*buffer)
+{
+   struct cnxk_ml_dev *cnxk_mldev;
+   struct cnxk_ml_model *model;
+
+   if ((dev == NULL) || (buffer == NULL))
+   return -EINVAL;
+
+   cnxk_mldev = dev->data->dev_private;
+
+   model = dev->data->models[model_id];
+   if (model == NULL) {
+   plt_err("Invalid model_id = %u", model_id);
+   return -EINVAL;
+   }
+
+   return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+

[PATCH v6 12/34] ml/cnxk: update data quantization functions

2023-10-18 Thread Srikanth Yalavarthi
Added cnxk wrapper functions to quantize input data and
dequantize output data.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 -
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++-
 drivers/ml/cnxk/meson.build|   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c0d6216485..ff190b7f86 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev 
*cnxk_mldev, struct cnxk_ml_mode
return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct 
rte_ml_buff_seg **dbuffer,
-struct rte_ml_buff_seg **qbuffer)
-{
-   struct cnxk_ml_model *model;
-   uint8_t model_input_type;
-   uint8_t *lcl_dbuffer;
-   uint8_t *lcl_qbuffer;
-   uint8_t input_type;
-   float qscale;
-   uint32_t i;
-   uint32_t j;
-   int ret;
-
-   model = dev->data->models[model_id];
-
-   if (model == NULL) {
-   plt_err("Invalid model_id = %u", model_id);
-   return -EINVAL;
-   }
-
-   lcl_dbuffer = dbuffer[0]->addr;
-   lcl_qbuffer = qbuffer[0]->addr;
-
-   for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-   if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-   input_type = 
model->layer[0].glow.metadata.input1[i].input_type;
-   model_input_type = 
model->layer[0].glow.metadata.input1[i].model_input_type;
-   qscale = model->layer[0].glow.metadata.input1[i].qscale;
-   } else {
-   j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-   input_type = 
model->layer[0].glow.metadata.input2[j].input_type;
-   model_input_type = 
model->layer[0].glow.metadata.input2[j].model_input_type;
-   qscale = model->layer[0].glow.metadata.input2[j].qscale;
-   }
-
-   if (input_type == model_input_type) {
-   rte_memcpy(lcl_qbuffer, lcl_dbuffer, 
model->layer[0].info.input[i].sz_d);
-   } else {
-   switch 
(model->layer[0].glow.metadata.input1[i].model_input_type) {
-   case RTE_ML_IO_TYPE_INT8:
-   ret = rte_ml_io_float32_to_int8(
-   qscale, 
model->layer[0].info.input[i].nb_elements,
-   lcl_dbuffer, lcl_qbuffer);
-   break;
-   case RTE_ML_IO_TYPE_UINT8:
-   ret = rte_ml_io_float32_to_uint8(
-   qscale, 
model->layer[0].info.input[i].nb_elements,
-   lcl_dbuffer, lcl_qbuffer);
-   break;
-   case RTE_ML_IO_TYPE_INT16:
-   ret = rte_ml_io_float32_to_int16(
-   qscale, 
model->layer[0].info.input[i].nb_elements,
-   lcl_dbuffer, lcl_qbuffer);
-   break;
-   case RTE_ML_IO_TYPE_UINT16:
-   ret = rte_ml_io_float32_to_uint16(
-   qscale, 
model->layer[0].info.input[i].nb_elements,
-   lcl_dbuffer, lcl_qbuffer);
-   break;
-   case RTE_ML_IO_TYPE_FP16:
-   ret = rte_ml_io_float32_to_float16(
-   
model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-   lcl_qbuffer);
-   break;
-   default:
-   plt_err("Unsupported model_input_type[%u] : 
%u", i,
-   
model->layer[0].glow.metadata.input1[i].model_input_type);
-   ret = -ENOTSUP;
-   }
-   if (ret < 0)
-   return ret;
-   }
-
-   lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-   lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-   }
-
-   return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct 
rte_ml_buff_seg **qbuffer,
-  struct rte_ml_buff_seg **dbuffer)
-{
-   struct cnxk_ml_model *model;
-   uint8_t model_output_type;
-   uint8_t *lcl_qbuffer;
-   uint8_t *lcl_dbuffer;
-   

[PATCH v6 13/34] ml/cnxk: update device debug functions

2023-10-18 Thread Srikanth Yalavarthi
Added cnxk wrappers for the device dump and selftest
debug functions.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c|  39 ++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build  |   1 +
 12 files changed, 235 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 69a60b9b90..b765b4ada9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -596,3 +597,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, 
struct cnxk_ml_model *mo
 
rte_ml_io_type_size_get(io_info->output[i].qtype);
}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer 
*layer, FILE *fp)
+{
+   struct cn10k_ml_ocm *ocm;
+   char str[STR_LEN];
+   uint8_t i;
+   uint8_t j;
+
+   ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+   /* Print debug info */
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+   cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+   fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+   layer->glow.metadata.model.version[0], 
layer->glow.metadata.model.version[1],
+   layer->glow.metadata.model.version[2], 
layer->glow.metadata.model.version[3]);
+   fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", 
PLT_U64_CAST(layer));
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+   /* Print model state */
+   if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+   if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+   if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+   /* Print OCM status */
+   fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+   layer->glow.metadata.model.ocm_wb_range_end -
+   layer->glow.metadata.model.ocm_wb_range_start + 1);
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+   fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+   ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+   layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+   if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+   fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+   ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+   fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+   layer->glow.ocm_map.wb_page_start * ocm->page_size);
+   }
+
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+   fprintf(fp, "\n");
+
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+   "model_input_type");
+   cnxk_ml_print_line(fp, LINE_LEN);
+   for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+   if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+   fprintf(fp, "%8u  ", i);
+   fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+   rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+ STR_LEN);
+   fprintf(fp, "%*s  ", 12, str);
+   
rt

[PATCH v6 14/34] ml/cnxk: update device stats functions

2023-10-18 Thread Srikanth Yalavarthi
Added cnxk wrapper function to handle ML device stats

Signed-off-by: Srikanth Yalavarthi 
---
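The wrapper moved into the cnxk layer is a straightforward per-queue-pair
aggregation of counters into a device-level total. A standalone sketch of
the same pattern (the struct and function names here are illustrative, not
the driver's actual types):

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-queue-pair counters, mirroring the fields aggregated
 * by the dev_stats_get wrapper. */
struct qp_stats {
	uint64_t enqueued_count;
	uint64_t dequeued_count;
	uint64_t enqueue_err_count;
	uint64_t dequeue_err_count;
};

/* Sum all queue-pair counters into a device-level total. */
static void
stats_aggregate(const struct qp_stats *qps, size_t nb_qps, struct qp_stats *total)
{
	size_t i;

	total->enqueued_count = 0;
	total->dequeued_count = 0;
	total->enqueue_err_count = 0;
	total->dequeue_err_count = 0;

	for (i = 0; i < nb_qps; i++) {
		total->enqueued_count += qps[i].enqueued_count;
		total->dequeued_count += qps[i].dequeued_count;
		total->enqueue_err_count += qps[i].enqueue_err_count;
		total->dequeue_err_count += qps[i].dequeue_err_count;
	}
}
```

Resetting follows the same loop with the counters zeroed per queue pair.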
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 --
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 --
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0a3575879f..27d255a830 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-   struct cnxk_ml_qp *qp;
-   int qp_id;
-
-   for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-   qp = dev->data->queue_pairs[qp_id];
-   stats->enqueued_count += qp->stats.enqueued_count;
-   stats->dequeued_count += qp->stats.dequeued_count;
-   stats->enqueue_err_count += qp->stats.enqueue_err_count;
-   stats->dequeue_err_count += qp->stats.dequeue_err_count;
-   }
-
-   return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-   struct cnxk_ml_qp *qp;
-   int qp_id;
-
-   for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-   qp = dev->data->queue_pairs[qp_id];
-   qp->stats.enqueued_count = 0;
-   qp->stats.dequeued_count = 0;
-   qp->stats.enqueue_err_count = 0;
-   qp->stats.dequeue_err_count = 0;
-   }
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
   int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
   int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
   uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66b88ddae1..c75317d6da 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+   struct cnxk_ml_qp *qp;
+   int qp_id;
+
+   for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+   qp = dev->data->queue_pairs[qp_id];
+   stats->enqueued_count += qp->stats.enqueued_count;
+   stats->dequeued_count += qp->stats.dequeued_count;
+   stats->enqueue_err_count += qp->stats.enqueue_err_count;
+   stats->dequeue_err_count += qp->stats.dequeue_err_count;
+   }
+
+   return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+   struct cnxk_ml_qp *qp;
+   int qp_id;
+
+   for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+   qp = dev->data->queue_pairs[qp_id];
+   qp->stats.enqueued_count = 0;
+   qp->stats.dequeued_count = 0;
+   qp->stats.enqueue_err_count = 0;
+   qp->stats.dequeue_err_count = 0;
+   }
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, 
uint16_t *model_id)
 {
@@ -772,8 +804,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
/* Stats ops */
-   .dev_stats_get = cn10k_ml_dev_stats_get,
-   .dev_stats_reset = cn10k_ml_dev_stats_reset,
+   .dev_stats_get = cnxk_ml_dev_stats_get,
+   .dev_stats_reset = cnxk_ml_dev_stats_reset,
.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.42.0



[PATCH v6 15/34] ml/cnxk: update device and model xstats functions

2023-10-18 Thread Srikanth Yalavarthi
Added cnxk wrapper functions to handle ML device and model
extended stats. Resource handling for the xstats is done
in the cnxk layer. Introduced an internal xstats group.

Signed-off-by: Srikanth Yalavarthi 
---
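The xstats table initialized here lays out all device-level entries first,
followed by a fixed-size group of entries per model slot, with per-model
offsets recorded so a stat's flat index can be computed. A minimal sketch
of that indexing scheme (the counts are illustrative placeholders, not the
driver's real values):

```c
#include <stdint.h>

#define NB_DEV_XSTATS   3  /* illustrative, stands in for RTE_DIM(device_xstats) */
#define NB_MODEL_XSTATS 6  /* illustrative, stands in for RTE_DIM(layer_xstats) */

/* Flat xstats-table index of a per-model stat, assuming the layout used
 * here: all device stats first, then one fixed-size group per model slot. */
static uint16_t
model_xstat_index(uint16_t model_id, uint16_t stat)
{
	return NB_DEV_XSTATS + model_id * NB_MODEL_XSTATS + stat;
}
```

The `offset_for_model[]` array in the patch caches exactly this per-model
base offset so lookups avoid recomputing it.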
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h|   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c| 481 +++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
/* OCM info */
struct cn10k_ml_ocm ocm;
 
-   /* Extended stats data */
-   struct cnxk_ml_xstats xstats;
-
/* Enable / disable model data caching */
int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 27d255a830..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-   struct cn10k_ml_dev *cn10k_mldev;
-   struct cnxk_ml_dev *cnxk_mldev;
-   uint16_t nb_stats;
-   uint16_t stat_id;
-   uint16_t model;
-   uint16_t i;
-
-   cnxk_mldev = dev->data->dev_private;
-   cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-   /* Allocate memory for xstats entries. Don't allocate during reconfigure */
-   nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-   if (cn10k_mldev->xstats.entries == NULL)
-   cn10k_mldev->xstats.entries = rte_zmalloc(
-   "cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-   PLT_CACHE_LINE_SIZE);
-
-   if (cn10k_mldev->xstats.entries == NULL)
-   return -ENOMEM;
-
-   /* Initialize device xstats */
-   stat_id = 0;
-   for (i = 0; i < RTE_DIM(device_xstats); i++) {
-   cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-   snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-device_xstats[i].name);
-
-   cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-   cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-   cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-   cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-   cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-   stat_id++;
-   }
-   cn10k_mldev->xstats.count_mode_device = stat_id;
-
-   /* Initialize model xstats */
-   for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-   cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-   for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-   cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-   cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-   cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-   cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-   cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-   cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-   layer_xstats[i].reset_allowed;
-
-   /* Name of xstat is updated during model load */
-   snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-"Model-%u-%s", model, layer_xstats[i].name);
-
-   stat_id++;
-   }
-
-   cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-   }
-
-   cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-   cn10k_mldev->xstats.count = stat_id;
-
-   return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+ uint16_t layer_id)
 {
-   struct cn10k_

[PATCH v6 16/34] ml/cnxk: update fast path functions

2023-10-18 Thread Srikanth Yalavarthi
Implemented cnxk layer fast-path functions and added support
for model specific fast-path functions. CNXK layer functions
would invoke model specific fast-path functions.

Added support for model specific poll handling functions and
updated the internal inference sync function. Dropped use of
rte_ml_op as an argument and updated function arguments to
enable the function to be used as a callback by TVM HW runtime.

Signed-off-by: Srikanth Yalavarthi 
---
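The dispatch pattern described above — the common cnxk layer invoking a
model-specific fast-path handler through a stored function pointer — can
be sketched as follows (the struct and function names are illustrative,
not the driver's actual types):

```c
#include <stddef.h>

/* Illustrative model object carrying a per-model fast-path callback,
 * analogous to storing enqueue/poll handlers per model. */
struct model {
	int (*run)(struct model *m, const void *input, void *output);
};

/* One concrete backend implementation of the callback. */
static int
glow_run(struct model *m, const void *input, void *output)
{
	(void)m;
	(void)input;
	(void)output;
	return 0; /* stand-in for a completed hardware job */
}

/* Common layer: dispatch without knowing the model's backend type. */
static int
model_run(struct model *m, const void *in, void *out)
{
	return m->run(m, in, out);
}
```

Registering a different callback per model type lets the same common entry
point serve Glow and TVM models without branching on the type in the fast
path.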
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
/* JCMD enqueue function handler */
bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s 
*job_cmd);
-
-   /* Poll handling function pointers */
-   void (*set_poll_addr)(struct cnxk_ml_req *req);
-   void (*set_poll_ptr)(struct cnxk_ml_req *req);
-   uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-   plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-   return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-   struct rte_ml_op *op)
+   uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-   req->cn10k_req.jd.hdr.model_id = op->model_id;
+   req->cn10k_req.jd.hdr.model_id = index;
req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
req->cn10k_req.jd.hdr.sp_flags = 0x0;
req->cn10k_req.jd.hdr.result =
roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
req->cn10k_req.jd.model_run.input_ddr_addr =
-   PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+   PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
req->cn10k_req.jd.model_run.output_ddr_addr =
-   PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-   req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+   PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+   req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-   struct rte_ml_buff_seg seg[2];
-   struct rte_ml_buff_seg *inp;
-   struct rte_ml_buff_seg *out;
-   struct rte_ml_op op;
-
char str[RTE_MEMZONE_NAMESIZE];
const struct plt_memzone *mz;
uint64_t isize = 0;
uint64_t osize = 0;
int ret = 0;
-   uint32_t i;
-
-   inp = &seg[0];
-   out = &seg[1];
 
/* Create input and output buffers. */
-   for (i = 0; i < layer->info.nb_inputs; i++)
-   isize += layer->info.input[i].sz_q;
-
-   for (i = 0; i < layer->info.nb_outputs; i++)
-   osize += layer->info.output[i].sz_q;
-
-   isize = layer->batch_size * isize;
-   osize = layer->batch_size * osize;
+   isize = layer->info.total_input_sz_q;
+   osize = layer->info.total_output_sz_q;
 
 snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
mz = plt_m

[PATCH v6 17/34] ml/cnxk: move error handling to cnxk layer

2023-10-18 Thread Srikanth Yalavarthi
Move error type structures to cnxk layer. cn10k layer to
handle fw and hw error sub-types only.

Signed-off-by: Srikanth Yalavarthi 
---
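The error databases moved into the cnxk layer map enum error types to
printable names via a small table scan. A self-contained sketch of that
lookup pattern (the enum values and strings here are abbreviated for
illustration, not the full driver tables):

```c
#include <stddef.h>

/* Abbreviated error-type database, mirroring the etype -> name tables
 * this patch relocates into the cnxk layer. */
enum err_etype { ETYPE_NO_ERROR = 0, ETYPE_FW_NONFATAL, ETYPE_UNKNOWN };

static const struct {
	enum err_etype etype;
	const char *name;
} etype_db[] = {
	{ETYPE_NO_ERROR, "NO_ERROR"},
	{ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
	{ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
};

/* Linear scan of the table; these databases are tiny, so the scan is
 * cheaper than maintaining a dense index. */
static const char *
etype_name(enum err_etype etype)
{
	size_t i;

	for (i = 0; i < sizeof(etype_db) / sizeof(etype_db[0]); i++)
		if (etype_db[i].etype == etype)
			return etype_db[i].name;
	return "INVALID";
}
```

With the etype table in the common layer, the cn10k layer only needs to
resolve its firmware and hardware sub-type strings.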
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +-
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-   /* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-   /* 0x1 */ ML_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
-   /* 0x2 */ ML_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
-   /* 0x3 */ ML_ETYPE_HW_FATAL, /* Hardware fatal error */
-   /* 0x4 */ ML_ETYPE_HW_WARNING,   /* Hardware warning */
-   /* 0x5 */ ML_ETYPE_DRIVER,   /* Driver specific error */
-   /* 0x6 */ ML_ETYPE_UNKNOWN,  /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-   /* 0x0 */ ML_FW_ERR_NOERR = 0,   /* No error */
-   /* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-   /* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
-   /* 0x3 */ ML_FW_ERR_ID_IN_USE,   /* Model ID already in use */
-   /* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,/* Invalid OCM tilemask */
-   /* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,/* Lookup table overflow at run */
-   /* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,/* Model ID not found during run */
-   /* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,  /* Unsupported command */
-   /* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,  /* DDR address out of range */
-   /* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-   /* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT, /* INS sync timeout */
+   /* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,   /* No error */
+   /* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+   /* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+   /* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,   /* Model ID already in use */
+   /* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,/* Invalid OCM tilemask */
+   /* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,/* Lookup table overflow at run */
+   /* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,/* Model ID not found during run */
+   /* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,  /* Unsupported command */
+   /* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,  /* DDR address out of range */
+   /* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+   /* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-   /* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-   /* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-   /* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-   /* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+   /* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+   /* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
+   /* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+   /* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-   enum cn10k_ml_error_etype etype;
-   char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-   {ML_ETYPE_NO_ERROR, "NO_ERROR"},{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-   {ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-   {ML_ETYPE_HW_WARNING, "HW_WARNING"},{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-   {ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-   enum cn10k_ml_error_stype_fw_nf stype;
-   char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-   {ML_FW_ERR_NOERR, "NO ERROR"},
-   {ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-   {ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-   {ML_F

[PATCH v6 19/34] ml/cnxk: add structures to support TVM model type

2023-10-18 Thread Srikanth Yalavarthi
Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++-
 drivers/ml/cnxk/cnxk_ml_ops.c| 52 -
 drivers/ml/cnxk/meson.build  |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++
 6 files changed, 161 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
for (j = 0; j < local_model->nb_layers; j++) {
local_layer = &local_model->layer[j];
+   if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+   continue;
+
if (local_layer != layer &&
local_layer->glow.ocm_map.ocm_reserved) {
 if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
if (ret != 0)
return ret;
 
+   /* Set model sub type */
+   model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
/* Copy metadata to internal buffer */
 rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
/* Load layer and get the index */
layer = &model->layer[0];
+   layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
  &layer->index);
if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
if (ret < 0) {
cn10k_ml_layer_stop(device, model_id, layer_name);
} else {
-   if (cn10k_mldev->cache_model_data)
+   if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+   /* Unknown model type */
+   ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+   /* Invalid model type */
+   ML_CNXK_MODEL_TYPE_INVALID,
+
+   /* Glow compiled model, for MLIP target */
+   ML_CNXK_MODEL_TYPE_GLOW,
+
+   /* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+   ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+   /* Marvell Glow model */
+   ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+   /* TVM model with single MRVL region */
+   ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+   /* TVM model with LLVM regions only */
+   ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+   /* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions */
+   ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+   /* Unknown layer type */
+   ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+   /* MRVL layer, for MLIP target */
+   ML_CNXK_LAYER_TYPE_MRVL,
+
+   /* LLVM layer, for ARM64 target */
+   ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
/* Name*/
char name[RTE_ML_STR_MAX];
 
+   /* Type */
+   enum cnxk_ml_layer_type type;
+
/* Model handle */
struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
/* Device reference */
struct cnxk_ml_dev *cnxk_mldev;
 
+   /* Type */
+   enum cnxk_ml_model_type type;
+
+   /* Model subtype */
+   enum cnxk_ml_model_subtype subtype;
+
/* ID */
uint16_t model_id;

[PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library

2023-10-18 Thread Srikanth Yalavarthi
Added support to configure and close TVMDP library based
on ML device configuration options.

Updated meson build to enable Jansson, TVM runtime and the
TVMDP library as build dependencies.

Signed-off-by: Srikanth Yalavarthi 
---
 doc/guides/mldevs/cnxk.rst   | 78 
 drivers/ml/cnxk/cnxk_ml_ops.c|  7 +++
 drivers/ml/cnxk/cnxk_ml_ops.h|  6 +++
 drivers/ml/cnxk/meson.build  | 59 
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 41 +
 drivers/ml/cnxk/mvtvm_ml_ops.h   | 19 
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 26 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h | 15 ++
 8 files changed, 251 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..ef2b5d4581 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -46,6 +46,84 @@ or cross-compiled on an x86 platform.
 
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
+Compilation Prerequisites
+-------------------------
+
+This driver optionally requires external libraries to enable support for
+models compiled using the Apache TVM framework. The following dependencies are
+not part of DPDK and must be installed separately:
+
+- **Jansson**
+
+  This library enables support to parse and read JSON files.
+
+- **DLPack**
+
+  This library provides headers for the DLPack open in-memory tensor structure.
+
+.. note::
+
+DPDK CNXK ML driver requires DLPack version 0.7
+
+.. code-block:: console
+
+git clone https://github.com/dmlc/dlpack.git
+cd dlpack
+git checkout v0.7 -b v0.7
+cmake -S ./ -B build \
+  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+  -DBUILD_MOCK=OFF
+make -C build
+make -C build install
+
+- **TVM**
+
+  Apache TVM provides a runtime library (libtvm_runtime) used to execute
+  models on CPU cores or hardware accelerators.
+
+.. note::
+
+DPDK CNXK ML driver requires TVM version 0.10.0
+
+.. code-block:: console
+
+git clone https://github.com/apache/tvm.git
+cd tvm
+git checkout v0.10.0 -b v0.10.0
+cmake -S ./ -B build \
+  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+  -DMACHINE_NAME=aarch64-linux-gnu \
+  -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+  -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY
+make -C build
+make -C build install
+
+- **TVMDP**
+
+  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
+  works as an interface between TVM runtime and DPDK drivers. TVMDP library
+  provides a simplified C interface for TVM's C++-based runtime.
+
+.. code-block:: console
+
+git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+cd tvmdp
+git checkout main
+cmake -S ./ -B build \
+  -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
+  -DBUILD_SHARED_LIBS=ON \
+  -DBUILD_TESTING=OFF
+make -C build
+make -C build install
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives. This
+  library enables support to decompress and read archive files in tar,
+  xz and other formats.
+
 
 Initialization
 --
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 8339f8342b..c3639320a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
goto error;
}
 
+   ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+   if (ret != 0)
+   goto error;
+
/* Set device capabilities */
 cnxk_mldev->max_nb_layers =
 cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
/* Un-initialize xstats */
cnxk_ml_xstats_uninit(cnxk_mldev);
 
+   if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+   plt_err("Failed to close MVTVM ML Device");
+
if (cn10k_ml_dev_close(cnxk_mldev) != 0)
plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5d27a87d91..607e1c72e9 100644
-

[PATCH v6 20/34] ml/cnxk: add support for identify model type

2023-10-18 Thread Srikanth Yalavarthi
Enabled support to parse the model buffer to identify the
model type and model sub-type. Added basic validity checks
for Glow model buffers.

Signed-off-by: Srikanth Yalavarthi 
Signed-off-by: Anup Prabhu 
---
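The first-level probe for a Glow buffer is a magic-string comparison at
the start of the model blob, applied before the header and payload CRC32C
checks. A simplified standalone sketch (the "GLOW" magic here is a
stand-in for the real MRVL_ML_MODEL_MAGIC_STRING, and the CRC steps are
omitted):

```c
#include <stdint.h>
#include <string.h>

enum model_type { MODEL_TYPE_INVALID, MODEL_TYPE_GLOW };

/* Identify a model buffer by checking a 4-byte magic at its start.
 * The driver additionally validates header and payload CRC32C values
 * before accepting the buffer. */
static enum model_type
model_get_type(const void *buf, size_t size)
{
	if (size < 4 || memcmp(buf, "GLOW", 4) != 0)
		return MODEL_TYPE_INVALID;
	return MODEL_TYPE_GLOW;
}
```

Checking the cheap magic string first lets the loader reject foreign
buffers without computing checksums over the whole payload.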
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c|  8 +
 drivers/ml/cnxk/meson.build  |  6 
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include 
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+   struct cn10k_ml_model_metadata_header *metadata_header;
+   enum cnxk_ml_model_type type;
+   uint32_t payload_crc32c;
+   uint32_t header_crc32c;
+
+   type = mvtvm_ml_model_type_get(params);
+   if (type == ML_CNXK_MODEL_TYPE_TVM)
+   return ML_CNXK_MODEL_TYPE_TVM;
+   else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+   return ML_CNXK_MODEL_TYPE_INVALID;
+
+   /* Check model magic string */
+   metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+   if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+   plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+   return ML_CNXK_MODEL_TYPE_INVALID;
+   }
+
+   /* Header CRC check */
+   if (metadata_header->header_crc32c != 0) {
+   header_crc32c = rte_hash_crc(
+   params->addr,
+   sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+   if (header_crc32c != metadata_header->header_crc32c) {
+   plt_err("Invalid Glow model, Header CRC mismatch");
+   return ML_CNXK_MODEL_TYPE_INVALID;
+   }
+   }
+
+   /* Payload CRC check */
+   if (metadata_header->payload_crc32c != 0) {
+   payload_crc32c = rte_hash_crc(
+   PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+   params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+   if (payload_crc32c != metadata_header->payload_crc32c) {
+   plt_err("Invalid Glow model, Payload CRC mismatch");
+   return ML_CNXK_MODEL_TYPE_INVALID;
+   }
+   }
+
+   return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ea6f59a70f..c140408023 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
struct rte_ml_dev_info dev_info;
struct cnxk_ml_dev *cnxk_mldev;
+   enum cnxk_ml_model_type type;
struct cnxk_ml_model *model;
 
char str[RTE_MEMZONE_NAMESIZE];
@@ -1033,6 +1034,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct 
rte_ml_model_params *params, u
 
cnxk_mldev = dev->data->dev_private;
 
+   type = cnxk_ml_model_get_type(params);
+   if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+   plt_err("Invalid / unsupported model type");
+   return -EINVAL;
+   }
+
/* Find model ID */
found = false;
for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; 
lcl_model_id++) {
@@ -1066,6 +1073,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct 
rte_ml_model_params *params, u
 
model = mz->addr;
model->cnxk_mldev = cnxk_mldev;
+   model->type = type;
model->model_id = lcl_model_id;
model->info = PLT_PTR_ADD(

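For reference, the header/payload CRC validation pattern used by the Glow model-type check above can be sketched in plain C. This is a standalone sketch: `crc32c()` is a bitwise software CRC32C standing in for `rte_hash_crc()` (exact init/xor conventions may differ), and the `meta_header` layout is hypothetical — its `header_crc32c` is the last field so it is excluded from its own checksum, matching the `sizeof(header) - sizeof(uint32_t)` span in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli, reflected poly 0x82F63B78); a software
 * stand-in for rte_hash_crc() in this sketch. */
static uint32_t
crc32c(const void *data, size_t len, uint32_t init)
{
	const uint8_t *p = data;
	uint32_t crc = ~init;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78U & -(crc & 1U));
	}
	return ~crc;
}

/* Hypothetical header layout: header_crc32c is last, so the header CRC
 * covers everything except the CRC field itself. */
struct meta_header {
	uint32_t magic;
	uint32_t payload_crc32c; /* CRC of the bytes after the header */
	uint32_t header_crc32c;  /* CRC of the header minus this field */
};

/* Return 0 when both checksums match; a stored CRC of zero skips that
 * check, as in the patch. */
static int
model_crc_check(const uint8_t *buf, size_t size)
{
	struct meta_header h;

	if (size < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.header_crc32c != 0 &&
	    h.header_crc32c != crc32c(buf, sizeof(h) - sizeof(uint32_t), 0))
		return -1;
	if (h.payload_crc32c != 0 &&
	    h.payload_crc32c != crc32c(buf + sizeof(h), size - sizeof(h), 0))
		return -1;
	return 0;
}
```

A mismatch in either checksum makes the loader reject the buffer as `ML_CNXK_MODEL_TYPE_INVALID` rather than attempting to parse it.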
[PATCH v6 21/34] ml/cnxk: add support to parse TVM model objects

2023-10-18 Thread Srikanth Yalavarthi
Added support to parse TVM model objects from the model
archive buffer, check for all expected objects and copy
the TVM model objects to internal buffers.

Signed-off-by: Srikanth Yalavarthi 
Signed-off-by: Anup Prabhu 
---
 drivers/ml/cnxk/cnxk_ml_ops.c|  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)
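Setting aside the libarchive plumbing, the bookkeeping that mvtvm_ml_model_blob_parse() performs — match each archive entry against the expected object list, then fail if anything is missing — reduces to the sketch below, with entry names passed as a plain array for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NB_OBJECTS 3

/* Expected TVM model objects, as in mvtvm_object_list[]. */
static const char *const object_list[NB_OBJECTS] = {
	"mod.so", "mod.json", "mod.params"
};

/* Return 0 iff every expected object appears among the archive entries;
 * mirrors the object_found[] completeness check in the patch. */
static int
check_objects(const char *const *entries, int nb_entries)
{
	bool object_found[NB_OBJECTS] = {false, false, false};
	int e, i;

	for (e = 0; e < nb_entries; e++)
		for (i = 0; i < NB_OBJECTS; i++)
			if (!object_found[i] &&
			    strcmp(entries[e], object_list[i]) == 0)
				object_found[i] = true;

	for (i = 0; i < NB_OBJECTS; i++)
		if (!object_found[i])
			return -1;
	return 0;
}
```

In the patch, each match additionally copies the entry's data into an rte_malloc'd buffer, and a missing object triggers freeing of everything allocated so far before returning -EINVAL.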

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c140408023..b18271545d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1079,7 +1079,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct 
rte_ml_model_params *params, u
model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), 
dev_info.align_size));
dev->data->models[lcl_model_id] = model;
 
-   ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+   if (type == ML_CNXK_MODEL_TYPE_GLOW)
+   ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+   else
+   ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
if (ret != 0)
goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct 
mvtvm_ml_model_object *object)
+{
+   bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+   struct archive_entry *entry;
+   struct archive *a;
+   uint8_t i;
+   int ret;
+
+   /* Open archive */
+   a = archive_read_new();
+   archive_read_support_filter_all(a);
+   archive_read_support_format_all(a);
+
+   ret = archive_read_open_memory(a, params->addr, params->size);
+   if (ret != ARCHIVE_OK)
+   return archive_errno(a);
+
+   /* Read archive */
+   while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+   for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+   if (!object_found[i] &&
+   (strcmp(archive_entry_pathname(entry), 
mvtvm_object_list[i]) == 0)) {
+   memcpy(object[i].name, mvtvm_object_list[i], 
RTE_ML_STR_MAX);
+   object[i].size = archive_entry_size(entry);
+   object[i].buffer = rte_malloc(NULL, 
object[i].size, 0);
+
+   if (archive_read_data(a, object[i].buffer, 
object[i].size) !=
+   object[i].size) {
+   plt_err("Failed to read object from 
model archive: %s",
+   object[i].name);
+   goto error;
+   }
+   object_found[i] = true;
+   }
+   }
+   archive_read_data_skip(a);
+   }
+
+   /* Check if all objects are parsed */
+   for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+   if (!object_found[i]) {
+   plt_err("Object %s not found in archive!\n", 
mvtvm_object_list[i]);
+   goto error;
+   }
+   }
+   return 0;
+
+error:
+   for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+   if (object[i].buffer != NULL)
+   rte_free(object[i].buffer);
+   }
+
+   return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params 
*params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+ struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include 
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct 
rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 

[PATCH v6 22/34] ml/cnxk: fetch layer info and load TVM model

2023-10-18 Thread Srikanth Yalavarthi
Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload and
enabled model loading using the TVMDP library. Added
support to fetch the full metadata after model load.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 8 files changed, 141 insertions(+), 2 deletions(-)
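The name-to-index resolution added here (mvtvm_ml_model_get_layer_id() plus the cn10k wrapper) is a linear scan with a type guard. A standalone sketch, using a simplified layer struct and hypothetical layer names in place of the driver's cnxk_ml_layer:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

enum layer_type { LAYER_TYPE_UNKNOWN, LAYER_TYPE_MRVL, LAYER_TYPE_LLVM };

struct layer {
	const char *name;
	enum layer_type type;
};

/* Resolve a layer name to its index; only MRVL (Glow) layers are valid
 * targets, mirroring mvtvm_ml_model_get_layer_id(). */
static int
get_layer_id(const struct layer *layers, uint16_t nb_layers,
	     const char *layer_name, uint16_t *layer_id)
{
	uint16_t i;

	for (i = 0; i < nb_layers; i++)
		if (strcmp(layers[i].name, layer_name) == 0)
			break;
	if (i == nb_layers)
		return -EINVAL; /* no layer with that name */
	if (layers[i].type != LAYER_TYPE_MRVL)
		return -EINVAL; /* layer is not hardware-offloaded */
	*layer_id = i;
	return 0;
}
```

For pure Glow models the cn10k wrapper short-circuits to layer_id 0, since such models have a single layer and no TVM metadata to consult.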

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index b765b4ada9..9a80adf0fc 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -714,3 +714,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, 
struct cnxk_ml_layer *layer
cnxk_ml_print_line(fp, LINE_LEN);
fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char 
*layer_name, uint16_t *layer_id)
+{
+   if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+   return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+   *layer_id = 0;
+
+   return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev 
*cnxk_mldev, struct cnxk_ml_mode
 struct cnxk_ml_io_info *io_info,
 struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer 
*layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char 
*layer_name,
+   uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const 
char *layer_name, uin
size_t layer_xstats_size;
uint8_t *base_dma_addr;
uint16_t scratch_pages;
-   uint16_t layer_id = 0;
+   uint16_t layer_id;
uint16_t wb_pages;
uint64_t mz_size;
uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const 
char *layer_name, uin
int ret;
 
PLT_SET_USED(size);
-   PLT_SET_USED(layer_name);
 
cnxk_mldev = (struct cnxk_ml_dev *)device;
if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const 
char *layer_name, uin
return -EINVAL;
}
 
+   ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+   if (ret != 0)
+   return ret;
+
layer = &model->layer[layer_id];
 
ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params 
*params, struct mvtvm_ml_mo
 
return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char 
*layer_name, uint16_t *layer_id)
+{
+   uint16_t i;
+
+   for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+   if (strcmp(model->layer[i].name, layer_name) == 0)
+   break;
+   }
+
+   if (i == model->mvtvm.metadata.model.nb_layers) {
+   plt_err("Invalid layer name: %s", layer_name);
+   return -EINVAL;
+   }
+
+   if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+   plt_err("Invalid layer type, name: %s type: %d", layer_name, 
model->layer[i].type);
+   return -EINVAL;
+   }
+
+   *layer_id = i;
+
+   return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params 
*params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
  struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+   uint16_t *layer_id);

[PATCH v6 24/34] ml/cnxk: enable model unload in tvmdp library

2023-10-18 Thread Srikanth Yalavarthi
Enabled unloading models using the external TVMDP library.
Updated the layer unload callback to support multiple layers.

Signed-off-by: Srikanth Yalavarthi 
Signed-off-by: Anup Prabhu 
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +---
 drivers/ml/cnxk/cnxk_ml_ops.c|  7 +--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, 
const char *layer_name)
struct cnxk_ml_layer *layer;
 
char str[RTE_MEMZONE_NAMESIZE];
-   uint16_t layer_id = 0;
+   uint16_t layer_id;
int ret;
 
-   PLT_SET_USED(layer_name);
-
cnxk_mldev = (struct cnxk_ml_dev *)device;
if (cnxk_mldev == NULL) {
plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, 
const char *layer_name)
return -EINVAL;
}
 
+   ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+   if (ret != 0)
+   return ret;
+
layer = &model->layer[layer_id];
 
snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", 
CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 90b23d9c1c..cd95a3c7ad 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1107,7 +1107,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t 
model_id)
struct cnxk_ml_model *model;
 
char str[RTE_MEMZONE_NAMESIZE];
-   int ret;
+   int ret = 0;
 
if (dev == NULL)
return -EINVAL;
@@ -1125,7 +1125,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t 
model_id)
return -EBUSY;
}
 
-   ret = cn10k_ml_model_unload(cnxk_mldev, model);
+   if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+   ret = cn10k_ml_model_unload(cnxk_mldev, model);
+   else
+   ret = mvtvm_ml_model_unload(cnxk_mldev, model);
if (ret != 0)
return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e248310cb3..9fd9e58de6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct 
rte_ml_model_params *
 
return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model)
+{
+   char str[RTE_MEMZONE_NAMESIZE];
+   const struct plt_memzone *mz;
+   int ret;
+
+   RTE_SET_USED(cnxk_mldev);
+
+   /* Initialize model in TVMDP */
+   ret = tvmdp_model_unload(model->model_id);
+   if (ret != 0) {
+   plt_err("TVMDP: Model unload failed, model_id = %u, error = 
%d", model->model_id,
+   ret);
+   return ret;
+   }
+
+   snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", 
MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+   mz = rte_memzone_lookup(str);
+   if (mz == NULL) {
+   plt_err("Memzone lookup failed for TVM model: model_id = %u, mz 
= %s",
+   model->model_id, str);
+   return -EINVAL;
+   }
+
+   return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, 
const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct 
rte_ml_model_params *params,
struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct 
rte_ml_model_params *
 
return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model)
+{
+   RTE_SET_USED(cnxk_mldev);
+   RTE_SET_USED(model);
+
+   return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configu

[PATCH v6 23/34] ml/cnxk: update internal info for TVM model

2023-10-18 Thread Srikanth Yalavarthi
Enabled updating of internal I/O info structures for TVM models
and computed the static fields related to the model I/O.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cnxk_ml_ops.c|   4 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 111 +++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 130 insertions(+)
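The static fields computed in mvtvm_ml_model_io_info_set() follow one rule: nb_elements is the product of the shape dimensions, and the dequantized/quantized byte sizes (sz_d/sz_q) multiply that count by the element width of dtype/qtype. A minimal sketch, with type widths hard-coded where the driver would call rte_ml_io_type_size_get():

```c
#include <assert.h>
#include <stdint.h>

/* Element widths for the two types used in this sketch; the driver maps
 * DLPack type codes to rte_ml_io_type values and queries the width. */
#define SZ_FP32  4
#define SZ_INT16 2

/* nb_elements = product of shape dims; byte size = nb_elements * width. */
static uint32_t
io_nb_elements(const int64_t *shape, int32_t ndim)
{
	uint32_t n = 1;

	for (int32_t j = 0; j < ndim; j++)
		n *= (uint32_t)shape[j];
	return n;
}
```

The per-tensor sizes are then accumulated into total_input_sz_d/q and total_output_sz_d/q, which the quantize/dequantize paths use for bounds checking.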

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b18271545d..90b23d9c1c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1244,6 +1244,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t 
model_id, struct rte_ml_buf
 
if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
info = cn10k_ml_model_io_info_get(model, 0);
+   else
+   info = mvtvm_ml_model_io_info_get(model, 0);
 
if (info == NULL)
return -EINVAL;
@@ -1296,6 +1298,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t 
model_id, struct rte_ml_b
 
if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+   else
+   info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
if (info == NULL)
return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..14f4b258d8 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include 
 
+#include 
+
 #include 
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,112 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, 
const char *layer_name,
 
return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+   switch (type) {
+   case kDLInt:
+   return RTE_ML_IO_TYPE_INT32;
+   case kDLUInt:
+   return RTE_ML_IO_TYPE_UINT32;
+   case kDLFloat:
+   return RTE_ML_IO_TYPE_FP32;
+   case kDLBfloat:
+   return RTE_ML_IO_TYPE_BFLOAT16;
+   }
+
+   return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+   struct tvmdp_model_metadata *metadata;
+   int32_t i;
+   int32_t j;
+
+   if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+   goto tvm_mrvl_model;
+
+   metadata = &model->mvtvm.metadata;
+
+   /* Inputs, set for layer_id = 0 */
+   model->mvtvm.info.nb_inputs = metadata->model.num_input;
+   model->mvtvm.info.total_input_sz_d = 0;
+   model->mvtvm.info.total_input_sz_q = 0;
+   for (i = 0; i < metadata->model.num_input; i++) {
+   strncpy(model->mvtvm.info.input[i].name, 
metadata->input[i].name,
+   TVMDP_NAME_STRLEN);
+   model->mvtvm.info.input[i].dtype =
+   mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+   model->mvtvm.info.input[i].qtype =
+   
mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+   model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+   model->mvtvm.info.input[i].nb_elements = 1;
+   for (j = 0; j < metadata->input[i].ndim; j++) {
+   model->mvtvm.info.input[i].shape[j] = 
metadata->input[i].shape[j];
+   model->mvtvm.info.input[i].nb_elements *= 
metadata->input[i].shape[j];
+   }
+
+   model->mvtvm.info.input[i].sz_d =
+   model->mvtvm.info.input[i].nb_elements *
+   
rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+   model->mvtvm.info.input[i].sz_q =
+   model->mvtvm.info.input[i].nb_elements *
+   
rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+   model->mvtvm.info.total_input_sz_d += 
model->mvtvm.info.input[i].sz_d;
+   model->mvtvm.info.total_input_sz_q += 
model->mvtvm.info.input[i].sz_q;
+
+   plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", 
model->model_id, i,
+  model->mvtvm.info.input[i].sz_d, 
model->mvtvm.info.input[i].sz_q);
+   }
+
+   /* Outputs, set for nb_layers - 1 */
+   model->mvtvm.info.nb_outputs = metadata->model.num_output;
+   model->mvtvm.info.total_output_sz_d = 0;
+   model->mvtvm.info.total_output_sz_q = 0;
+   for (i = 0; i < metadata->model.num_output; i++) {
+   strncpy(model->mvtvm.info.output[i].name, 
metadata->output[i].name,
+   TVMDP_NAME_STRLEN);
+   model->mvtvm.info.output[i].dtype =
+   mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+   model->mvtvm.info.output[i].qtype =
+   
mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);

[PATCH v6 25/34] ml/cnxk: enable OCM check for multilayer TVM model

2023-10-18 Thread Srikanth Yalavarthi
From: Anup Prabhu 

Enabled the OCM size requirement check for multi-layer
TVM models. OCM scratch and WB page requirements are
computed for all layers during the load stage.

Signed-off-by: Anup Prabhu 
---
 drivers/ml/cnxk/cnxk_ml_ops.c | 60 +++
 1 file changed, 60 insertions(+)
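The feasibility rule the patch enforces: write-back (WB) pages stay resident per layer simultaneously, so they sum across layers, while scratch space is reused, so only the per-layer maximum counts. A standalone sketch of the check, with page counts passed as plain arrays:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PLT_MAX(a, b) ((a) > (b) ? (a) : (b))

/* total_wb + max_scratch must fit in the OCM page budget; mirrors the
 * accumulation loop added to cnxk_ml_model_load(). */
static int
ocm_check(const uint16_t *wb_pages, const uint16_t *scratch_pages,
	  uint16_t nb_layers, uint16_t num_pages)
{
	uint32_t total_wb_pages = 0; /* wider than uint16_t to avoid overflow */
	uint16_t max_scratch_pages = 0;
	uint16_t i;

	for (i = 0; i < nb_layers; i++) {
		total_wb_pages += wb_pages[i];
		max_scratch_pages = PLT_MAX(max_scratch_pages, scratch_pages[i]);
	}
	return (total_wb_pages + max_scratch_pages <= num_pages) ? 0 : -ENOMEM;
}
```

In the patch only ML_CNXK_LAYER_TYPE_MRVL layers contribute, since LLVM layers of a TVM model run on the host and consume no OCM.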

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index cd95a3c7ad..03f4783b3f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1023,8 +1023,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct 
rte_ml_model_params *params, u
 
char str[RTE_MEMZONE_NAMESIZE];
const struct plt_memzone *mz;
+   uint16_t max_scratch_pages;
+   struct cn10k_ml_ocm *ocm;
uint64_t model_info_size;
+   uint16_t total_wb_pages;
uint16_t lcl_model_id;
+   uint16_t layer_id;
uint64_t mz_size;
bool found;
int ret;
@@ -1086,6 +1090,62 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct 
rte_ml_model_params *params, u
if (ret != 0)
goto error;
 
+   max_scratch_pages = 0;
+   total_wb_pages = 0;
+   layer_id = 0;
+
+   ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+   if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+   total_wb_pages = total_wb_pages + 
model->layer[layer_id].glow.ocm_map.wb_pages;
+   max_scratch_pages = PLT_MAX(max_scratch_pages,
+   
model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+   } else {
+   for (layer_id = 0; layer_id < 
model->mvtvm.metadata.model.nb_layers; layer_id++) {
+   if (model->layer[layer_id].type == 
ML_CNXK_LAYER_TYPE_MRVL) {
+   total_wb_pages = total_wb_pages +
+
model->layer[layer_id].glow.ocm_map.wb_pages;
+   max_scratch_pages =
+   PLT_MAX(max_scratch_pages,
+   
model->layer[layer_id].glow.ocm_map.scratch_pages);
+   }
+   }
+#endif
+   }
+
+   if ((total_wb_pages + max_scratch_pages) > ocm->num_pages) {
+   plt_err("model_id = %u: total_wb_pages (%u) + scratch_pages 
(%u) >  %u\n",
+   lcl_model_id, total_wb_pages, max_scratch_pages, 
ocm->num_pages);
+
+   if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+   plt_ml_dbg("layer_id = %u: wb_pages = %u, scratch_pages 
= %u\n", layer_id,
+  model->layer[layer_id].glow.ocm_map.wb_pages,
+  
model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+   } else {
+   for (layer_id = 0; layer_id < 
model->mvtvm.metadata.model.nb_layers;
+layer_id++) {
+   if (model->layer[layer_id].type == 
ML_CNXK_LAYER_TYPE_MRVL) {
+   plt_ml_dbg(
+   "layer_id = %u: wb_pages = %u, 
scratch_pages = %u\n",
+   layer_id,
+   
model->layer[layer_id].glow.ocm_map.wb_pages,
+   
model->layer[layer_id].glow.ocm_map.scratch_pages);
+   }
+   }
+#endif
+   }
+
+   if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+   cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+   else
+   mvtvm_ml_model_unload(cnxk_mldev, model);
+#endif
+
+   return -ENOMEM;
+   }
plt_spinlock_init(&model->lock);
model->state = ML_CNXK_MODEL_STATE_LOADED;
cnxk_mldev->nb_models_loaded++;
-- 
2.42.0



[PATCH v6 26/34] ml/cnxk: support start and stop for TVM models

2023-10-18 Thread Srikanth Yalavarthi
Added support to start and stop TVM models. TVM model
start invokes layer start for all Glow layers that are
part of the model; TVM model stop invokes layer stop
for the same layers.

Signed-off-by: Srikanth Yalavarthi 
Signed-off-by: Anup Prabhu 
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++
 drivers/ml/cnxk/cnxk_ml_ops.c| 14 +++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)
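The goto-based iteration in mvtvm_ml_model_start()/stop() below is equivalent to a plain loop over the layers that skips non-Glow layers. A sketch with a toy callback in place of cn10k_ml_layer_start():

```c
#include <assert.h>
#include <stdint.h>

enum layer_type { LAYER_TYPE_MRVL, LAYER_TYPE_LLVM };

struct layer {
	enum layer_type type;
	int started; /* toy state bit for the sketch */
};

/* Start every MRVL (Glow) layer; bail out on the first failure, as the
 * patch does (no rollback of already-started layers). */
static int
model_start(struct layer *layers, uint16_t nb_layers,
	    int (*layer_start)(struct layer *))
{
	uint16_t i;
	int ret;

	for (i = 0; i < nb_layers; i++) {
		if (layers[i].type != LAYER_TYPE_MRVL)
			continue; /* LLVM layers run on the host */
		ret = layer_start(&layers[i]);
		if (ret != 0)
			return ret;
	}
	return 0;
}

static int
toy_layer_start(struct layer *l)
{
	l->started = 1;
	return 0;
}
```

Model stop follows the same shape with cn10k_ml_layer_stop() as the per-layer call.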

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const 
char *layer_name)
struct cn10k_ml_ocm *ocm;
struct cnxk_ml_req *req;
 
-   uint16_t layer_id = 0;
+   uint16_t layer_id;
bool job_enqueued;
bool job_dequeued;
uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const 
char *layer_name)
bool locked;
int ret = 0;
 
-   PLT_SET_USED(layer_name);
-
cnxk_mldev = (struct cnxk_ml_dev *)device;
if (cnxk_mldev == NULL) {
plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, 
const char *layer_name)
return -EINVAL;
}
 
+   ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+   if (ret != 0)
+   return ret;
+
layer = &model->layer[layer_id];
cn10k_mldev = &cnxk_mldev->cn10k_mldev;
ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, 
const char *layer_name)
struct cn10k_ml_ocm *ocm;
struct cnxk_ml_req *req;
 
-   uint16_t layer_id = 0;
+   uint16_t layer_id;
bool job_enqueued;
bool job_dequeued;
bool locked;
int ret = 0;
 
-   PLT_SET_USED(layer_name);
-
cnxk_mldev = (struct cnxk_ml_dev *)device;
if (cnxk_mldev == NULL) {
plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, 
const char *layer_name)
return -EINVAL;
}
 
+   ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+   if (ret != 0)
+   return ret;
+
layer = &model->layer[layer_id];
cn10k_mldev = &cnxk_mldev->cn10k_mldev;
ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03f4783b3f..66cda513db 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1216,7 +1216,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t 
model_id)
return -EINVAL;
}
 
-   return cn10k_ml_model_start(cnxk_mldev, model);
+   if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+   return cn10k_ml_model_start(cnxk_mldev, model);
+   else
+   return mvtvm_ml_model_start(cnxk_mldev, model);
 }
 
 int
@@ -1236,7 +1241,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t 
model_id)
return -EINVAL;
}
 
-   return cn10k_ml_model_stop(cnxk_mldev, model);
+   if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+   return cn10k_ml_model_stop(cnxk_mldev, model);
+   else
+   return mvtvm_ml_model_stop(cnxk_mldev, model);
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9fd9e58de6..1d0b3544a7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, 
struct cnxk_ml_model *mode
 
return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model)
+{
+   struct cnxk_ml_layer *layer;
+
+   uint16_t layer_id = 0;
+   int ret = 0;
+
+next_layer:
+   layer = &model->layer[layer_id];
+   if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+   ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, 
layer->name);
+   if (ret != 0) {
+   plt_err("Layer start failed, model_id = %u, layer_name 
= %s, error = %d",
+   model->model_id, layer->name, ret);
+   return ret;
+   }
+   }
+   layer_id++;
+
+   if (layer_id < model->nb_layers)
+   goto next_layer;
+
+   return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model)
+{
+   struct cnxk_ml_layer *layer;
+
+   uint16_t layer_id = 0;
+   

[PATCH v6 28/34] ml/cnxk: support device dump for TVM models

2023-10-18 Thread Srikanth Yalavarthi
Enabled support to print TVM model layer info.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct 
cnxk_ml_model *model,
cnxk_ml_print_line(fp, LINE_LEN);
fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+   fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+   fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", 
PLT_U64_CAST(model));
fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct 
cnxk_ml_model *model,
 
for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
layer = &model->layer[layer_id];
-   cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+   if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+   cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+   else
+   mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 569147aca7..4c12f584d5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", 
"mod.json",
@@ -311,3 +312,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, 
struct cnxk_ml_model *mo
cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer 
*layer, FILE *fp)
+{
+   char str[STR_LEN];
+   uint8_t i;
+
+   /* Print debug info */
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+   cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+   cnxk_mldev->index_map[layer->index].layer_id);
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+   fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+   fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", 
PLT_U64_CAST(layer));
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+   /* Print model state */
+   if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+   if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+   if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+   fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", 
layer->info.nb_inputs);
+   fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", 
layer->info.nb_outputs);
+   fprintf(fp, "\n");
+
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+   cnxk_ml_print_line(fp, LINE_LEN);
+   for (i = 0; i < layer->info.nb_inputs; i++) {
+   fprintf(fp, "%8u  ", i);
+   fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+   rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+   fprintf(fp, "%*s  ", 12, str);
+   }
+   fprintf(fp, "\n");
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, "\n");
+
+   cnxk_ml_print_line(fp, LINE_LEN);
+   fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", 
"output_type");
+   cnxk_ml_print_line(fp, LINE_LEN);
+   for (i = 0; i < layer->info.nb_outputs; i++) {
+   fprintf(fp, "%8u  ", i);
+   fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+   rte_ml_io_type_to_str(layer->info.output[i].qtype, str, 
STR_LEN);
+   fprintf(fp, "%*s  ", 12, str);
+   fprintf(fp, "\n");
+   }
+   fprintf(fp, "\n");
+   cnxk_ml_print_line(fp, LINE_LEN);

[PATCH v6 27/34] ml/cnxk: update internal TVM model info structure

2023-10-18 Thread Srikanth Yalavarthi
From: Prince Takkar 

Added support to update internal model info structure
for TVM models.

Signed-off-by: Prince Takkar 
Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 14f4b258d8..569147aca7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include 
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -246,3 +247,67 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, 
uint16_t layer_id)
 
return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model 
*model)
+{
+   struct tvmdp_model_metadata *metadata;
+   struct rte_ml_model_info *info;
+   struct rte_ml_io_info *output;
+   struct rte_ml_io_info *input;
+   uint8_t i;
+
+   info = PLT_PTR_CAST(model->info);
+   input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+   output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * 
sizeof(struct rte_ml_io_info));
+
+   /* Reset model info */
+   memset(info, 0, sizeof(struct rte_ml_model_info));
+
+   if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+   goto tvm_mrvl_model;
+
+   metadata = &model->mvtvm.metadata;
+   rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+   snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+metadata->model.version[1], metadata->model.version[2],
+metadata->model.version[3]);
+   info->model_id = model->model_id;
+   info->device_id = cnxk_mldev->mldev->data->dev_id;
+   info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+   info->min_batches = model->batch_size;
+   info->max_batches = model->batch_size;
+   info->nb_inputs = metadata->model.num_input;
+   info->input_info = input;
+   info->nb_outputs = metadata->model.num_output;
+   info->output_info = output;
+   info->wb_size = 0;
+
+   /* Set input info */
+   for (i = 0; i < info->nb_inputs; i++) {
+   rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+   input[i].nb_dims = metadata->input[i].ndim;
+   input[i].shape = &model->mvtvm.info.input[i].shape[0];
+   input[i].type = model->mvtvm.info.input[i].qtype;
+   input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+   input[i].size = model->mvtvm.info.input[i].nb_elements *
+   rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+   }
+
+   /* Set output info */
+   for (i = 0; i < info->nb_outputs; i++) {
+   rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+   output[i].nb_dims = metadata->output[i].ndim;
+   output[i].shape = &model->mvtvm.info.output[i].shape[0];
+   output[i].type = model->mvtvm.info.output[i].qtype;
+   output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+   output[i].size = model->mvtvm.info.output[i].nb_elements *
+   rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+   }
+
+   return;
+
+tvm_mrvl_model:
+   cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+   &model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1d0b3544a7..f13ba76207 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
/* Update model I/O data */
mvtvm_ml_model_io_info_set(model);
 
+   /* Set model info */
+   mvtvm_ml_model_info_set(cnxk_mldev, model);
+
return 0;
 
 error:
-- 
2.42.0



[PATCH v6 29/34] ml/cnxk: enable reporting model runtime as xstats

2023-10-18 Thread Srikanth Yalavarthi
Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c| 131 +++
 drivers/ml/cnxk/cnxk_ml_ops.h|   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)
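The model runtime xstats in this patch are derived from per-queue-pair cycle counters
(see the ML_AVG_FOREACH_QP-style aggregation in the diff). As a standalone sketch of
that aggregation — the struct and function names here are illustrative, not the
driver's API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue-pair accumulator: total cycles spent and ops completed. */
struct qp_stat {
	uint64_t total_cycles;
	uint64_t count;
};

/* Average latency in cycles across all queue pairs: sum the per-qp
 * totals, then divide by the total completion count, guarding the
 * zero-completions case. */
static uint64_t
avg_latency_cycles(const struct qp_stat *qp, int nb_qps)
{
	uint64_t value = 0, count = 0;
	int i;

	for (i = 0; i < nb_qps; i++) {
		value += qp[i].total_cycles;
		count += qp[i].count;
	}
	return count ? value / count : 0;
}
```

Converting the cycle value to wall-clock time would additionally require the TSC
frequency, which the driver obtains from the platform layer.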

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+ uint16_t stat_id, uint16_t entry, char *suffix)
+{
+   snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                            \
	do {                                                                                       \
		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, 
const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char 
*layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+  uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66cda513db..fd2c46ac1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
nb_stats = RTE_DIM(device_xstats) +
-  RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+  RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+  RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
if (cnxk_mldev->xstats.entries == NULL)
cnxk_mldev->xstats.entries = rte_zmalloc(
"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+   for (i = 0; i < RTE_DIM(model_xstats); i++) {
+   cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+   cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+   cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+   cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+   cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+   cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+   cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+   cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+   model_xstats[i].reset_allowed;
+
+   /* Name of xstat is updated during model load */
+   snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+"Model-%u-%s", model, model_xstats[i].name);
+
+   stat_id++;
+   }
+
for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
cnxk_mldev->xstats.count_per_layer[model][layer] = 
RTE_DIM(

[PATCH v6 31/34] ml/cnxk: add generic ML malloc and free callback

2023-10-18 Thread Srikanth Yalavarthi
Implemented generic ML malloc and free callbacks

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+   const struct plt_memzone *mz;
+
+   mz = plt_memzone_reserve_aligned(name, size, 0, align);
+   if (mz == NULL) {
+   plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+   return -ENOMEM;
+   }
+
+   *addr = mz->addr;
+
+   return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+   const struct plt_memzone *mz;
+
+   mz = plt_memzone_lookup(name);
+   if (mz == NULL) {
+   plt_err("ml_free failed: Memzone not found: name = %s", name);
+   return -EINVAL;
+   }
+
+   return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
  uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void 
**addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
    uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 77c2b5bcdc..b627355917 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct 
rte_ml_model_params *
callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
callback->tvmrt_io_free = cn10k_ml_io_free;
+   callback->tvmrt_malloc = cn10k_ml_malloc;
+   callback->tvmrt_free = cn10k_ml_free;
} else {
callback = NULL;
}
-- 
2.42.0



[PATCH v6 32/34] ml/cnxk: support quantize and dequantize callback

2023-10-18 Thread Srikanth Yalavarthi
From: Prince Takkar 

Added support for quantize and dequantize callback
functions for TVM models.

Signed-off-by: Prince Takkar 
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)
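The quantize/dequantize callbacks convert each I/O between the dequantized (float)
representation held in the DLTensor and the quantized buffer consumed by the
hardware, using per-I/O parameters such as the scale stored in the driver's I/O
info. A minimal sketch of scale-based symmetric conversion — the clamp bounds and
rounding choice here are illustrative assumptions, not the driver's exact math:

```c
#include <assert.h>
#include <stdint.h>

/* Quantize one float to int8 with a per-I/O scale: divide, round to
 * nearest, and clamp to the int8 range (illustrative assumptions). */
static int8_t
quantize_f32_i8(float value, float scale)
{
	float q = value / scale;

	if (q > 127.0f)
		q = 127.0f;
	else if (q < -128.0f)
		q = -128.0f;
	return (int8_t)(q >= 0 ? q + 0.5f : q - 0.5f);
}

/* Dequantize is the inverse: multiply back by the scale. */
static float
dequantize_i8_f32(int8_t value, float scale)
{
	return (float)value * scale;
}
```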

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index b627355917..776675843a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include 
+
 #include 
 #include 
 #include 
 #include 
 
+#include 
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
callback->tvmrt_io_free = cn10k_ml_io_free;
callback->tvmrt_malloc = cn10k_ml_malloc;
callback->tvmrt_free = cn10k_ml_free;
+   callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+   callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
} else {
callback = NULL;
}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+const DLTensor **deq_tensor, void *qbuffer)
+{
+   struct cnxk_ml_io_info *info = NULL;
+   struct cnxk_ml_dev *cnxk_mldev;
+   struct cnxk_ml_model *model;
+   uint16_t layer_id = 0;
+   uint8_t *lcl_dbuffer;
+   uint8_t *lcl_qbuffer;
+   uint32_t i;
+   int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+   if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+   return -EINVAL;
+#endif
+
+   cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+   model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+   if (model == NULL) {
+   plt_err("Invalid model_id = %u", model_id);
+   return -EINVAL;
+   }
+#endif
+
+   /* Get layer id */
+   for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+   if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+   break;
+   }
+
+#ifdef CNXK_ML_DEV_DEBUG
+   if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+   plt_err("Invalid layer name: %s", layer_name);
+   return -EINVAL;
+   }
+
+   if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+   plt_err("Invalid layer name / type: %s", layer_name);
+   return -EINVAL;
+   }
+#endif
+
+   info = &model->layer[layer_id].info;
+   lcl_qbuffer = (uint8_t *)qbuffer;
+
+   for (i = 0; i < info->nb_inputs; i++) {
+   lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+   ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+   if (ret < 0)
+   return ret;
+
+   lcl_qbuffer += info->input[i].sz_q;
+   }
+
+   return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+  const DLTensor **deq_tensor)
+{
+   struct cnxk_ml_io_info *info = NULL;
+   struct cnxk_ml_dev *cnxk_mldev;
+   struct cnxk_ml_model *model;
+   uint16_t layer_id = 0;
+   uint8_t *lcl_dbuffer;
+   uint8_t *lcl_qbuffer;
+   uint32_t i;
+   int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+   if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+   return -EINVAL;
+#endif
+
+   cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+   model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+   if (model == NULL) {
+   plt_err("Invalid model_id = %u", model_id);
+   return -EINVAL;
+   }
+#endif
+
+   for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+   if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+   break;
+   }
+
+#ifdef CNXK_ML_DEV_DEBUG
+   if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+   plt_err("Invalid layer name: %s", layer_name);
+   return -EINVAL;
+   }
+
+   if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+   plt_err("Invalid layer name / type: %s", layer_name);
+   return -EINVAL;
+   }
+#endif
+
+   info = &model->layer[layer_id].info;
+   lcl_qbuffer = (uint8_t *)qbuffer;
+
+   for (i = 0; i < info->nb_outputs; i++) {
+   lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+   ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+   if (ret < 0)
+   return ret;

[PATCH v6 30/34] ml/cnxk: implement I/O alloc and free callbacks

2023-10-18 Thread Srikanth Yalavarthi
Implemented callback functions for IO allocation and free
for Glow layers.

Signed-off-by: Srikanth Yalavarthi 
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)
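The io_alloc callback in this patch reserves one memzone sized for the aligned
quantized input and output buffers, placing the output buffer immediately after
the aligned input region. The offset arithmetic can be sketched as follows — the
128-byte alignment is an assumed stand-in for ML_CN10K_ALIGN_SIZE, and the
function names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for the driver's ML_CN10K_ALIGN_SIZE constant. */
#define ALIGN_SIZE 128

/* PLT_ALIGN_CEIL equivalent: round sz up to the next multiple of align
 * (align must be a power of two). */
static size_t
align_ceil(size_t sz, size_t align)
{
	return (sz + align - 1) & ~(align - 1);
}

/* One region of input_size + output_size is reserved, with the output
 * buffer starting at offset input_size. */
static size_t
output_offset(size_t input_sz_q, size_t output_sz_q, size_t *total)
{
	size_t input_size = align_ceil(input_sz_q, ALIGN_SIZE);
	size_t output_size = align_ceil(output_sz_q, ALIGN_SIZE);

	*total = input_size + output_size;
	return input_size;
}
```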

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+ uint64_t **output_qbuffer)
+{
+   struct cnxk_ml_dev *cnxk_mldev;
+   struct cnxk_ml_model *model;
+   struct cnxk_ml_layer *layer;
+
+   char str[RTE_MEMZONE_NAMESIZE];
+   const struct plt_memzone *mz;
+   uint64_t output_size;
+   uint64_t input_size;
+   uint16_t layer_id;
+   int ret;
+
+   cnxk_mldev = (struct cnxk_ml_dev *)device;
+   if (cnxk_mldev == NULL) {
+   plt_err("Invalid device = %p", device);
+   return -EINVAL;
+   }
+
+   model = cnxk_mldev->mldev->data->models[model_id];
+   if (model == NULL) {
+   plt_err("Invalid model_id = %u", model_id);
+   return -EINVAL;
+   }
+
+   ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+   if (ret != 0)
+   return ret;
+
+   layer = &model->layer[layer_id];
+   input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+   output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+   sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+   mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+   if (mz == NULL) {
+   plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+   model_id, layer_name);
+   return -ENOMEM;
+   }
+
+   *input_qbuffer = mz->addr;
+   *output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+   return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+   struct cnxk_ml_dev *cnxk_mldev;
+   struct cnxk_ml_model *model;
+
+   char str[RTE_MEMZONE_NAMESIZE];
+   const struct plt_memzone *mz;
+   uint16_t layer_id;
+   int ret;
+
+   cnxk_mldev = (struct cnxk_ml_dev *)device;
+   if (cnxk_mldev == NULL) {
+   plt_err("Invalid device = %p", device);
+   return -EINVAL;
+   }
+
+   model = cnxk_mldev->mldev->data->models[model_id];
+   if (model == NULL) {
+   plt_err("Invalid model_id = %u", model_id);
+   return -EINVAL;
+   }
+
+   ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+   if (ret != 0)
+   return ret;
+
+   sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+   mz = plt_memzone_lookup(str);
+   if (mz == NULL) {
+   plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+   model_id, layer_name);
+   return -EINVAL;
+   }
+
+   return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+ uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 832837034b..77c2b5bcdc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
callback = &model->mvtvm.cb;
callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+   callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+   callback->tvmrt_io_free = cn10k_ml_io_free;
} else {
callback = NULL;

[PATCH v6 33/34] ml/cnxk: enable fast-path ops for TVM models

2023-10-18 Thread Srikanth Yalavarthi
From: Anup Prabhu 

Enable fast-path ops support for TVM models. Models use
TVMDP library function calls to execute inference
operations for hybrid and LLVM model sub-types.

For TVM MRVL model sub-types that have a single MRVL layer,
the inference requests are enqueued directly to the hardware
by the driver.

Signed-off-by: Anup Prabhu 
Signed-off-by: Srikanth Yalavarthi 
---
 doc/guides/rel_notes/release_23_11.rst |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h   |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c  |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h  |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c   |  20 
 drivers/ml/cnxk/mvtvm_ml_model.h   |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c | 124 +
 drivers/ml/cnxk/mvtvm_ml_ops.h |  43 +
 9 files changed, 211 insertions(+), 4 deletions(-)
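The dispatch rule described in the commit message — direct hardware enqueue only
for a TVM MRVL model consisting of a single MRVL layer, TVMDP runtime calls for
everything else — can be sketched as follows (the enum and function names are
illustrative, not the driver's types):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model sub-types mirroring the distinction the driver makes. */
enum model_subtype {
	SUBTYPE_TVM_MRVL,	/* all layers target the ML accelerator */
	SUBTYPE_TVM_HYBRID,	/* mix of accelerator and CPU regions */
	SUBTYPE_TVM_LLVM	/* CPU-only execution */
};

/* A request can bypass the TVMDP runtime and go straight to hardware
 * only when the model is an MRVL sub-type with exactly one layer. */
static bool
use_hw_fast_path(enum model_subtype subtype, int nb_layers)
{
	return subtype == SUBTYPE_TVM_MRVL && nb_layers == 1;
}
```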

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..5fcf2a1897 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -243,6 +243,9 @@ New Features
   Added dispatcher library which purpose is to help decouple different
   parts (modules) of an eventdev-based application.
 
+* **Updated Marvell cnxk mldev driver.**
+
+  * Added support for models compiled using TVM framework.
 
 Removed Items
 -
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
else
cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-   cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-   cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-   cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
/* Scale */
float scale;
+
+   /* Dequantized offset */
+   uint32_t offset_d;
+
+   /* Quantized offset */
+   uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index fd2c46ac1f..608e9fc4ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
	cnxk_mldev->max_nb_layers =
		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+   cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+   cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+   cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
/* Allocate and initialize index_map */
if (cnxk_mldev->index_map == NULL) {
cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
union {
/* CN10K */
struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+   /* MVTVM */
+   struct mvtvm_ml_req mvtvm_req;
+#endif
};
 
/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c12f584d5..1dfd0d176a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+   model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+   model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+   model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+   model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+   model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+   model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+   model->mvtvm.input_tensor[i].strides = NULL;
+   model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
    model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
}
@@ -231,6 

[PATCH v6 34/34] ml/cnxk: enable creation of mvtvm virtual device

2023-10-18 Thread Srikanth Yalavarthi
Enable support to create an mvtvm virtual device on systems
without a PCI-based ML HW accelerator.

Signed-off-by: Srikanth Yalavarthi 
---
 doc/guides/mldevs/cnxk.rst   |  49 +++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c|   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h|  21 
 drivers/ml/cnxk/cnxk_ml_ops.c|  82 +
 drivers/ml/cnxk/meson.build  |   2 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index ef2b5d4581..1d7f63993b 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -148,6 +148,22 @@ Bind the ML PF device to the vfio_pci driver:
usertools/dpdk-devbind.py -u :00:10.0
usertools/dpdk-devbind.py -b vfio-pci :00:10.0
 
+VDEV support
+
+
+On platforms which don't support ML hardware acceleration through a PCI device, the
+Marvell ML CNXK PMD can execute inference operations on a vdev, with ML models
+compiled using the Apache TVM framework.
+
+VDEV can be enabled by passing the following EAL argument:
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, use of the
+VDEV and the PCI HW accelerator is mutually exclusive.
+
 
 Runtime Config Options
 --
@@ -158,6 +174,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
  -a :00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -173,6 +191,8 @@ Runtime Config Options
  When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
  The parameter ``enable_dpe_warnings`` is used for this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
  -a :00:10.0,enable_dpe_warnings=0
@@ -189,11 +209,19 @@ Runtime Config Options
  Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
  -a :00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on the HW accelerator.
+
+  For example::
+
+ --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -209,6 +237,8 @@ Runtime Config Options
   ``largest``
 Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
  -a :00:10.0,ocm_alloc_mode=lowest
@@ -226,6 +256,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
  -a :00:10.0,ocm_page_size=8192
@@ -250,6 +282,8 @@ Runtime Config Options
    Enabling spinlock version would disable restrictions on the number of queue-pairs
    that can be supported by the driver.
 
+   This option is supported only on PCI HW accelerator.
+
   For example::
 
  -a :00:10.0,hw_queue_lock=1
@@ -258,6 +292,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``1``)
+
+  VDEV supports additional EAL arguments to configure the maximum number of
+  queue-pairs on the ML device through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+ --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..caa13ba08c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
PLT_SET_USED(pci_drv);
 
+   if (cnxk_ml_dev_initialized == 1) {
+   plt_err("ML CNXK device already initialized!");
+   plt_err("Cannot initialize CN10K PCI dev");
+   rte_exit(-EINVAL, "Invalid EAL arguments ");
+   }
+
init_params = (struct rte_ml_dev_pmd_init_params){
.socket_id = rte_socket_id(),

Re: [PATCH v6 00/34] Implementation of revised ml/cnxk driver

2023-10-18 Thread Jerin Jacob
On Wed, Oct 18, 2023 at 7:24 PM Srikanth Yalavarthi
 wrote:
>
> This patch series is an implementation of revised ml/cnxk driver
> to support models compiled with TVM compiler framework. TVM models
> use a hybrid mode for execution, with regions of the model executing
> on the ML accelerator and the rest executing on CPU cores.
>
> This series of commits reorganizes the ml/cnxk driver and adds support
> to execute multiple regions with-in a TVM model.
>

Fix this warning

### [PATCH] ml/cnxk: enable creation of mvtvm virtual device

Warning in drivers/ml/cnxk/cn10k_ml_dev.c:
Using rte_panic/rte_exit

Fix as needed, where relevant
### [PATCH] ml/cnxk: add generic cnxk device structure

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#1778: FILE: drivers/ml/cnxk/cn10k_ml_ops.c:1316:
+   strncpy(xstats_map[idx].name,
cn10k_mldev->xstats.entries[i].map.name,

total: 0 errors, 1 warnings, 2276 lines checked

### [PATCH] ml/cnxk: add generic model and layer structures

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#117: FILE: drivers/ml/cnxk/cn10k_ml_model.c:379:
+   strncpy(layer->info.input[i].name, (char
*)metadata->input1[i].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#166: FILE: drivers/ml/cnxk/cn10k_ml_model.c:411:
+   strncpy(layer->info.input[i].name, (char
*)metadata->input2[j].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#221: FILE: drivers/ml/cnxk/cn10k_ml_model.c:449:
+   strncpy(layer->info.output[i].name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#255: FILE: drivers/ml/cnxk/cn10k_ml_model.c:472:
+   strncpy(layer->info.output[i].name,

total: 0 errors, 4 warnings, 1905 lines checked

### [PATCH] ml/cnxk: update model load and unload functions

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#83: FILE: drivers/ml/cnxk/cn10k_ml_model.c:367:
+   strncpy(io_info->input[i].name, (char
*)metadata->input1[i].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#135: FILE: drivers/ml/cnxk/cn10k_ml_model.c:399:
+   strncpy(io_info->input[i].name, (char
*)metadata->input2[j].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#204: FILE: drivers/ml/cnxk/cn10k_ml_model.c:437:
+   strncpy(io_info->output[i].name, (char
*)metadata->output1[i].output_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#244: FILE: drivers/ml/cnxk/cn10k_ml_model.c:461:
+   strncpy(io_info->output[i].name, (char
*)metadata->output2[j].output_name,

total: 0 errors, 4 warnings, 1094 lines checked

### [PATCH] ml/cnxk: update device and model xstats functions

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#1100: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:856:
WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#1100: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:856:
+   strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);

total: 0 errors, 1 warnings, 1248 lines checked

### [PATCH] ml/cnxk: fetch layer info and load TVM model

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#172: FILE: drivers/ml/cnxk/mvtvm_ml_ops.c:125:
+   strncpy(model->layer[layer_id].name,

total: 0 errors, 1 warnings, 207 lines checked

### [PATCH] ml/cnxk: update internal info for TVM model

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#85: FILE: drivers/ml/cnxk/mvtvm_ml_model.c:175:
+   strncpy(model->mvtvm.info.input[i].name,
metadata->input[i].name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#118: FILE: drivers/ml/cnxk/mvtvm_ml_model.c:208:
+   strncpy(model->mvtvm.info.output[i].name,
metadata->output[i].name,

total: 0 errors, 2 warnings, 173 lines checked

### [PATCH] ml/cnxk: enable reporting model runtime as xstats

WARNING:STRCPY: Prefer strscpy over strcpy - see:
https://github.com/KSPP/linux/issues/88
#113: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:243:
+   strcp

Re: [EXT] [PATCH] ml/cnxk: don't export internal headers

2023-10-18 Thread Jerin Jacob
On Wed, Oct 18, 2023 at 3:03 PM Srikanth Yalavarthi
 wrote:
>
> > -Original Message-
> > From: David Marchand 
> > Sent: 18 October 2023 14:46
> > To: dev@dpdk.org
> > Cc: Jerin Jacob Kollanukkaran ; tho...@monjalon.net;
> > Srikanth Yalavarthi ; Prince Takkar
> > 
> > Subject: [EXT] [PATCH] ml/cnxk: don't export internal headers
> >
> > External Email
> >
> > --
> > driver_sdk_headers is used to expose headers that may be used by external
> > drivers.
> > Don't export ml/cnxk internal headers.
> >
> > Fixes: fe83ffd9ec2e ("ml/cnxk: add skeleton")
> >
> > Signed-off-by: David Marchand 
>
> Acked-by: Srikanth Yalavarthi 

Applied to dpdk-next-net-mrvl/for-next-net. Thanks


[PATCH v2] add CREDITS file

2023-10-18 Thread Stephen Hemminger
Add a credits file of past contributors to DPDK.
There are obviously more names that should be added, but
let's start this with Venky.

Signed-off-by: Stephen Hemminger 
Acked-by: Jerin Jacob 
---
v2 - reword opening, fix spelling

 CREDITS | 9 +
 1 file changed, 9 insertions(+)
 create mode 100644 CREDITS

diff --git a/CREDITS b/CREDITS
new file mode 100644
index ..7c068d91bf57
--- /dev/null
+++ b/CREDITS
@@ -0,0 +1,9 @@
+This file lists some of the significant past contributors to the DPDK.
+It is sorted by name and formatted to allow easy searching use by scripts.
+The fields are: name (N), email (E), web-address (W), and description (D).
+
+--
+
+N: Venky Venkatesan
+D: Original founder of DPDK project
+W: https://www.dpdk.org/about/venky/
-- 
2.39.2



Re: [PATCH v2 02/29] cmdline: make experimental API's stable

2023-10-18 Thread Bruce Richardson
On Tue, Aug 08, 2023 at 05:09:50PM -0700, Stephen Hemminger wrote:
> These API's have all been around for several releases.
> 
> Signed-off-by: Stephen Hemminger 
> ---

Acked-by: Bruce Richardson 



Re: [PATCH v2 10/29] mbuf: remove experimental from create_extbuf

2023-10-18 Thread Bruce Richardson
On Tue, Aug 08, 2023 at 05:09:58PM -0700, Stephen Hemminger wrote:
> This API was added in 2020 and should no longer be experimental.
> 
> Signed-off-by: Stephen Hemminger 
> ---
Acked-by: Bruce Richardson 


Re: [PATCH v2 14/29] dmadev: mark API's as not experimental

2023-10-18 Thread Bruce Richardson
On Tue, Aug 08, 2023 at 05:10:02PM -0700, Stephen Hemminger wrote:
> These were added in 20.11; it is now time to remove the experimental flag.
> 
> Signed-off-by: Stephen Hemminger 
> ---
Acked-by: Bruce Richardson 


Re: [PATCH v2 15/29] meter: remove experimental warning from comments

2023-10-18 Thread Bruce Richardson
On Tue, Aug 08, 2023 at 05:10:03PM -0700, Stephen Hemminger wrote:
> The API's for rte_meter_trtcm were never properly flagged
> as experimental; missing __rte_experimental but there was
> an experimental comment in the docbook comment.
> Remove the comment.
> 
> Signed-off-by: Stephen Hemminger 
> ---
Acked-by: Bruce Richardson 


Re: [PATCH v2 17/29] kvargs: remove experimental flag

2023-10-18 Thread Bruce Richardson
On Tue, Aug 08, 2023 at 05:10:05PM -0700, Stephen Hemminger wrote:
> The function rte_kvargs_get_with_value was added in 21.11
> so experimental flag can be removed.
> 
> Signed-off-by: Stephen Hemminger 
> ---
Acked-by: Bruce Richardson 


Re: [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library

2023-10-18 Thread Jerin Jacob
On Wed, Oct 18, 2023 at 7:52 PM Srikanth Yalavarthi
 wrote:
>
> Added support to configure and close TVMDP library based
> on ML device configuration options.
>
> Updated meson build to enable Jansson, TVM runtime, TVMDP
> library as build dependencies.
>
> Signed-off-by: Srikanth Yalavarthi 
> ---

>
> +Compilation Prerequisites
> +-------------------------
> +
> +This driver optionally requires external libraries to enable support for
> +models compiled with the Apache TVM framework. The following dependencies
> +are not part of DPDK and must be installed separately:
> +
> +- **Jansson**
> +
> +  This library enables support to parse and read JSON files.
> +
> +- **DLPack**
> +
> +  This library provides headers for open in-memory tensor structures.
> +
> +.. note::
> +
> +DPDK CNXK ML driver requires DLPack version 0.7
> +
> +.. code-block:: console


Please add sections for cross and native.

> +git clone https://github.com/dmlc/dlpack.git
> +cd dlpack
> +git checkout v0.7 -b v0.7
> +cmake -S ./ -B build \
> +  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
> +  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
> +  -DBUILD_MOCK=OFF
> +make -C build
> +make -C build install
> +
> +- **TVM**
> +
> +  Apache TVM provides a runtime library (libtvm_runtime) used to execute
> +  models on CPU cores or hardware accelerators.
> +
> +.. note::
> +
> +DPDK CNXK ML driver requires TVM version 0.10.0
> +
> +.. code-block:: console
> +
> +git clone https://github.com/apache/tvm.git

I need to use --recursive to avoid
CMake Error at /usr/share/cmake/Modules/ExternalProject.cmake:3176 (message):
  No download info given for 'project_libbacktrace' and its source directory:


> +cd tvm
> +git checkout v0.10.0 -b v0.10.0
> +cmake -S ./ -B build \
> +  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
> +  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
> +  -DMACHINE_NAME=aarch64-linux-gnu \
> +  -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
> +  -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY
> +make -C build
> +make -C build install
> +
> +- **TVMDP**
> +
> +  Marvell's `TVM Dataplane Library 
> `_
> +  works as an interface between TVM runtime and DPDK drivers. TVMDP library
> +  provides a simplified C interface for TVM's runtime based on C++.
> +
> +.. code-block:: console
> +
> +git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
> +cd tvmdp
> +git checkout main
> +cmake -S ./ -B build \
> +  -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
> +  -DBUILD_SHARED_LIBS=ON \
> +  -DBUILD_TESTING=OFF

[main]dell[tvmdp] $ cmake -S ./ -B build
-DCMAKE_INSTALL_PREFIX=/export/cross_prefix/prefix
-DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake
-DBUILD_SHARED_LIBS=ON  -DBUILD_TESTING=OFF
-- The CXX compiler identification is GNU 13.2.0
-- The C compiler identification is GNU 13.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
CMake Error at CMakeLists.txt:53 (find_package):
  By not providing "Finddmlc.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "dmlc", but
  CMake did not find one.

  Could not find a package configuration file provided by "dmlc" with any of
  the following names:

dmlcConfig.cmake
dmlc-config.cmake

  Add the installation prefix of "dmlc" to CMAKE_PREFIX_PATH or set
  "dmlc_DIR" to a directory containing one of the above files.  If "dmlc"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
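(For anyone hitting the same failure: as the CMake message itself
suggests, the usual fix is to install dmlc-core and put its install
prefix on CMAKE_PREFIX_PATH before re-running the tvmdp configure. The
commands and paths below are illustrative, not taken from this thread.)

git clone https://github.com/dmlc/dmlc-core.git
cd dmlc-core
cmake -S ./ -B build -DCMAKE_INSTALL_PREFIX=/export/cross_prefix/prefix
make -C build
make -C build install
# then reconfigure tvmdp with:
#   -DCMAKE_PREFIX_PATH=/export/cross_prefix/prefix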


> +enable_mvtvm = true
> +
> +if not jansson_dep.found()
> +message('drivers/ml/cnxk: jansson not found')
> +enable_mvtvm = false
> +endif
> +
> +if not cc.check_header('dlpack/dlpack.h')
> +message('drivers/ml/cnxk: dlpack.h not found')
> +enable_mvtvm = false
> +endif
> +
> +tvmrt_lib = cc.find_library('tvm_runtime', required: false)
> +if tvmrt_lib.found()
> +tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
> +else
> +message('drivers/ml/cnxk: tvm_runtime not found')
> +enable_mvtvm = false
> +endif
> +
> +tvmdp_dep = dependency('tvmdp', required: false)
> +if not tvmdp_dep.found()
> +message('drivers/ml/cnxk: tvmdp not found')
> +enable_mvtvm = false
> +endif
> +
>  sources = files(
>  'cn10k_ml_dev.c',
>  'cn10k_ml_ops.c',
> @@ -21,6 +47,39 @@ sources = files(
>
>  deps += ['mldev', 'common_cnxk', 'kvarg

Re: [PATCH v2 00/29] promote many API's to stable

2023-10-18 Thread David Marchand
Hello Stephen,

On Wed, Aug 9, 2023 at 2:10 AM Stephen Hemminger
 wrote:
>
> Since 23.11 is an LTS release it is time to remove the experimental
> bandaid off many API's. There are about 850 API's marked with experimental
> on current main branch. This addresses the easy to remove ones and
> gets it down to about 690 places.
>
> The rule is any API that has been in since 22.11 needs to have
> experimental removed (or deleted). The experimental flag is
> intended to be temporary not a "get out of ABI stability for free" card.
>
> v2 - add more libraries to the mix
>- remove EXPERIMENTAL where tagged in MAINTAINERS

There were some API updates merged in -rc1.
Could you please rebase this series?

Thanks.

-- 
David Marchand



Re: [PATCH 00/15] eal: mark older API's stable

2023-10-18 Thread David Marchand
On Wed, Aug 9, 2023 at 6:44 PM Stephen Hemminger
 wrote:
>
> About 80 function in EAL were marked experimental
> and should have been made stable by now.
>
> Stephen Hemminger (15):
>   eal: make bitops a stable API
>   eal: mark rte_dev API's as stable
>   eal: make rte_class API's stable
>   eal: make rte_version_XXX API's stable
>   eal: make rte_drand a stable API
>   eal: make rte_service_lcore_may_be_active stable
>   eal: make rte_devargs_reset stable
>   eal: make pflock API stable
>   eal: make seqcount and seqlock stable
>   eal: mark rte_intr_XXX API's as stable
>   eal: mark rte_atomic128_cmp_exchange as stable
>   eal: make most rte_thread API's stable
>   eal: mark rte_power API's stable
>   eal: mark rte_eal_vfio_get_token stable
>   eal: mark rte_vect simd bandwidth API as stable

This needs some rebasing too.


-- 
David Marchand


