On Tue, Jul 06, 2021 at 11:01:17AM +0800, fengchengwen wrote:
> Many thanks, mostly OK, and a few comments inline
> 
> On 2021/7/4 17:30, Jerin Jacob wrote:
> > On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengcheng...@huawei.com> 
> > wrote:
> >>
> >> This patch introduces 'dmadevice' which is a generic type of DMA
> >> device.
> ...
> >> +#include <rte_compat.h>
> > 
> > Sort in alphabetical order.
> > 
> >> +
> >> +/**
> >> + * dma_cookie_t - an opaque DMA cookie
> > 
> > Since we are defining the behaviour, it is not opaque any more.
> > I think it is better to call it ring_idx or so.
> > 
> 
> 
> This type is designed to have two meanings, return <0 on failure and return 
> >=0 on success.
> 
> How about the following definition:
>     typedef int dma_ring_index_t;
> 
> If >= 0, its value range is [0, 65535], i.e. uint16_t, so the driver
> implementation is simple.
> If < 0, it means enqueue failure.
> 
> For the driver, it could hold a uint16_t ring_index; if enqueue fails,
> just return failure, else return the current ring_index and update it by
> ring_index++;
> 

Well, yes and no on the "two meanings". For the enqueue function, yes, the
return value can have two meanings, but I don't consider them one type. On
the completion call, however, the value can only be positive and <
UINT16_MAX, so having two meanings is actually confusing. Better to have:

* enqueue return a regular int, with doxygen comment
        "@return
          Negative on error, otherwise job index between 0 and UINT16_MAX"
* for completions, take a uint16_t* parameter for the last completed index,
  since no negative values are needed.
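
As a rough sketch of those two prototypes (the function names and parameter
lists here are only illustrative, not a final proposal):

#include <stdint.h>
#include <stdbool.h>
#include <rte_common.h>   /* for rte_iova_t */

/* Enqueue a copy: negative errno value on failure, otherwise the job index,
 * which wraps within the range 0..UINT16_MAX. */
int rte_dmadev_copy(uint16_t dev_id, uint16_t vq_id,
                rte_iova_t src, rte_iova_t dst,
                uint32_t length, uint64_t flags);

/* Gather completions: returns the number of completed jobs and writes the
 * index of the last completed job to *last_idx; no negative job-index
 * values are needed here. */
uint16_t rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id,
                const uint16_t nb_cpls, uint16_t *last_idx, bool *has_error);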

Beyond this, we generally don't use typedefs in DPDK for basic types (with
a few exceptions, e.g. rte_iova_t), reserving their use for function
pointers.

> >> +
> >> +/**
> >> + * A structure used to retrieve the contextual information of
> >> + * a DMA device
> >> + */
> >> +struct rte_dmadev_info {
> >> +       /**
> >> +        * Fields filled by framewok
> > 
> > typo.
> > 
> >> +        */
> >> +       struct rte_device *device; /**< Generic Device information */
> >> +       const char *driver_name; /**< Device driver name */
> >> +       int socket_id; /**< Socket ID where memory is allocated */
> >> +
> >> +       /**
> >> +        * Specification fields filled by driver
> >> +        */
> >> +       uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> >> +       uint16_t max_hw_queues; /**< Maximum number of HW queues. */
> >> +       uint16_t max_vqs_per_hw_queue;
> >> +       /**< Maximum number of virt queues to allocate per HW queue */
> >> +       uint16_t max_desc;
> >> +       /**< Maximum allowed number of virt queue descriptors */
> >> +       uint16_t min_desc;
> >> +       /**< Minimum allowed number of virt queue descriptors */
> > 
> > Please add max_nb_segs, i.e. the maximum number of segments supported.
> 
> Do you mean something like "burst_size"?
> 
> > 
> >> +
> >> +       /**
> >> +        * Status fields filled by driver
> >> +        */
> >> +       uint16_t nb_hw_queues; /**< Number of HW queues configured */
> >> +       uint16_t nb_vqs; /**< Number of virt queues configured */
> >> +};
> >> + i
> >> +
> >> +/**
> >> + * dma_address_type
> >> + */
> >> +enum dma_address_type {
> >> +       DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */
> >> +       DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */
> >> +};
> >> +
> >> +/**
> >> + * A structure used to configure a DMA device.
> >> + */
> >> +struct rte_dmadev_conf {
> >> +       enum dma_address_type addr_type; /**< Address type to used */
> > 
> > I think, there are 3 kinds of limitations/capabilities.
> > 
> > When the system is configured as IOVA as VA
> > 1) Device supports any VA address like memory from rte_malloc(),
> > rte_memzone(), malloc, stack memory
> > 2) Device supports only VA addresses from rte_malloc(), rte_memzone(), i.e.
> > memory backed by hugepages and added to the DMA map.
> > 
> > When the system is configured as IOVA as PA
> > 1) Devices support only PA addresses.
> > 
> > IMO, the above needs to be advertised as a capability and the application
> > needs to align with that, and I don't think the application requests the
> > driver to work in any of the modes.
> > 
> 
> OK, let's put together our ideas on address type:
> 
> There are three modes, which we may define as:
>       IOVA_as_VA-ALL     ---for a device which needs to support the SVA feature
>                            ---may also be a CPU memcpy 'device'
>       IOVA_as_VA         ---for a device which needs IOMMU support
>       IOVA_as_PA
> 
> There are many combinations of modes which a device may support: e.g. some
> devices may only support IOVA_as_PA, some may only support IOVA_as_VA, and
> some support both IOVA_as_PA and IOVA_as_VA. The specific runtime type is
> determined by the vfio and driver capability (e.g. RTE_PCI_DRV_NEED_IOVA_AS_VA).
> 
> So we already define two capabilities for this:
>       #define RTE_DMA_DEV_CAPA_IOVA   (1ull << 8) /**< Support IOVA as DMA address */
>                                       ---this covers IOVA_as_VA and IOVA_as_PA
>       #define RTE_DMA_DEV_CAPA_VA     (1ull << 9) /**< Support VA as DMA address */
>                                       ---this covers IOVA_as_VA-ALL
> For a device which doesn't support SVA:
>       only declare RTE_DMA_DEV_CAPA_IOVA
> For a device which supports SVA:
>       declare RTE_DMA_DEV_CAPA_IOVA
>       declare RTE_DMA_DEV_CAPA_VA (only when IOMMU is enabled and the 'SVA flag' is set)
> For a CPU memcpy device:
>       only declare RTE_DMA_DEV_CAPA_VA
> 
> As an application:
> - if RTE_DMA_DEV_CAPA_VA is supported, it can pass any VA address to the DMA,
> - else if RTE_DMA_DEV_CAPA_IOVA is supported, it should pass an IOVA address
> to the DMA,
> - else the DMA device should not exist.
> 

I still don't think we need all of this. DPDK already has support through
the existing bus infrastructure for determining whether DPDK needs to use
physical or virtual addresses, so we should not be duplicating that, as
devices *cannot* use a different addressing mode from DPDK itself.
Given that, the only flag we need is one to indicate SVA support.
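
To illustrate, a minimal sketch, assuming a single capability bit for SVA;
the flag name, the info-query helper and the function here are my own
assumptions, not the patch's API:

#include <stdint.h>

/* hypothetical single capability bit advertising SVA support */
#define RTE_DMA_DEV_CAPA_SVA    (1ull << 9)

static int
dma_dev_supports_any_va(uint16_t dev_id)
{
        struct rte_dmadev_info info;

        rte_dmadev_info_get(dev_id, &info);   /* assumed info-query helper */

        if (info.dev_capa & RTE_DMA_DEV_CAPA_SVA) {
                /* any process VA (malloc(), stack, ...) may be passed */
                return 1;
        }
        /* otherwise only addresses valid in DPDK's own IOVA mode
         * (rte_malloc/rte_memzone memory registered for DMA) may be used */
        return 0;
}

Whether those addresses are PA or VA underneath is already decided by the
EAL IOVA mode, so the application never requests an addressing mode from
the driver.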

> > 
<snip>
> > 
> > I also think we may need to set status as a bitmask and enumerate all
> > the combinations of error codes of all the drivers, and return a string
> > from the driver as with the existing rte_flow_error.
> > 
> 
> A bitmask is limited to at most 32 bits (or we can extend it to 64), and
> rte_flow_error is also heavy.
> 
> Considering that errors occur in only a small number of scenarios, it's OK
> to pass a status array, and since each status is 32 bits it can denote a
> very large number of error codes.
> 

+1 to this.
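
For reference, a sketch of what the completion-with-status call could look
like under that scheme (the name and signature are illustrative only):

/* Return the number of failed operations gathered, writing one 32-bit
 * status code per failed job into the caller-supplied array and the index
 * of the last scanned job to *last_idx. */
uint16_t rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
                const uint16_t nb_status, uint32_t *status,
                uint16_t *last_idx);

A plain uint32_t per operation keeps the fast path light while still leaving
room for far more error codes than a 32- or 64-bit bitmask could hold.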

> >> +
> >> +struct rte_dmadev_stats {
> >> +       uint64_t enqueue_fail_count;
> >> +       /**< Count of all operations which failed to enqueue */
> >> +       uint64_t enqueued_count;
> >> +       /**< Count of all operations successfully enqueued */
> >> +       uint64_t completed_fail_count;
> >> +       /**< Count of all operations which failed to complete */
> >> +       uint64_t completed_count;
> >> +       /**< Count of all operations successfully completed */
> >> +};
> > 
> > We need to have capability API to tell which items are
> > updated/supported by the driver.
> > 
> 
> There are only a few fields, and I don't think it's necessary to add a
> capability API; drivers which don't support stats can simply not implement
> the callback. For those that do, these fields are the minimum set.
> 
> > 
> >> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> >> new file mode 100644
> >> index 0000000..a3afea2
> >> --- /dev/null
> >> +++ b/lib/dmadev/rte_dmadev_core.h
> >> @@ -0,0 +1,98 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright 2021 HiSilicon Limited.
> >> + */
> >> +
> >> +#ifndef _RTE_DMADEV_CORE_H_
> >> +#define _RTE_DMADEV_CORE_H_
> >> +
> >> +/**
> >> + * @file
> >> + *
> >> + * RTE DMA Device internal header.
> >> + *
> >> + * This header contains internal data types. But they are still part of 
> >> the
> >> + * public API because they are used by inline public functions.
> >> + */
> >> +
> >> +struct rte_dmadev;
> >> +
> >> +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t 
> >> vq_id,
> >> +                                     void *src, void *dst,
> >> +                                     uint32_t length, uint64_t flags);
> >> +/**< @internal Function used to enqueue a copy operation. */
> > 
> > To avoid namespace conflict (as it is a public API) use rte_
> 
> These are internal functions used by the driver, not the application,
> and those in eth/regexdev_core are also defined without rte_.
> 
> So I think it should remain as it is.
> 

Even if only used by a driver, APIs are exported from the .so built for
the library, which means that they become public for apps using the lib.
Even for header-only symbols for drivers, it's good practice to put the
prefix since they are for use outside the compilation unit.
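
For example, keeping the signature from the patch and only applying the
prefix, it would look something like (sketch only):

typedef dma_cookie_t (*rte_dmadev_copy_t)(struct rte_dmadev *dev,
                uint16_t vq_id, void *src, void *dst,
                uint32_t length, uint64_t flags);
/**< @internal Function used to enqueue a copy operation. */

The cost is only a slightly longer name, and it removes any risk of clashing
with symbols in applications linking against the library.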

> > 
> > 
> >> +
> >> +/**
> >> + * The data structure associated with each DMA device.
> >> + */
> >> +struct rte_dmadev {
> >> +       /**< Enqueue a copy operation onto the DMA device. */
> >> +       dmadev_copy_t copy;
> >> +       /**< Enqueue a scatter list copy operation onto the DMA device. */
> >> +       dmadev_copy_sg_t copy_sg;
> >> +       /**< Enqueue a fill operation onto the DMA device. */
> >> +       dmadev_fill_t fill;
> >> +       /**< Enqueue a scatter list fill operation onto the DMA device. */
> >> +       dmadev_fill_sg_t fill_sg;
> >> +       /**< Add a fence to force ordering between operations. */
> >> +       dmadev_fence_t fence;
> >> +       /**< Trigger hardware to begin performing enqueued operations. */
> >> +       dmadev_perform_t perform;
> >> +       /**< Returns the number of operations that successful completed. */
> >> +       dmadev_completed_t completed;
> >> +       /**< Returns the number of operations that failed to complete. */
> >> +       dmadev_completed_fails_t completed_fails;
> > 
> > We need to limit fastpath items in 1 CL
> 
> Yes, currently there are 8 callbacks, which exactly fill one cache line.
> 

Before we get overly concerned about this, I think we should benchmark it
to see how much our "one cacheline" is giving us compared to having them in
ops. For example, the "perform" doorbell function, or the completed
function, is only called once every burst, so it would be interesting to see
how much difference it really makes for those.
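
For anyone wanting to measure this, the two layouts being compared are
roughly the following (sketch only, trimmed to two of the eight callbacks;
the _embedded/_indirect names are just for side-by-side comparison):

/* (a) fast-path pointers embedded in the device struct, as in the patch */
struct rte_dmadev_embedded {
        dmadev_copy_t copy;
        dmadev_completed_t completed;
        /* ... remaining fast-path pointers, one cache line in total ... */
};

/* (b) a single pointer to a separate ops struct: one extra indirection per
 * call, but no cache-line constraint on the device struct itself */
struct rte_dmadev_ops {
        dmadev_copy_t copy;
        dmadev_completed_t completed;
        /* ... */
};
struct rte_dmadev_indirect {
        const struct rte_dmadev_ops *dev_ops;
};

If the once-per-burst calls show no measurable difference between the two,
the cache-line constraint only needs to apply to the truly hot entries.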

/Bruce
