On Mon, Jul 12, 2021 at 10:30 PM Bruce Richardson <bruce.richard...@intel.com> wrote: > > On Mon, Jul 12, 2021 at 10:04:07PM +0530, Jerin Jacob wrote: > > On Mon, Jul 12, 2021 at 7:02 PM Bruce Richardson > > <bruce.richard...@intel.com> wrote: > > > > > > On Mon, Jul 12, 2021 at 03:29:27PM +0530, Jerin Jacob wrote: > > > > On Sun, Jul 11, 2021 at 2:59 PM Chengwen Feng <fengcheng...@huawei.com> > > > > wrote: > > > > > > > > > > This patch introduce 'dmadevice' which is a generic type of DMA > > > > > device. > > > > > > > > > > The APIs of dmadev library exposes some generic operations which can > > > > > enable configuration and I/O with the DMA devices. > > > > > > > > > > Signed-off-by: Chengwen Feng <fengcheng...@huawei.com> > > > > > --- > > > > > +/** > > > > > + * @warning > > > > > + * @b EXPERIMENTAL: this API may change without prior notice. > > > > > + * > > > > > + * Enqueue a fill operation onto the virtual DMA channel. > > > > > + * > > > > > + * This queues up a fill operation to be performed by hardware, but > > > > > does not > > > > > + * trigger hardware to begin that operation. > > > > > + * > > > > > + * @param dev_id > > > > > + * The identifier of the device. > > > > > + * @param vchan > > > > > + * The identifier of virtual DMA channel. > > > > > + * @param pattern > > > > > + * The pattern to populate the destination buffer with. > > > > > + * @param dst > > > > > + * The address of the destination buffer. > > > > > + * @param length > > > > > + * The length of the destination buffer. > > > > > + * @param flags > > > > > + * An flags for this operation. > > > > > + * > > > > > + * @return > > > > > + * - 0..UINT16_MAX: index of enqueued copy job. > > > > > > > > fill job > > > > > > > > > + * - <0: Error code returned by the driver copy function. > > > > > + */ > > > > > +__rte_experimental > > > > > +static inline int > > > > > +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern, > > > > > + rte_iova_t dst, uint32_t length, uint64_t flags) > > > > > +{ > > > > > + struct rte_dmadev *dev = &rte_dmadevices[dev_id]; > > > > > +#ifdef RTE_DMADEV_DEBUG > > > > > + RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL); > > > > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP); > > > > > + if (vchan >= dev->data->dev_conf.max_vchans) { > > > > > + RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan); > > > > > + return -EINVAL; > > > > > + } > > > > > +#endif > > > > > + return (*dev->fill)(dev, vchan, pattern, dst, length, flags); > > > > > +} > > > > > + > > > > > > > > > +/** > > > > > + * @warning > > > > > + * @b EXPERIMENTAL: this API may change without prior notice. > > > > > + * > > > > > + * Returns the number of operations that have been successfully > > > > > completed. > > > > > + * > > > > > + * @param dev_id > > > > > + * The identifier of the device. > > > > > + * @param vchan > > > > > + * The identifier of virtual DMA channel. > > > > > + * @param nb_cpls > > > > > + * The maximum number of completed operations that can be > > > > > processed. > > > > > + * @param[out] last_idx > > > > > + * The last completed operation's index. > > > > > + * If not required, NULL can be passed in. > > > > > > > > This means the driver will be tracking the last index. > > > > > > > > > > Driver will be doing this anyway, no, since it needs to ensure we don't > > > > Yes. > > > > > wrap around? > > > > > > > > > > > Is that mean, the application needs to call this API periodically to > > > > consume the completion slot. > > > > I.e up to 64K (UINT16_MAX) outstanding jobs are possible. If the > > > > application fails to call this > > > > >64K outstand job then the subsequence enqueue will fail. > > > > > > Well, given that there will be a regular enqueue ring which will probably > > > be <= 64k in size, the completion call will need to be called frequently > > > anyway. I don't think we need to document this restriction as it's fairly > > > understood that you can't go beyond the size of the ring without cleanup. > > > > > > See below. > > > > > > > > > > > > > If so, we need to document this. > > > > > > > > One of the concerns of keeping UINT16_MAX as the limit is the > > > > completion memory will always not in cache. > > > > On the other hand, if we make this size programmable. it may introduce > > > > complexity in the application. > > > > > > > > Thoughts? > > > > > > The reason for using powers-of-2 sizes, e.g. 0 .. UINT16_MAX, is that the > > > ring can be any other power-of-2 size and we can index it just by masking. > > > In the sample app for dmadev, I expect the ring size used to be set the > > > same as the dmadev enqueue ring size, for simplicity. > > > > No question on not using power of 2. Aligned on that. > > > > At least in our HW, the size of the ring is rte_dmadev_vchan_conf::nb_desc. > > But completion happens in _different_ memory space. Currently, we are > > allocating > > UINT16_MAX entries to hold that. That's where cache miss aspects of > > completion aspects > > came. > > Depending on HW, our completions can be written back to a separate memory > area - a completion ring, if you will - but I've generally found it works > as well to reuse the enqueue ring for that purpose. However, with a > separate memory area for completions, why do you need to allocate 64K > entries for the completions? Would nb_desc entries not be enough? Is that > to allow the user to have more than nb_desc jobs outstanding before calling > "get_completions" API?
Yes. That's what I thought. Thats where my question on what is the max number of outstanding completions. I thought it can be up to 64K. Agree to keep it implementation-specific and not need to highlight this in the documentation. > > > > > In your case, Is completion happens in the same ring memory(looks like > > one bit in job desc represents the job completed or not) ? > > And when application calls rte_dmadev_completed(), You are converting > > UINT16_MAX based index to > > rte_dmadev_vchan_conf::nb_desc. Right? > > Yes, we are masking to do that. Actually, for simplicity and perf we should > only allow power-of-2 ring sizes. Having to use modulus instead of masking > could be a problem. [Alternatively, I suppose we can allow drivers to round > up the ring sizes to the next power of 2, but I prefer just documenting it > as a limitation]. OK. > > /Bruce