On Wed, Jun 23, 2021 at 9:00 AM fengchengwen <fengcheng...@huawei.com> wrote:
> >>>
> >>>>
> >>>>> The above will give better performance and is the best trade-off
> >>>>> between performance and per transfer variables.
> >>>>
> >>>> We may need to have different APIs for context-aware and context-unaware
> >>>> processing, with which to use determined by the capabilities discovery.
> >>>> Given that for these DMA devices the offload cost is critical, more so
> >>>> than any other dev class I've looked at before, I'd like to avoid having
> >>>> APIs with extra parameters than need to be passed about since that just
> >>>> adds extra CPU cycles to the offload.
> >>>
> >>> If driver does not support additional attributes and/or the
> >>> application does not need it, rte_dmadev_desc_t can be NULL.
> >>> So that it won't have any cost in the datapath. I think, we can go to
> >>> different API cases if we can not abstract problems without performance
> >>> impact. Otherwise, it will be too much pain for applications.
> >>
> >> Yes, currently we plan to use different API for different case, e.g.
> >>   rte_dmadev_memcpy()   -- deal with local to local memcopy
> >>   rte_dmadev_memset()   -- deal with fill with local memory with pattern
> >> maybe:
> >>   rte_dmadev_imm_data() -- deal with copy very little data
> >>   rte_dmadev_p2pcopy()  -- deal with peer-to-peer copy of different PCIE
> >>                            addr
> >>
> >> These API capabilities will be reflected in the device capability set so
> >> that application could know by standard API.
> >
> >
> > There will be a lot of combination of that it will be like M x N cross
> > base case, It won't scale.
>
> Currently, it is hard to define generic dma descriptor, I think the
> well-defined APIs is feasible.

I would like to understand why that is not feasible if we move the
preparation to the slow path.

i.e. struct rte_dmadev_desc defines all the "attributes" of all DMA
devices, made available through capabilities.
I believe with this scheme we can scale and incorporate all features of
all DMA HW without any performance impact.

Something like:

struct rte_dmadev_desc {
        /* Attributes of a DMA transfer, available for all HW under capability. */
        channel or port;
        ops; // copy, fill, etc.
        /* Implementation-opaque memory as a zero-length array;
         * rte_dmadev_desc_prep() updates this memory with HW-specific
         * information.
         */
        uint8_t impl_opq[];
};

// Allocate the memory for the DMA descriptor
struct rte_dmadev_desc *rte_dmadev_desc_alloc(devid);
// Convert DPDK-specific descriptors to HW-specific descriptors in the slow path
rte_dmadev_desc_prep(devid, struct rte_dmadev_desc *desc);
// Free the DMA descriptor memory
rte_dmadev_desc_free(devid, struct rte_dmadev_desc *desc);

The above calls are in the slow path.

Only the call below is in the fast path.

// Here desc can be NULL (in case you don't need any specific attribute
// attached to the transfer); if needed, it can be an object which has
// gone through rte_dmadev_desc_prep()
rte_dmadev_enq(devid, struct rte_dmadev_desc *desc, void *src, void *dest,
               unsigned int len, cookie);

>
> >
> >>
> >>>
> >>> Just to understand, I think, we need to HW capabilities and how to
> >>> have a common API.
> >>> I assume HW will have some HW JOB descriptors which will be filled in
> >>> SW and submitted to HW.
> >>> In our HW, Job descriptor has the following main elements
> >>>
> >>> - Channel // We don't expect the application to change per transfer
> >>> - Source address - It can be scatter-gather too - Will be changed per
> >>> transfer
> >>> - Destination address - It can be scatter-gather too - Will be changed
> >>> per transfer
> >>> - Transfer Length - It can be scatter-gather too - Will be changed
> >>> per transfer
> >>> - IOVA address where HW post Job completion status PER Job descriptor
> >>> - Will be changed per transfer
> >>> - Another sideband information related to channel // We don't expect
> >>> the application to change per transfer
> >>> - As an option, Job completion can be posted as an event to
> >>> rte_event_queue too // We don't expect the application to change per
> >>> transfer
> >>
> >> The 'option' field looks like a software interface field, but not HW
> >> descriptor.
> >
> > It is in HW descriptor.
>
> The HW is interesting, something like: DMA could send completion direct to
> EventHWQueue,
> the DMA and EventHWQueue are linked in the hardware range, rather than by
> software.

Yes.

>
> Could you provide public driver of this HW ? So we could know more about its
> working
> mechanism and software-hardware collaboration.

http://code.dpdk.org/dpdk/v21.05/source/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.h#L149
is the DMA instruction header.
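
For illustration, the slow-path/fast-path split proposed earlier in this
mail could look roughly like the sketch below. It is only a sketch under
stated assumptions: none of these functions exist in DPDK today, the
function bodies are a pure software stand-in for a driver, and the enum
dma_op values, struct mock_hw_hdr, the int/uint64_t parameter types, and the
"first byte of src is the fill pattern" convention are all hypothetical
choices made to get something compilable.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical op codes; the proposal only says "copy, fill etc." */
enum dma_op { DMA_OP_COPY = 0, DMA_OP_FILL = 1 };

/* Generic descriptor: common attributes plus a driver-owned opaque tail. */
struct rte_dmadev_desc {
	uint16_t channel;
	enum dma_op op;
	uint8_t impl_opq[];	/* filled in by rte_dmadev_desc_prep() */
};

/* Stand-in for the HW-specific header a real driver would build. */
struct mock_hw_hdr {
	uint32_t word0;		/* channel in bits 8..23, op in bits 0..7 */
};

/* Slow path: allocate the descriptor with room for the opaque HW part. */
struct rte_dmadev_desc *rte_dmadev_desc_alloc(int devid)
{
	(void)devid;
	return calloc(1, sizeof(struct rte_dmadev_desc) +
			 sizeof(struct mock_hw_hdr));
}

/* Slow path: translate generic attributes into the HW-specific form once. */
int rte_dmadev_desc_prep(int devid, struct rte_dmadev_desc *desc)
{
	struct mock_hw_hdr hdr;

	(void)devid;
	if (desc == NULL)
		return -1;
	hdr.word0 = ((uint32_t)desc->channel << 8) | (uint32_t)desc->op;
	memcpy(desc->impl_opq, &hdr, sizeof(hdr));
	return 0;
}

/* Slow path: release the descriptor memory. */
void rte_dmadev_desc_free(int devid, struct rte_dmadev_desc *desc)
{
	(void)devid;
	free(desc);
}

/*
 * Fast path: no attribute translation here. A NULL desc means "plain copy
 * with default attributes"; otherwise the pre-built header is reused as-is.
 * The memcpy()/memset() calls emulate what the DMA engine would do.
 */
int rte_dmadev_enq(int devid, const struct rte_dmadev_desc *desc,
		   void *src, void *dest, unsigned int len, uint64_t cookie)
{
	struct mock_hw_hdr hdr = { 0 };

	(void)devid;
	(void)cookie;	/* a real driver would return this on completion */
	if (desc != NULL)
		memcpy(&hdr, desc->impl_opq, sizeof(hdr));
	if ((hdr.word0 & 0xff) == DMA_OP_FILL)
		memset(dest, *(const uint8_t *)src, len);
	else
		memcpy(dest, src, len);
	return 0;
}
```

An application that needs no extra attributes would call
rte_dmadev_enq(devid, NULL, src, dest, len, cookie) directly; one that needs
a fill or a specific channel would run alloc and prep once at setup time and
then pass the same prepared descriptor on every enqueue.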