On Thu, Jun 17, 2021 at 01:12:22PM +0530, Jerin Jacob wrote:
> On Thu, Jun 17, 2021 at 12:43 AM Bruce Richardson
> <bruce.richard...@intel.com> wrote:
> >
> > On Wed, Jun 16, 2021 at 11:38:08PM +0530, Jerin Jacob wrote:
> > > On Wed, Jun 16, 2021 at 11:01 PM Bruce Richardson
> > > <bruce.richard...@intel.com> wrote:
> > > >
> > > > On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
> > > > > On 2021/6/16 0:38, Bruce Richardson wrote:
> > > > > > On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > > > >> This patch introduces 'dmadevice' which is a generic type of DMA
> > > > > >> device.
> > > > > >>
> > > > > >> The APIs of dmadev library exposes some generic operations which
> > > > > >> can enable configuration and I/O with the DMA devices.
> > > > > >>
> > > > > >> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com>
> > > > > >> ---
> > > > > > Thanks for sending this.
> > > > > >
> > > > > > Of most interest to me right now are the key data-plane APIs. While we are
> > > > > > still in the prototyping phase, below is a draft of what we are thinking
> > > > > > for the key enqueue/perform_ops/completed_ops APIs.
> > > > > >
> > > > > > Some key differences I note in below vs your original RFC:
> > > > > > * Use of void pointers rather than iova addresses. While using iova's makes
> > > > > >   sense in the general case when using hardware, in that it can work with
> > > > > >   both physical addresses and virtual addresses, if we change the APIs to use
> > > > > >   void pointers instead it will still work for DPDK in VA mode, while at the
> > > > > >   same time allow use of software fallbacks in error cases, and also a stub
> > > > > >   driver than uses memcpy in the background. Finally, using iova's makes the
> > > > > >   APIs a lot more awkward to use with anything but mbufs or similar buffers
> > > > > >   where we already have a pre-computed physical address.
> > > > >
> > > > > The iova is an hint to application, and widely used in DPDK.
> > > > > If switch to void, how to pass the address (iova or just va ?)
> > > > > this may introduce implementation dependencies here.
> > > > >
> > > > > Or always pass the va, and the driver performs address translation, and this
> > > > > translation may cost too much cpu I think.
> > > > >
> > > >
> > > > On the latter point, about driver doing address translation I would agree.
> > > > However, we probably need more discussion about the use of iova vs just
> > > > virtual addresses. My thinking on this is that if we specify the API using
> > > > iovas it will severely hurt usability of the API, since it forces the user
> > > > to take more inefficient codepaths in a large number of cases. Given a
> > > > pointer to the middle of an mbuf, one cannot just pass that straight as an
> > > > iova but must instead do a translation into offset from mbuf pointer and
> > > > then readd the offset to the mbuf base address.
> > > >
> > > > My preference therefore is to require the use of an IOMMU when using a
> > > > dmadev, so that it can be a much closer analog of memcpy. Once an iommu is
> > > > present, DPDK will run in VA mode, allowing virtual addresses to our
> > > > hugepage memory to be sent directly to hardware. Also, when using
> > > > dmadevs on top of an in-kernel driver, that kernel driver may do all iommu
> > > > management for the app, removing further the restrictions on what memory
> > > > can be addressed by hardware.
> > > >
> > >
> > > One issue of keeping void * is that memory can come from stack or heap .
> > > which HW can not really operate it on.
> >
> > when kernel driver is managing the IOMMU all process memory can be worked
> > on, not just hugepage memory, so using iova is wrong in these cases.
>
> But not for stack and heap memory. Right?
>
Yes, even stack and heap can be accessed.

> >
> > As I previously said, using iova prevents the creation of a pure software
> > dummy driver too using memcpy in the background.
>
> Why ? the memory alloced uing rte_alloc/rte_memzone etc can be touched by CPU.
>
Yes, but it can't be accessed using a physical address, so again only VA mode,
where iovas are "void *", makes sense.

> Thinking more, Since anyway, we need a separate function for knowing
> the completion status, I think, it can be an opaque object as the
> completion code. Exposing directly the status may not help. As the
> driver needs a "context" or "call" to change the driver-specific
> completion code to DPDK completion code.
>
I'm sorry, I didn't follow this. By completion code, you mean the status of
whether a copy job succeeded/failed?
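
For reference, the memcpy-backed stub driver idea discussed above could look
roughly like the sketch below. This is only an illustration: stub_dev,
stub_enqueue_copy, stub_perform_ops and stub_completed_ops are hypothetical
names made up here, loosely following the enqueue/perform_ops/completed_ops
split mentioned earlier in the thread, and are not part of any dmadev API.
The point it tries to show is that the copy is done by the CPU, so the
addresses have to be plain virtual addresses; a physical address would be of
no use to memcpy.

```c
#include <stddef.h>
#include <string.h>

#define STUB_RING_SIZE 32 /* hypothetical queue depth (power of two) */

/* One pending copy: plain virtual addresses, no iova involved. */
struct stub_job {
	void *dst;
	const void *src;
	size_t len;
};

/* Hypothetical software-only "device": a small ring of pending jobs. */
struct stub_dev {
	struct stub_job ring[STUB_RING_SIZE];
	unsigned int head; /* next slot to fill */
	unsigned int tail; /* next job to perform */
	unsigned int done; /* jobs performed but not yet reported */
};

/* Queue a copy. Returns 0 on success, -1 if the ring is full. */
static int
stub_enqueue_copy(struct stub_dev *dev, void *dst, const void *src, size_t len)
{
	if (dev->head - dev->tail >= STUB_RING_SIZE)
		return -1;
	dev->ring[dev->head % STUB_RING_SIZE] =
		(struct stub_job){ .dst = dst, .src = src, .len = len };
	dev->head++;
	return 0;
}

/* "Doorbell": perform every queued copy on the CPU using memcpy. */
static void
stub_perform_ops(struct stub_dev *dev)
{
	while (dev->tail != dev->head) {
		const struct stub_job *job = &dev->ring[dev->tail % STUB_RING_SIZE];

		memcpy(job->dst, job->src, job->len);
		dev->tail++;
		dev->done++;
	}
}

/* Report how many jobs finished since the last call, then reset the count. */
static unsigned int
stub_completed_ops(struct stub_dev *dev)
{
	unsigned int n = dev->done;

	dev->done = 0;
	return n;
}
```

With a zero-initialised struct stub_dev, a caller would enqueue a copy between
any two buffers the process can address (stack, heap or hugepage memory), call
stub_perform_ops(), and then read the number of finished jobs back from
stub_completed_ops().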