On Fri, 2025-09-26 at 13:00 -0300, Jason Gunthorpe wrote:
> On Fri, Sep 26, 2025 at 04:51:29PM +0200, Christian König wrote:
> > On 26.09.25 16:41, Jason Gunthorpe wrote:
> > > On Fri, Sep 26, 2025 at 03:51:21PM +0200, Thomas Hellström wrote:
> > > 
> > > > Well, both the exporter and the importer have specific
> > > > information WRT this. The ultimate decision is made in the
> > > > exporter's attach() callback, just like pcie_p2p, and the
> > > > exporter acknowledges it by setting the
> > > > dma_buf_attachment::interconnect_attach field, in analogy with
> > > > the dma_buf_attachment::peer2peer member.
> > > 
> > > Having a single option seems too limited to me..
> > 
> > Yeah, agree.
> > 
> > > I think it would be nice if the importer could supply a list of
> > > 'interconnects' it can accept, eg:
> > > 
> > >  - VRAM offset within this specific VRAM memory
> > >  - dma_addr_t for this struct device
> > >  - "IOVA" for this initiator on a private interconnect
> > >  - PCI bar slice
> > >  - phys_addr_t (used between vfio, kvm, iommufd)
> > 
> > I would rather say that the exporter should provide the list of
> > what interconnects the buffer might be accessible through.
> 
> Either direction works. I sketched it like this because I thought
> there were more importers than exporters, and in the flow it is easy
> for the importer to provide a list on the stack.
> 
> I didn't sketch it further, but I think the exporter and importer
> should both provide a compatible list, and then in almost all cases
> the core code should do the matching.
> 
> If the importer works as I showed, then the exporter version would be
> in an op:
> 
> int exporter_negotiate_op(struct dma_buf *dmabuf,
>                           struct dma_buf_interconnect_negotiation *importer_support,
>                           size_t importer_len)
> {
>         struct dma_buf_interconnect_negotiation exporter_support[2] = {
>                 [0] = { .interconnect = myself->xe_vram },
>                 [1] = { .interconnect = &dmabuf_generic_dma_addr_t,
>                         .interconnect_args = exporter_dev },
>         };
> 
>         return dma_buf_helper_negotiate(dmabuf, exporter_support,
>                                         ARRAY_SIZE(exporter_support),
>                                         importer_support, importer_len);
> }
> 
> This is the op that dma_buf_negotiate() calls.
> 
> The core code does the matching generically; probably there is a
> match() op on struct dma_buf_interconnect that it uses to help with
> this process.
> 
> I don't think importer or exporter should be open coding any
> matching.
> 
> For example, we have some systems with multipath PCI. This could
> actually support those properly. The RDMA NIC has two struct devices
> it operates, one for each path, so it would write out two
> &dmabuf_generic_dma_addr_t entries - one per path.
> 
> The GPU would do the same. The core code can have generic code to
> evaluate if P2P is possible and estimate some QOR between the
> options.

This sounds OK to me. I have some additional questions, though:

1) Does everybody agree that the interconnect used is a property of
the attachment, and that it should be negotiated during attach()?
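
To make that concrete, here is roughly what I picture on the importer
side, reusing the invented names from your sketch. None of
dma_buf_interconnect_negotiation, dma_buf_negotiate() or
dmabuf_generic_dma_addr_t exist anywhere yet, and imp->private_ic and
importer_attach_ops are made up as well:

/*
 * Sketch only: all dma_buf_interconnect_* names are the invented
 * ones from the exporter sketch above.
 */
static int importer_attach(struct importer *imp, struct dma_buf *dmabuf)
{
        /* The importer lists what it can accept, on the stack. */
        struct dma_buf_interconnect_negotiation importer_support[2] = {
                /* Private interconnect, e.g. an IOVA for this initiator. */
                [0] = { .interconnect = imp->private_ic },
                /* Generic dma_addr_t for this struct device, like pcie_p2p. */
                [1] = { .interconnect = &dmabuf_generic_dma_addr_t,
                        .interconnect_args = imp->dev },
        };
        struct dma_buf_attachment *attach;

        attach = dma_buf_dynamic_attach(dmabuf, imp->dev,
                                        &importer_attach_ops, imp);
        if (IS_ERR(attach))
                return PTR_ERR(attach);

        /*
         * The core code matches this list against the exporter's
         * (via the exporter's negotiate op) and records the winner
         * in the attachment, in analogy with
         * dma_buf_attachment::peer2peer.
         */
        return dma_buf_negotiate(attach, importer_support,
                                 ARRAY_SIZE(importer_support));
}

That would answer 1) in the affirmative, I guess, with the negotiate
step folded into (or immediately following) attach().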

2) dma-buf pcie-p2p allows transparent fallback to a system-memory
dma-buf. I think that is a good thing to keep even for other
interconnects (if possible). For example, if someone pulls the network
cable, we could trigger a move_notify() and on the next map() we'd
fall back. Any ideas around this?
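
For illustration, here is a sketch of how I imagine that fallback on
the importer side, again with the same invented negotiation names;
importer_unmap_and_fence() is a placeholder for whatever teardown the
driver needs:

/* Sketch only: the negotiation calls below are invented. */
static void importer_move_notify(struct dma_buf_attachment *attach)
{
        struct importer *imp = attach->importer_priv;

        /* The exporter invalidated the mapping, e.g. the link died. */
        importer_unmap_and_fence(imp);
        imp->needs_renegotiate = true;
}

static int importer_remap(struct importer *imp,
                          struct dma_buf_attachment *attach)
{
        struct sg_table *sgt;

        if (imp->needs_renegotiate) {
                /*
                 * Re-run the matching; if the fancy interconnect is
                 * gone, the core code would now pick the generic
                 * dma_addr_t entry, i.e. we transparently fall back
                 * to a system-memory mapping.
                 */
                int ret = dma_buf_negotiate(attach, imp->importer_support,
                                            imp->num_importer_support);
                if (ret)
                        return ret;
                imp->needs_renegotiate = false;
        }

        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        return PTR_ERR_OR_ZERO(sgt);
}

The nice property would be that the mapping code doesn't need to care
which entry won; the attachment just degrades to the generic
dma_addr_t path.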

Thanks,
Thomas

> 
> Jason
