Re: [PATCH 4/7] alpha: provide ioread64 and iowrite64 implementations

2017-06-25 Thread Stephen Bates
> +#define iowrite64be(v,p) iowrite32(cpu_to_be64(v), (p))
 
Logan, thanks for taking this cleanup on. I think this should be
iowrite64, not iowrite32?
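
For reference, the corrected definition would presumably look like this
(an untested sketch based only on the comment above, mirroring the
existing iowrite32be()/iowrite32() pairing):

    #define iowrite64be(v, p) iowrite64(cpu_to_be64(v), (p))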

Stephen




Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Stephen Bates
>>
>> The NVMe fabrics stuff could probably make use of this. It's an
>> in-kernel system to allow remote access to an NVMe device over RDMA. So
>> they ought to be able to optimize their transfers by DMAing directly to
>> the NVMe's CMB -- no userspace interface would be required, but there
>> would need to be some kernel infrastructure.
>
> Yes, that's what I was thinking. The NVMe/f driver needs to map the CMB
> for RDMA. I guess if it used ZONE_DEVICE like in the iopmem patches it
> would be relatively easy to do.
>

Haggai, yes that was one of the use cases we considered when we put
together the patchset.



Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Stephen Bates
Hi All

This has been a great thread (thanks to Alex for kicking it off) and I
wanted to jump in and try to put some summary around the discussion. I
also wanted to propose we include this as a topic for LSF/MM, because I
think we need more discussion on the best way to add this functionality
to the kernel.

As far as I can tell the people looking for P2P support in the kernel fall
into two main camps:

1. Those who simply want to expose static BARs on PCIe devices that can be
used as the source/destination for DMAs from another PCIe device. This
group has no need for memory invalidation and is happy to use
physical/bus addresses rather than virtual addresses.

2. Those who want to support devices that suffer from occasional memory
pressure and need to invalidate memory regions from time to time. This
camp would also like to use virtual addresses rather than physical ones to
allow for things like migration.

I am wondering whether people agree with this assessment.

I think something like the iopmem patches Logan and I submitted recently
comes close to addressing use case 1. There are some issues around
routability, but based on feedback to date that does not seem to be a
show-stopper for an initial inclusion.
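
To make use case 1 a little more concrete, the rough shape of the
approach (a hedged sketch only, not the actual iopmem code; the names
are made up, error handling and reference-count setup are omitted, and
the devm_memremap_pages() signature shown is the one from kernels of
this era) is for a driver to hand one of its BARs to the ZONE_DEVICE
machinery so the region gets struct pages and can act as a DMA
source/destination:

    #include <linux/pci.h>
    #include <linux/memremap.h>
    #include <linux/percpu-refcount.h>

    static void *expose_bar_for_p2p(struct pci_dev *pdev, int bar,
                                    struct percpu_ref *ref)
    {
            /* The BAR's physical range, as enumerated by the PCI core. */
            struct resource *res = &pdev->resource[bar];

            /* Create struct pages covering the BAR so other drivers can
             * treat it as a DMA source/destination much like ordinary
             * memory. */
            return devm_memremap_pages(&pdev->dev, res, ref, NULL);
    }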

For use case 2 it looks like there are several options, and some of them
(like HMM) have been around for quite some time without gaining
acceptance. I think there needs to be more discussion on this use case,
and it could be some time before we get something upstreamable.

I, for one, would really like to see use case 1 get addressed soon because
we have consumers for it arriving shortly in the form of CMBs on NVMe
devices.

Long term, I think Jason summed it up really well: CPU vendors will put
high-speed, open, switchable, coherent buses on their processors and all
these problems will vanish. But I ain't holding my breath for that to
happen ;-).

Cheers

Stephen


Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Stephen Bates
>>> I've already recommended that iopmem not be a block device and
>>> instead be a device-dax instance. I also don't think it should claim
>>> the PCI ID; rather, the driver that wants to map one of its BARs this
>>> way can register the memory region with the device-dax core.
>>>
>>> I'm not sure there are enough device drivers that want to do this to
>>> have it be a generic /sys/.../resource_dmableX capability. It still
>>> seems to be an exotic one-off type of configuration.
>>
>>
>> Yes, this is essentially my thinking. Except I think the userspace
>> interface should really depend on the device itself. Device dax is a
>> good choice for many, and I agree the block device approach wouldn't be
>> ideal.

I tend to agree here. The block device interface has seen quite a bit of
resistance and /dev/dax looks like a better approach for most. We can look
at doing it that way in v2.

>>
>> Specifically for NVMe CMB: I think it would make a lot of sense to just
>> hand out these mappings with an mmap call on /dev/nvmeX. I expect CMB
>> buffers would be volatile and thus you wouldn't need to keep track of
>> where in the BAR the region came from. Thus, the mmap call would just be
>> an allocator from BAR memory. If device-dax were used, userspace would
>> need to lookup which device-dax instance corresponds to which nvme
>> drive.
>>
>
> I'm not opposed to mapping /dev/nvmeX.  However, the lookup is trivial
> to accomplish in sysfs through /sys/dev/char to find the sysfs path of the
> device-dax instance under the nvme device, or if you already have the nvme
> sysfs path the dax instance(s) will appear under the "dax" sub-directory.
>

Personally, I think exposing the dax resource in the sysfs tree is a nice
way to do this, and a bit more intuitive than mapping a /dev/nvmeX node.
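
As a rough illustration of what the device-dax route looks like from
userspace (the device name here is hypothetical, and the assumption that
the dax instance is backed by an NVMe CMB BAR is mine), the consumer
simply mmap()s the dax character device it found via sysfs:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            /* Hypothetical node: a device-dax instance registered by an
             * NVMe driver for its CMB BAR. The real instance would be
             * found via /sys/dev/char or the nvme device's "dax"
             * sub-directory, as described above. */
            int fd = open("/dev/dax0.0", O_RDWR);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* device-dax mappings must match the region's alignment;
             * 2 MiB is assumed here. */
            size_t len = 2 * 1024 * 1024;
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    close(fd);
                    return 1;
            }

            /* p now refers to BAR (CMB) memory that another PCIe device
             * could use as a P2P DMA source/destination. */
            munmap(p, len);
            close(fd);
            return 0;
    }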




Re: Enabling peer to peer device transactions for PCIe devices

2017-01-12 Thread Stephen Bates
On Fri, January 6, 2017 4:10 pm, Logan Gunthorpe wrote:
>
>
> On 06/01/17 11:26 AM, Jason Gunthorpe wrote:
>
>
>> Make a generic API for all of this and you'd have my vote..
>>
>>
>> IMHO, you must support basic pinning semantics - that is necessary to
>> support generic short lived DMA (eg filesystem, etc). That hardware can
>> clearly do that if it can support ODP.
>
> I agree completely.
>
>
> What we want is for RDMA, O_DIRECT, etc to just work with special VMAs
> (ie. at least those backed with ZONE_DEVICE memory). Then
> GPU/NVME/DAX/whatever drivers can just hand these VMAs to userspace
> (using whatever interface is most appropriate) and userspace can do what
> it pleases with them. This makes _so_ much sense and actually largely
> already works today (as demonstrated by iopmem).

+1 for iopmem ;-)

I feel like we are going around and around on this topic. I would like to
see something upstream that enables P2P, even if it is only the minimum
viable useful functionality to begin with. I think aiming for the moon
(which is what HMM and things like it are doing) is simply going to take
more time, if they ever get there.

There are use cases for in-kernel P2P PCIe transfers between two NVMe
devices and between an NVMe device and an RDMA NIC (using NVMe CMBs or
BARs on the NIC). I am even seeing users who now want to move data P2P
between FPGAs and NVMe SSDs, and the upstream kernel should be able to
support these users or they will look elsewhere.

The iopmem patchset addressed all of the use cases above, and while it is
not an in-kernel API, it could have been modified into one reasonably
easily. As Logan states, the driver can then choose to pass the VMAs to
user space in a manner that makes sense.

Earlier in the thread someone mentioned LSF/MM. There is already a
proposal to discuss this topic, so if you are interested please respond to
that email and let the committee know this topic is of interest to you [1].

Also earlier in the thread, someone discussed the issues around the IOMMU.
Given the known problems with P2P transfers through certain CPU root
complexes [2], it might just be a case of only allowing P2P when a PCIe
switch connects the two EPs. Another option is to use CONFIG_EXPERT and
make sure people are aware of the pitfalls if they enable the P2P option.

Finally, as Jason noted, we could all just wait until
CAPI/OpenCAPI/CCIX/GenZ comes along. However, given that these interfaces
are the remit of the CPU vendors, I think it behooves us to solve this
problem before then. Also, some of the above-mentioned protocols are not
even switchable and may not be amenable to a P2P topology...

Stephen

[1] http://marc.info/?l=linux-mm&m=148156541804940&w=2
[2] https://community.mellanox.com/docs/DOC-1119
