On 08/12/2022 12:11, Jakub Jelinek wrote:
> On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
>>
>> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
>> syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
>> that they can be unpinned safely when freed.
>
> As I said before, I think the pinned memory is too precious to waste it this
> way, we should handle the -> pinned case through memkind_create_fixed on
> mmap + mlock area, that way we can create even quite small pinned
> allocations.
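
For reference, here is a minimal sketch of the mmap + mlock approach the patch describes. It assumes a Linux host and a request that fits within RLIMIT_MEMLOCK. The helper names pinned_alloc and pinned_free are hypothetical and are not libgomp's actual functions.

```c
/* Hypothetical helpers illustrating mmap + mlock pinning on Linux;
   not the libgomp implementation.  */
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate SIZE bytes of page-locked memory.  mmap and mlock both work
   on whole pages, so even a tiny request pins at least one page.
   Returns NULL if mmap fails or if mlock fails (e.g. RLIMIT_MEMLOCK).  */
void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* Could not pin: release the mapping rather than hand back
	 unpinned memory.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Unpin and release memory from pinned_alloc.  SIZE must match the
   original request.  Because the memory came from mmap rather than
   malloc, unpinning cannot affect unrelated heap data on the same page.  */
void
pinned_free (void *p, size_t size)
{
  munlock (p, size);
  munmap (p, size);
}
```

The page granularity is the cost Jakub is pointing at: each small allocation pins a whole page. His suggestion is to mlock one larger region up front and let memkind_create_fixed sub-allocate from it.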

This has been delayed due to other priorities. Our current plan is to switch to using cudaHostAlloc when available, but we can certainly use memkind_create_fixed for the fallback case (including amdgcn).

Using Cuda might be trickier to implement because routing target-independent allocations through the nvptx plugin is an inherent layering violation. However, benchmarking shows that it's the only way to get the faster path through the Cuda black box. Being pinned is good because it avoids page faults, but apparently if Cuda *knows* the memory is pinned, you get a speed boost even when there would be *no* faults (i.e. on a quiet machine). Additionally, Cuda somehow ignores the OS-defined limits on locked memory.

Thomas Schwinge has been assigned this task and will be getting to it soonish.

Andrew
