On 08/12/2022 12:11, Jakub Jelinek wrote:
> On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
>> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
>> syscall. Pinned allocations are performed using mmap, not malloc, to ensure
>> that they can be unpinned safely when freed.
>
> As I said before, I think the pinned memory is too precious to waste it this
> way, we should handle the -> pinned case through memkind_create_fixed on
> mmap + mlock area, that way we can create even quite small pinned
> allocations.

This has been delayed due to other priorities. Our current plan is to
switch to using cudaHostAlloc when it is available, and we can certainly
use memkind_create_fixed for the fallback case (including amdgcn).
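
For reference, the mmap + mlock scheme from the quoted patch description
looks roughly like the sketch below. This is only an illustration, not the
libgomp code: the names pinned_alloc and pinned_free are hypothetical, and
it assumes a Linux host with glibc. Because the region comes from mmap
rather than malloc, it owns whole pages, so unmapping it cannot unpin a
neighbouring allocation.

```c
#define _DEFAULT_SOURCE		/* Assumption: glibc; exposes MAP_ANONYMOUS.  */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate SIZE bytes of page-locked memory.
   Returns NULL if the mapping or the lock fails, for example when
   RLIMIT_MEMLOCK would be exceeded.  */
static void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;

  /* Lock the pages into RAM; undo the mapping if that is refused.  */
  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Hypothetical helper: release memory obtained from pinned_alloc.
   Unmapping the range also removes its memory lock.  */
static void
pinned_free (void *p, size_t size)
{
  if (p != NULL)
    munmap (p, size);
}
```

Each such allocation consumes at least one full locked page, which is the
waste Jakub points out for small requests.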
Using CUDA might be trickier to implement because there's a layering
violation inherent in routing target-independent allocations through the
nvptx plugin, but benchmarking shows that this is the only way to get the
faster path through the CUDA black box. Being pinned is good because it
avoids page faults, but apparently if CUDA *knows* the memory is pinned then
you get a speed boost even when there would be *no* faults (i.e. on a quiet
machine). Additionally, CUDA somehow ignores the OS-defined limits.
Thomas Schwinge has been assigned this task and will be getting to it
soonish.
Andrew