On Tue, Jan 04, 2022 at 03:32:17PM +0000, Andrew Stubbs wrote:
> This patch implements the OpenMP pinned memory trait for Linux hosts. On
> other hosts and on devices the trait becomes a no-op (instead of being
> rejected).
> 
> The memory is locked via the mlock syscall, which is both the "correct" way
> to do it on Linux, and a problem because the default ulimit for pinned
> memory is very small (and most users don't have permission to increase it
> (much?)). Therefore the code emits a non-fatal warning message if locking
> fails.
> 
> Another approach might be to use cudaHostAlloc to allocate the memory in the
> first place, which bypasses the ulimit somehow, but this would not help
> non-NVidia users.
> 
> The tests work on Linux and will xfail on other hosts; neither libgomp nor
> the test knows how to allocate or query pinned memory elsewhere.
> 
> The patch applies on top of the text of my previously submitted patches, but
> does not actually depend on the functionality of those patches.
> 
> OK for stage 1?
> 
> I'll commit a backport to OG11 shortly.
> 
> Andrew

> libgomp: pinned memory
> 
> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
> syscall.
> 
> libgomp/ChangeLog:
> 
>       * allocator.c (MEMSPACE_PIN): New macro.
>       (xmlock): New function.
>       (omp_init_allocator): Don't disallow the pinned trait.
>       (omp_aligned_alloc): Add pinning via MEMSPACE_PIN.
>       (omp_aligned_calloc): Likewise.
>       (omp_realloc): Likewise.
>       * testsuite/libgomp.c/alloc-pinned-1.c: New test.
>       * testsuite/libgomp.c/alloc-pinned-2.c: New test.
> 
> diff --git a/libgomp/allocator.c b/libgomp/allocator.c
> index b1f5fe0a5e2..671b91e7ff8 100644
> --- a/libgomp/allocator.c
> +++ b/libgomp/allocator.c
> @@ -51,6 +51,25 @@
>  #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
>    ((void)MEMSPACE, (void)SIZE, free (ADDR))
>  #endif
> +#ifndef MEMSPACE_PIN
> +/* Only define this on supported host platforms.  */
> +#ifdef __linux__
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, xmlock (ADDR, SIZE))
> +
> +#include <sys/mman.h>
> +#include <stdio.h>
> +void
> +xmlock (void *addr, size_t size)
> +{
> +  if (mlock (addr, size))
> +      perror ("libgomp: failed to pin memory (ulimit too low?)");
> +}
> +#else
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, (void)ADDR, (void)SIZE)
> +#endif
> +#endif

The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
instead to add libgomp/config/linux/allocator.c that includes some headers,
defines some macros and then includes the generic allocator.c.
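
As a rough sketch of that layout (the helper name linux_memspace_pin and
the relative include path are my assumptions, not something taken from the
patch), such a file might look like this:

```c
/* Sketch of a hypothetical libgomp/config/linux/allocator.c.  */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if the range was pinned, 0 otherwise,
   so that the caller can treat a failed mlock as a failed allocation.  */
static inline int
linux_memspace_pin (void *addr, size_t size)
{
  return mlock (addr, size) == 0;
}

/* Override the generic (no-op) definition before allocator.c sees it.  */
#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
  ((void) (MEMSPACE), linux_memspace_pin (ADDR, SIZE))

/* The real file would end by pulling in the generic implementation,
   e.g. (path assumed):
   #include "../../allocator.c"  */
```

The generic allocator.c would then keep only the #ifndef MEMSPACE_PIN
fallback and no OS-specific #ifdef.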

I think perror is the wrong thing to do.  omp_alloc etc. have a well-defined
interface for what to do in such cases: the allocation should just fail (not
be allocated), and depending on the user's choice that can be fatal, return
NULL, or chain to some other allocator with other properties, etc.
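
To illustrate the idea outside of libgomp (pinned_alloc_or_null is a
hypothetical name, and plain malloc stands in for the memspace allocation),
a failed mlock would simply undo the allocation and report failure, leaving
the policy decision to the caller:

```c
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper, not libgomp code: allocate SIZE bytes and pin them.
   If pinning fails, release the memory and return NULL instead of printing
   a warning, so the caller can apply whatever fallback the user chose
   (abort, return NULL, try another allocator, ...).  */
void *
pinned_alloc_or_null (size_t size)
{
  void *ptr = malloc (size);
  if (ptr == NULL)
    return NULL;

  if (mlock (ptr, size) != 0)
    {
      /* Could not pin (e.g. RLIMIT_MEMLOCK too low): treat it as an
         allocation failure.  */
      free (ptr);
      return NULL;
    }

  return ptr;
}
```

A caller that gets a non-NULL pointer back would still have to munlock the
range before freeing it, which is where the page-boundary questions come in.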

Other issues in the patch are that it doesn't munlock on deallocation and
that, once it does, we need to figure out what to do at page boundaries.
As documented, mlock can be passed an address and/or address + size that
aren't at page boundaries, and pinning happens even for partially touched
pages.  But munlock also unpins even the partially overlapping pages, and
at that point we don't know whether some other pinned allocations live in
those pages.
Some bad options are:
- only pin pages wholly contained within the allocation and don't pin the
  partial pages around it;
- force at least page alignment and size so that everything can be pinned;
- somehow ensure that we never allocate more than one pinned allocation in
  such partial pages (but can allocate non-pinned allocations there);
- use some internal data structure to track how many pinned allocations
  are on the partial pages (say a hash map from page start address to a
  counter of pinned allocations there; if it goes to 0, munlock even that
  page, otherwise munlock just the wholly contained pages);
- or perhaps use page-size-aligned allocation and size, and just remember
  in some data structure that the partial pages could be used for other
  pinned (small) allocations.
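
The difference between "pages touched" and "pages wholly contained" can be
sketched as below.  The struct and function names are hypothetical, the page
size is assumed to be a power of two, and real code would get it from
sysconf (_SC_PAGESIZE):

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open range of page-aligned addresses: [start, end).  */
struct page_span
{
  uintptr_t start;
  uintptr_t end;
};

/* Every page the allocation overlaps, even partially.  This is the range
   that mlock pins and that munlock would unpin.  */
struct page_span
pages_touched (uintptr_t addr, size_t size, uintptr_t page_size)
{
  struct page_span s;
  s.start = addr & ~(page_size - 1);
  s.end = (addr + size + page_size - 1) & ~(page_size - 1);
  return s;
}

/* Only the pages that lie entirely inside the allocation.  These can be
   unpinned without affecting any neighbouring allocation.  */
struct page_span
pages_wholly_contained (uintptr_t addr, size_t size, uintptr_t page_size)
{
  struct page_span s;
  s.start = (addr + page_size - 1) & ~(page_size - 1);
  s.end = (addr + size) & ~(page_size - 1);
  if (s.end < s.start)
    s.end = s.start;	/* Allocation fits inside a single partial page.  */
  return s;
}
```

The pages in the first span but not in the second are the partial pages that
would need either a per-page counter or one of the other schemes above.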

        Jakub
