Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

Tom de Vries via Gcc-patches Wed, 05 Jan 2022 03:08:50 -0800

On 12/20/21 16:58, Andrew Stubbs wrote:

This patch is submitted now for review, and so that I can commit a backport to the OG11 branch, but it isn't suitable for mainline until stage 1.
The patch implements support for omp_low_lat_mem_space and omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc, omp_cgroup_mem_alloc and omp_thread_mem_alloc allocators are also configured to use this space (this to match the current or intended behaviour in other toolchains).
The memory is drawn from the ".shared" space that is accessible only from within the team in which it is allocated, and which effectively ceases to exist when the kernel exits. By default, 8 KiB of space is reserved for each team at launch time. This can be adjusted, at runtime, via a new environment variable "GOMP_NVPTX_LOWLAT_POOL". Reserving a larger amount may limit the number of teams that can be run in parallel (due to hardware limitations). Conversely, reducing the allocation may increase the number of teams that can be run in parallel. (I have not yet attempted to tune the default too precisely.) The actual maximum size will vary according to the available hardware and the number of variables that the compiler has placed in .shared space.
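As a rough illustration of the sizing knob described above, here is a minimal sketch of reading such a setting. Only the variable name GOMP_NVPTX_LOWLAT_POOL and the 8 KiB default come from the description; the function names, the assumption that the value is a plain decimal byte count, and the choice to fall back to the default on malformed input are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* 8 KiB per team, as stated in the patch description.  */
#define DEFAULT_LOWLAT_POOL 8192u

/* Hypothetical helper: parse TEXT as a decimal byte count.
   Assumption: anything that is not a clean unsigned decimal number
   (empty, signed, trailing junk, out of range) yields the default.  */
unsigned
parse_lowlat_pool (const char *text)
{
  if (text == NULL || *text < '0' || *text > '9')
    return DEFAULT_LOWLAT_POOL;

  char *end;
  errno = 0;
  unsigned long val = strtoul (text, &end, 10);
  if (*end != '\0' || errno == ERANGE || val > UINT_MAX)
    return DEFAULT_LOWLAT_POOL;

  return (unsigned) val;
}

/* Hypothetical wrapper that consults the environment.  */
unsigned
lowlat_pool_from_env (void)
{
  return parse_lowlat_pool (getenv ("GOMP_NVPTX_LOWLAT_POOL"));
}
```

With this sketch, running a program under GOMP_NVPTX_LOWLAT_POOL=16384 would make lowlat_pool_from_env return 16384, and leaving the variable unset would give 8192.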
The allocator implementation is designed to add no more space overhead than omp_alloc already does (aside from rounding allocations up to a multiple of 8 bytes), thus the internal free and realloc must be told how big the original allocation was. The free algorithm maintains an in-order linked-list of free memory chunks. Memory is allocated on a first-fit basis.
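To make the scheme above concrete, here is a minimal host-side sketch of a first-fit allocator with an address-ordered free list, no per-allocation header, and a free routine that is told the original size. It is not the code from the patch: all names are hypothetical, it uses an ordinary static array instead of .shared memory, it is single-threaded, and the merging of adjacent free chunks is my assumption about why the list is kept in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BYTES 8192u      /* the 8 KiB default from the description */
#define UNIT 8u               /* allocations are rounded up to 8 bytes */
#define NIL UINT32_MAX        /* end-of-list marker */

/* Header kept only inside FREE chunks; offsets and sizes count UNITs.  */
struct chunk { uint32_t next; uint32_t size; };
_Static_assert (sizeof (struct chunk) == UNIT, "header must fit one unit");

static struct chunk pool[POOL_BYTES / UNIT];
static uint32_t free_head = NIL;

static uint32_t
units (size_t bytes)
{
  return (uint32_t) ((bytes + UNIT - 1) / UNIT);
}

void
pool_init (void)
{
  pool[0].next = NIL;
  pool[0].size = POOL_BYTES / UNIT;
  free_head = 0;
}

/* First fit: take the lowest-addressed free chunk that is big enough.  */
void *
pool_alloc (size_t bytes)
{
  if (bytes == 0 || bytes > POOL_BYTES)
    return NULL;
  uint32_t need = units (bytes);
  uint32_t *link = &free_head;
  while (*link != NIL)
    {
      uint32_t cur = *link;
      if (pool[cur].size >= need)
        {
          if (pool[cur].size == need)
            *link = pool[cur].next;       /* exact fit: unlink */
          else
            {
              uint32_t rest = cur + need; /* split: the tail stays free */
              pool[rest].next = pool[cur].next;
              pool[rest].size = pool[cur].size - need;
              *link = rest;
            }
          return &pool[cur];
        }
      link = &pool[cur].next;
    }
  return NULL;                            /* caller handles the fall-back */
}

/* The caller must pass the size it originally asked for.  */
void
pool_free (void *ptr, size_t bytes)
{
  if (ptr == NULL)
    return;
  uint32_t off = (uint32_t) ((struct chunk *) ptr - pool);
  uint32_t len = units (bytes);
  assert (off + len <= POOL_BYTES / UNIT);

  /* Walk to the insertion point; the list is sorted by address.  */
  uint32_t *link = &free_head;
  uint32_t prev = NIL;
  while (*link != NIL && *link < off)
    {
      prev = *link;
      link = &pool[prev].next;
    }
  uint32_t next = *link;

  if (next != NIL && off + len == next)   /* merge with following chunk */
    {
      len += pool[next].size;
      next = pool[next].next;
    }
  if (prev != NIL && prev + pool[prev].size == off)
    {
      pool[prev].size += len;             /* merge with preceding chunk */
      pool[prev].next = next;
    }
  else
    {
      pool[off].next = next;
      pool[off].size = len;
      *link = off;
    }
}
```

A typical sequence would be pool_init (), then p = pool_alloc (10) (which consumes 16 bytes), and later pool_free (p, 10). Because no size is stored next to the allocation, passing a different size to pool_free would corrupt the free list, which is the trade-off the paragraph above describes.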
If the allocation fails the NVPTX allocator returns NULL and omp_alloc handles the fall-back. Now that this is a thing that is likely to happen (low-latency memory is small) this patch also implements appropriate fall-back modes for the predefined allocators (fall-back for custom allocators already worked).
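The division of labour described above (the small pool returns NULL, a wrapper decides what happens next) can be sketched as follows. This is an illustration only: the names are hypothetical, the 64-byte bump pool merely stands in for the low-latency space, and the three modes loosely mirror OpenMP's null, default-memory and abort fall-back trait values rather than reproducing what omp_alloc does in libgomp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fall-back modes, loosely modelled on OpenMP's
   omp_atv_null_fb, omp_atv_default_mem_fb and omp_atv_abort_fb.  */
enum fallback { FB_NULL, FB_DEFAULT_MEM, FB_ABORT };

/* Tiny stand-in for the low-latency pool: a bump allocator that
   returns NULL once its budget is used up.  */
#define LOWLAT_BYTES 64u
static _Alignas (8) unsigned char lowlat[LOWLAT_BYTES];
static size_t lowlat_used;

static void *
lowlat_alloc (size_t size)
{
  size_t need = (size + 7) & ~(size_t) 7;   /* round up to 8 bytes */
  if (size == 0 || need < size || need > LOWLAT_BYTES - lowlat_used)
    return NULL;
  void *p = lowlat + lowlat_used;
  lowlat_used += need;
  return p;
}

/* Nonzero if P points into the stand-in pool.  */
int
in_lowlat (const void *p)
{
  uintptr_t a = (uintptr_t) p, base = (uintptr_t) lowlat;
  return a >= base && a < base + LOWLAT_BYTES;
}

/* Try the small pool first; on failure apply the requested mode.  */
void *
alloc_with_fallback (size_t size, enum fallback fb)
{
  void *p = lowlat_alloc (size);
  if (p != NULL)
    return p;
  if (fb == FB_DEFAULT_MEM)
    return malloc (size);      /* ordinary memory instead */
  if (fb == FB_ABORT)
    abort ();
  assert (fb == FB_NULL);
  return NULL;                 /* report the failure to the caller */
}
```

A caller using FB_DEFAULT_MEM has to remember which memory it received; in this sketch in_lowlat can be used to decide whether free () applies.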
In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).

I applied the patch (but used the libgomp/configure.tgt patch to force -mptx=4.1, rather than changing the default).

I ran the testsuite (using export GOMP_NVPTX_JIT=-O0 to work around known driver problems), and observed these extra FAILs:

...
FAIL: libgomp.c/../libgomp.c-c++-common/alloc-7.c execution test
FAIL: libgomp.c/../libgomp.c-c++-common/alloc-8.c execution test
FAIL: libgomp.c/allocators-1.c (test for excess errors)
FAIL: libgomp.c/allocators-2.c (test for excess errors)
FAIL: libgomp.c/allocators-3.c (test for excess errors)
FAIL: libgomp.c/allocators-4.c (test for excess errors)
FAIL: libgomp.c/allocators-5.c (test for excess errors)
FAIL: libgomp.c/allocators-6.c (test for excess errors)
FAIL: libgomp.c++/../libgomp.c-c++-common/alloc-7.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/alloc-8.c execution test
FAIL: libgomp.fortran/alloc-10.f90   -O  execution test
FAIL: libgomp.fortran/alloc-9.f90   -O  execution test
...

The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:

/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: sorry, unimplemented: ' ' clause on 'requires' directive not supported yet

UNRESOLVED: libgomp.c/allocators-1.c compilation failed to produce executable

...

So, I suppose I need "[PATCH] OpenMP front-end: allow requires dynamic_allocators" as well. I'll try again with that applied.

The alloc-7.c execution test failure is a regression, AFAICT. It fails here:

...

38        if ((((uintptr_t) p) % __alignof (int)) != 0 || p[0] || p[1] || p[2])
39          abort ();
...
because:
...
(gdb) p p[0]
$2 = 772014104
(gdb) p p[1]
$3 = 0
(gdb) p p[2]
$4 = 9
...

In other words, the pointer returned by omp_calloc does not point to zeroed-out memory.
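For reference, the property the test checks can be sketched as below: a calloc-style wrapper has to clear the block itself whenever the underlying allocator may hand back memory that was used before. This is a generic illustration with hypothetical names, not a diagnosis of the failure above, and it makes no claim about how omp_calloc is implemented in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an allocator that returns recycled,
   non-zero memory (the values echo the gdb output above).  */
static int dirty[4] = { 772014104, 0, 9, -1 };

void *
dirty_alloc (size_t bytes)
{
  return bytes <= sizeof dirty ? (void *) dirty : NULL;
}

/* calloc-style wrapper: check for overflow, allocate, then clear.  */
void *
zeroing_calloc (size_t nmemb, size_t size, void *(*raw_alloc) (size_t))
{
  if (size != 0 && nmemb > SIZE_MAX / size)
    return NULL;                      /* nmemb * size would overflow */
  size_t bytes = nmemb * size;
  void *p = raw_alloc (bytes);
  if (p != NULL)
    memset (p, 0, bytes);             /* never trust recycled memory */
  return p;
}

/* Same shape as the condition quoted from alloc-7.c: the block must
   be suitably aligned for int and its first N elements must be 0.  */
int
is_zeroed_int_block (const int *p, size_t n)
{
  assert (p != NULL);
  if ((uintptr_t) p % _Alignof (int) != 0)
    return 0;
  for (size_t i = 0; i < n; i++)
    if (p[i] != 0)
      return 0;
  return 1;
}
```

Calling is_zeroed_int_block on a block obtained straight from dirty_alloc reports failure, while a block obtained through zeroing_calloc passes.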


Thanks,
- Tom
