On 09.06.22 11:38, Thomas Schwinge wrote:
On 2022-06-07T13:28:33+0100, Andrew Stubbs <a...@codesourcery.com> wrote:
On 07/06/2022 13:10, Jakub Jelinek wrote:
On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
The memory pinned via the mlock call does not give the expected performance
boost. I had not expected that it would do much in my test setup, given that
the machine has a lot of RAM and my benchmarks are small, but others have
tested more extensively, on a variety of machines and architectures.
I don't understand why there should be any expected performance boost (at
least not unless the machine starts swapping out pages);
{ omp_atk_pinned, true } is solely about the requirement that the memory
can't be swapped out.
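The thread says the pinned memory was obtained via mlock. The sketch below shows what host-only pinning of that kind amounts to on Linux/POSIX, assuming an mmap plus mlock approach. The helper names pinned_alloc and pinned_free are hypothetical, and this is not libgomp's actual code. It only keeps the pages resident in RAM. Nothing here registers them with a GPU driver.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate SIZE bytes and lock them into RAM so the
   kernel may not swap them out.  Returns NULL on failure, with errno set.  */
static void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;

  /* mlock only guarantees residency; the pages are not known to any
     GPU driver.  It can fail (e.g. ENOMEM/EPERM) once RLIMIT_MEMLOCK
     is exceeded.  */
  if (mlock (p, size) != 0)
    {
      int saved = errno;
      munmap (p, size);
      errno = saved;
      return NULL;
    }
  return p;
}

/* Hypothetical counterpart: unlock and release memory from pinned_alloc.  */
static void
pinned_free (void *p, size_t size)
{
  munlock (p, size);
  munmap (p, size);
}
```

Because mlock is subject to RLIMIT_MEMLOCK, larger requests may fail for unprivileged processes even when plenty of RAM is free.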
It seems like it takes a faster path through the NVIDIA drivers. [...]
I think this conflates two things:
* User-defined allocators in general: here, the CUDA path does not make
much sense, and without unified shared memory the memory will always be
inaccessible on the device (without explicit or implicit mapping).
* Memory that is supposed to be accessible both on the host and on the
device. That is most obvious when memory is explicitly allocated to be
accessible on both. It is less clear-cut when just creating an allocator
with unified shared memory, as it is not clear whether the memory is only
used on the host (e.g. with host-based thread parallelization) or whether
it is also relevant for the device.
Currently, the user has no means to express the intent that memory should
be accessible on both the host and one or more devices, except for 'omp
requires unified_shared_memory'.
The next OpenMP version will likely provide a means to create an
allocator that permits this; see
https://github.com/OpenMP/spec/issues/1843 (not publicly available;
the slides in the last comment are slightly outdated).
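For reference, here is a minimal sketch of the 'omp requires unified_shared_memory' route. A buffer obtained with plain malloc is used inside a target region without any map clause. The function name sum_on_device is hypothetical. Compiled without -fopenmp, the pragmas are ignored and the loop runs on the host. With offloading enabled, whether the region actually runs on a GPU depends on the compiler and device supporting unified shared memory.

```c
#include <stdlib.h>

/* Declares that host pointers may be dereferenced directly inside target
   regions of this translation unit, without map clauses.  */
#pragma omp requires unified_shared_memory

/* Hypothetical example: sum a malloc'd host array inside a target region.  */
long
sum_on_device (size_t n)
{
  int *a = malloc (n * sizeof *a);
  if (a == NULL)
    return -1;
  for (size_t i = 0; i < n; i++)
    a[i] = (int) i;

  long sum = 0;
  /* No map(a[0:n]) clause: with unified shared memory the device reads the
     host allocation directly.  */
  #pragma omp target teams distribute parallel for reduction(+:sum)
  for (size_t i = 0; i < n; i++)
    sum += a[i];

  free (a);
  return sum;
}
```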
* * *
The only question is what to do with 'requires unified_shared_memory'
combined with a non-multi-device allocator.
Probably: no unified_shared_memory or no nvptx device: just use mlock.
Otherwise (i.e. there is an nvptx device and either unified_shared_memory
or a multi-device allocator), use the CUDA one.
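The proposed rule can be sketched as a small decision function. All names here (enum pin_method, choose_pin_method and its flags) are hypothetical. The multi_device_allocator flag stands for the allocator kind that the next OpenMP version may add, and it does not correspond to any existing API.

```c
#include <stdbool.h>

/* Hypothetical: which pinning mechanism to use for a pinned allocation.  */
enum pin_method { PIN_MLOCK, PIN_CUDA };

static enum pin_method
choose_pin_method (bool have_nvptx_device, bool requires_usm,
                   bool multi_device_allocator)
{
  /* Device accessibility only matters if there is an nvptx device and the
     memory may be used there: either the program declares
     'requires unified_shared_memory', or the allocator was explicitly
     created for both host and device.  */
  if (have_nvptx_device && (requires_usm || multi_device_allocator))
    return PIN_CUDA;

  /* Otherwise the memory is host-only, and mlock suffices.  */
  return PIN_MLOCK;
}
```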
For the latter, I think Thomas' remarks are helpful.
Tobias