On 09.06.22 11:38, Thomas Schwinge wrote:
> On 2022-06-07T13:28:33+0100, Andrew Stubbs <a...@codesourcery.com> wrote:
>> On 07/06/2022 13:10, Jakub Jelinek wrote:
>>> On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
>>>> The memory pinned via the mlock call does not give the expected performance
>>>> boost. I had not expected that it would do much in my test setup, given that
>>>> the machine has a lot of RAM and my benchmarks are small, but others have
>>>> tried more and on varying machines and architectures.
>>> I don't understand why there should be any expected performance boost (at
>>> least not unless the machine starts swapping out pages),
>>> { omp_atk_pinned, true } is solely about the requirement that the memory
>>> can't be swapped out.
>> It seems like it takes a faster path through the NVidia drivers. [...]

I think this conflates two parts:

* User-defined allocators in general: there, using CUDA does not make much
sense and, without unified-shared memory, the memory will always be
inaccessible on the device (w/o explicit/implicit mapping).

* Memory which is supposed to be accessible both on the host and on the
device. That's most obvious when explicitly allocating it to be accessible
on both. It is less clear-cut when just creating an allocator with
unified-shared memory, as it is not clear when the memory is only used on
the host (e.g. with host-based thread parallelization) and when it is also
relevant for the device.

Currently, the user has no means to express the intent that it should be
accessible on both the host and one/several devices, except for 'omp
requires unified_shared_memory'.

The next OpenMP version will likely provide a means to create an
allocator which expresses this; see
https://github.com/OpenMP/spec/issues/1843 (not publicly available; the
slides in the last comment are slightly outdated).

 * * *

The question is only what to do with 'requires unified_shared_memory' –
and a non-multi-device allocator.

Probably: with no unified_shared_memory or no nvptx device, just use mlock.
Otherwise (i.e. both an nvptx device and (unified_shared_memory or a
multi-device allocator)), use the CUDA one.
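
To make the proposed rule concrete, here is a rough sketch of the decision,
following the condition spelled out in the "i.e." clause above. All names
(pin_method, choose_pin_method and its parameters) are made up for
illustration; this is not libgomp code, and the multi-device allocator is
only a possible future OpenMP feature.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not libgomp's actual API.  */
enum pin_method { PIN_MLOCK, PIN_CUDA };

/* Pick how to pin host memory for an allocator with omp_atk_pinned.
   HAS_NVPTX_DEVICE: an nvptx offload device is available.
   REQUIRES_USM: the program has 'omp requires unified_shared_memory'.
   MULTI_DEVICE_ALLOCATOR: the allocator was created to be accessible on
   both the host and device(s) (assumed future OpenMP feature).  */
static enum pin_method
choose_pin_method (bool has_nvptx_device, bool requires_usm,
                   bool multi_device_allocator)
{
  /* CUDA pinning only when there is an nvptx device and the memory may
     also be relevant for that device.  */
  if (has_nvptx_device && (requires_usm || multi_device_allocator))
    return PIN_CUDA;

  /* Everything else: plain mlock.  */
  return PIN_MLOCK;
}
```

With this, a host-only program (no nvptx device) always ends up with mlock,
whatever the other two flags say.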

For the latter, I think Thomas' remarks are helpful.

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
