Re: Host/device shared memory

Andrew Stubbs Mon, 02 Dec 2019 06:42:17 -0800

On 02/12/2019 14:23, Thomas Schwinge wrote:

Hi!


On 2019-11-15T13:43:04+0100, Jakub Jelinek <ja...@redhat.com> wrote:

On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:

On 15/11/2019 12:21, Jakub Jelinek wrote:

I'm surprised by the "set acc_mem_shared 0"; I thought gcn was a shared-memory
offloading target.


APUs, such as Carrizo, use shared memory. DGPUs, such as Fiji and Vega, have
their own memory. A DGPU can access host memory, provided that it has been
set up just so, but that is very slow, and I don't know of a way to do that
without still having to copy the program data into that special region.


For a few years already, Nvidia GPUs/drivers have been supporting what
they call Unified Memory, where the driver/kernel automatically handles
the movement of memory pages between host and device memory. Given
reasonable pre-fetching logic (either automatic in the driver/kernel, or
"guided" by the compiler/runtime), this reportedly achieves good
performance, or even better performance than manually managed memory
copying, because only the data pages actually accessed (plus any
pre-fetched ones) are copied.

Yeah, this is not that. When the AMD GPU accesses host memory it appears to bypass both L1 and L2 caching. There's no copying, just direct, on-demand accesses. This makes the performance really bad. We use it only for message passing, which is probably the original intent.

For example, see <https://dl.acm.org/citation.cfm?id=3356141> "Compiler
assisted hybrid implicit and explicit GPU memory management under unified
address space", which I recently saw presented (SuperComputing 2019),
or other publications.

This is not currently implemented in GCC, but could/should be at some
point.
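To make the page-granularity argument concrete, here is a toy model in Python. It is a sketch under stated assumptions, not how any Nvidia driver actually works: a page migrates to the device only on first touch, optionally together with a fixed number of following pages (a naive prefetch). `pages_migrated`, `NUM_PAGES` and the access pattern are hypothetical.

```python
NUM_PAGES = 64  # hypothetical buffer size, in pages


def pages_migrated(accessed, prefetch_depth=0, num_pages=NUM_PAGES):
    """Count pages moved to the device under a toy on-demand migration model.

    On the first touch of a non-resident page (a "fault"), that page and up
    to `prefetch_depth` following pages are migrated. Pages that are already
    resident are never copied again.
    """
    resident = set()
    migrated = 0
    for page in accessed:
        if page in resident:
            continue  # already on the device: no fault, no copy
        # Migrate the faulting page plus the naive next-page prefetch window.
        for p in range(page, min(page + prefetch_depth + 1, num_pages)):
            if p not in resident:
                resident.add(p)
                migrated += 1
    return migrated


# Hypothetical kernel that touches 4 scattered pages of a 64-page buffer.
touched = [3, 4, 20, 41]
print("explicit copy of whole buffer:", NUM_PAGES)                      # 64
print("on demand, no prefetch:", pages_migrated(touched))               # 4
print("on demand, prefetch 1:", pages_migrated(touched, prefetch_depth=1))  # 6
```

In this model the one-page prefetch copies two pages (21 and 42) that are never used, yet still moves far less than the whole buffer; the cost of each fault and the accuracy of the prefetch heuristic are what decide whether on-demand migration actually beats an explicit copy.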

This (or even a mixture of manual-discrete/automatic-shared?) would then
be an execution mode of libgomp/plugin, selected at run-time?

All we really need from libgomp, to support AMD APUs, is to be able to toggle the shared memory mode dynamically, rather than having it baked into the capabilities at start-up. Probably we could figure out the capabilities at run-time already, but that would break when a system has both kinds of device. Anyway, this is theoretical as I have no intention to implement support for such devices.
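A rough sketch of the distinction between a single start-up answer and a per-device one. It assumes a plugin that reports one capability word for all of its devices (in the spirit of libgomp's `GOMP_OFFLOAD_CAP_SHARED_MEM` bit) versus one that is asked per device at run time. The `Cap`, `Device`, `plugin_caps_static` and `device_caps` names are hypothetical, and the conservative "shared only if every device is shared" rule is an assumption.

```python
from dataclasses import dataclass
from enum import IntFlag


class Cap(IntFlag):
    # Hypothetical capability bits, loosely modeled on libgomp's plugin caps.
    SHARED_MEM = 1
    OPENACC = 2


@dataclass
class Device:
    name: str
    is_apu: bool  # hypothetical: True for shared-memory APUs such as Carrizo


def plugin_caps_static(devices):
    """Start-up approach: one capability word for the whole plugin.

    Assumption: shared memory is claimed only if *every* device has it,
    since claiming it for a discrete GPU would be wrong.
    """
    caps = Cap.OPENACC
    if devices and all(d.is_apu for d in devices):
        caps |= Cap.SHARED_MEM
    return caps


def device_caps(device):
    """Per-device approach: decide the shared-memory mode for each device."""
    caps = Cap.OPENACC
    if device.is_apu:
        caps |= Cap.SHARED_MEM
    return caps


# Hypothetical mixed system: one APU and one discrete GPU.
system = [Device("apu0", is_apu=True), Device("dgpu0", is_apu=False)]
print(Cap.SHARED_MEM in plugin_caps_static(system))  # False
for d in system:
    print(d.name, Cap.SHARED_MEM in device_caps(d))  # apu0 True, dgpu0 False
```

With one APU and one DGPU, a single start-up answer must be wrong for one of them (here it forgoes shared memory on the APU), whereas the per-device query gets both right.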


Andrew
