This data-shuttling is one of the things GPU vendors have been working to
eliminate.

Most of the data the GPU needs is never touched by the CPU except to move
it into GPU memory.  This is wasteful.
But the GPU already sits on the PCIe bus, as does the storage device.  Why
not move the data directly from storage to GPU memory?
Recent generations of GPUs support this.  Likewise, Nvidia's NVLink
high-throughput fabric allows loading directly into GPU memory without
touching CPU memory along the way.
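
As a concrete sketch of that path in software: Nvidia ships it as
GPUDirect Storage, driven through the cuFile API.  Roughly, with error
handling elided, the file path and sizes made up for illustration, and
assuming a Linux machine with GDS installed:

    #define _GNU_SOURCE                 /* for O_DIRECT */
    #include <fcntl.h>
    #include <unistd.h>
    #include <cuda_runtime.h>
    #include <cufile.h>                 /* GPUDirect Storage (cuFile) API */

    int main(void)
    {
        const size_t sz = 1 << 20;          /* 1 MiB, arbitrary */
        void *devbuf;
        CUfileDescr_t desc = {0};
        CUfileHandle_t fh;

        cuFileDriverOpen();                 /* bring up the GDS driver */
        cudaMalloc(&devbuf, sz);            /* destination is GPU memory */
        cuFileBufRegister(devbuf, sz, 0);   /* register GPU buffer for DMA */

        int fd = open("/data/input.bin", O_RDONLY | O_DIRECT);
        desc.handle.fd = fd;
        desc.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
        cuFileHandleRegister(&fh, &desc);

        /* DMA straight from the NVMe device into GPU memory;
           the data never lands in CPU memory. */
        cuFileRead(fh, devbuf, sz, 0, 0);

        cuFileHandleDeregister(fh);
        cuFileBufDeregister(devbuf);
        close(fd);
        cudaFree(devbuf);
        cuFileDriverClose();
        return 0;
    }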

Although CPU and GPU memory are often built on the same underlying memory
technology, the memory controllers differ enough that supporting both
CPU-style random access and GPU-style streaming access patterns in one
design costs performance.  UMA (unified memory architecture) certainly
works for mobile and graphics workloads (witness phones and game consoles),
but it's more challenging when trying to squeeze the ultimate
performance-per-watt out of HPC workloads.

It's also important not to conflate a uniform memory address space with a
uniformly implemented address space.  It's possible to map a chunk of GPU
memory into the CPU's address space and treat it like RAM from the CPU's
view, but the operations are typically strongly unbalanced, with writes
costing significantly more because of the difference in memory consistency
models between the two devices.
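
One way to see that imbalance for yourself: allocate a CUDA managed
buffer, migrate its pages to the GPU, and time CPU reads versus CPU
writes through the shared address space.  A rough sketch follows; the
sizes and strides are arbitrary, and whether accesses fault-and-migrate
or go over the bus depends on the driver and hardware:

    #include <stdio.h>
    #include <time.h>
    #include <cuda_runtime.h>

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        const size_t n = 64u << 20;    /* 64 MiB, arbitrary */
        unsigned char *p;
        cudaMallocManaged(&p, n);      /* one pointer, seen by CPU and GPU */

        /* Park the pages on the GPU so CPU touches go through the
           cross-device path rather than plain local DRAM. */
        cudaMemPrefetchAsync(p, n, 0, 0);
        cudaDeviceSynchronize();

        double t0 = now_sec();
        unsigned long sum = 0;
        for (size_t i = 0; i < n; i += 4096)   /* CPU reads, one per page */
            sum += p[i];
        double t1 = now_sec();

        /* Push the pages back to the GPU so the write pass starts from
           the same state as the read pass. */
        cudaMemPrefetchAsync(p, n, 0, 0);
        cudaDeviceSynchronize();

        double t2 = now_sec();
        for (size_t i = 0; i < n; i += 4096)   /* CPU writes, one per page */
            p[i] = (unsigned char)i;
        double t3 = now_sec();

        printf("reads: %.3f s  writes: %.3f s  (sum=%lu)\n",
               t1 - t0, t3 - t2, sum);
        cudaFree(p);
        return 0;
    }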

Paul

On Sat, Dec 28, 2024 at 8:25 AM Frank D. Engel, Jr. <fde...@fjrhome.net>
wrote:

> Consequently the CPU and GPU work
> together much more directly without needing to waste time to shuttle
> data between them.
>
>
