This data-shuttling is one of the things that GPU vendors have been working to eliminate.

Most of the data the GPU needs is never touched by the CPU, except to move it to GPU memory. This is wasteful. But the GPU already sits on the PCIe bus, as does the storage device, so why not move the data directly from storage to GPU memory? Recent generations of GPUs support exactly this. Likewise, Nvidia's NVLink high-throughput fabric allows loading directly into GPU memory without touching CPU memory along the way.

Although CPU and GPU memory are often the same underlying memory technology, the memory controllers differ enough that supporting both CPU-like random access and GPU-like streaming access patterns comes with a performance hit. UMA certainly works in mobile and in graphics workloads (witness phones and game consoles), but it's more challenging when trying to squeeze the ultimate performance-per-watt out of HPC workloads.

It's also important not to conflate a uniform memory address space with a uniformly implemented address space: it's possible to map a chunk of GPU memory into the CPU's address space and treat it like RAM from the CPU's view, but the operations are typically strongly unbalanced, with writes costing significantly more than reads because of the difference in memory consistency models between the two devices.

Paul

On Sat, Dec 28, 2024 at 8:25 AM Frank D. Engel, Jr. <fde...@fjrhome.net> wrote:
> Consequently the CPU and GPU work
> together much more directly without needing to waste time to shuttle
> data between them.
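
To make the storage-to-GPU path concrete, here's a minimal sketch using Nvidia's GPUDirect Storage (cuFile) API, which is one way that direct path is exposed. The file name, buffer size, and omitted error handling are placeholders, not anything specific:

/* Read a file straight into GPU memory over PCIe, without a bounce
 * buffer in CPU RAM. Build roughly as: nvcc gds_sketch.c -lcufile */
#define _GNU_SOURCE                       /* for O_DIRECT, which GDS requires */
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    size_t len = 1 << 20;                     /* 1 MiB, arbitrary */
    void *devbuf;
    cudaMalloc(&devbuf, len);                 /* destination in GPU memory */

    cuFileDriverOpen();                       /* bring up the GDS driver */
    int fd = open("data.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t desc = {0};
    desc.handle.fd = fd;
    desc.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &desc);
    cuFileBufRegister(devbuf, len, 0);        /* pin the GPU buffer for DMA */

    /* The DMA goes storage -> GPU; the CPU only orchestrates, never copies. */
    ssize_t n = cuFileRead(fh, devbuf, len, 0 /*file off*/, 0 /*buf off*/);
    (void)n;

    cuFileBufDeregister(devbuf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    cudaFree(devbuf);
    return 0;
}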
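
The uniform-vs-uniformly-implemented distinction can also be seen with CUDA managed memory; that's my pick for illustration rather than the specific mapping described above. It hands the CPU and GPU a single pointer while the runtime migrates pages underneath, so the same load or store has very different costs depending on where the pages currently sit:

/* One pointer, two devices, non-uniform cost. Build as: nvcc uma_sketch.cu */
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void bump(int *p, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;                  /* GPU touches pull pages to the GPU */
}

int main(void) {
    size_t n = 1 << 20;
    int *p;
    cudaMallocManaged(&p, n * sizeof *p);  /* one address, visible to both */

    for (size_t i = 0; i < n; i++)         /* first touch: pages live on CPU */
        p[i] = 0;

    unsigned blocks = (unsigned)((n + 255) / 256);
    bump<<<blocks, 256>>>(p, n);           /* kernel touch migrates them to GPU */
    cudaDeviceSynchronize();

    /* Same address as before, but this read faults pages back from GPU
     * memory: far costlier than the CPU-resident pass above. */
    printf("%d\n", p[0]);

    cudaFree(p);
    return 0;
}

This shows the non-uniform implementation through page migration rather than the write-vs-read asymmetry from consistency models mentioned above, but the broader point is the same: one address space can hide very different costs.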