This may be of some use to non-experts: https://enccs.github.io/gpu-programming/
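For readers new to the idea Paul describes in the quoted message (turning branchy work with scattered memory accesses into bulk data streams), stream compaction is a textbook example. Instead of each element taking a branch and appending to a shared output, the work is split into three passes: a map, an exclusive prefix sum (scan), and a scatter. Each pass does the same work for every element. This is a minimal sketch, not Paul's code or anything from the linked material. It uses plain Python to stand in for what would be three data-parallel GPU kernels, and the function names `branchy_filter` and `compact` are hypothetical.

```python
from itertools import accumulate


def branchy_filter(xs, pred):
    """Reference version: one data-dependent branch per element and a
    growing output list (hard to parallelize as written)."""
    out = []
    for x in xs:
        if pred(x):
            out.append(x)
    return out


def compact(xs, pred):
    """Hypothetical stream-compaction sketch: map, exclusive scan, scatter.
    Keeps the elements satisfying pred, in their original order."""
    # Pass 1 (map): a 0/1 flag per element.
    flags = [int(bool(pred(x))) for x in xs]
    # Pass 2 (scan): exclusive prefix sum; slot i = number of kept
    # elements before i. The final running total is the output length.
    scan = list(accumulate(flags, initial=0))
    slots, total = scan[:-1], scan[-1]
    # Pass 3 (scatter): every element writes somewhere. Kept elements go to
    # their slot; rejected ones go to a dummy slot at index `total`, so each
    # element does the same arithmetic select instead of a data-dependent branch.
    out = [None] * (total + 1)
    for x, f, s in zip(xs, flags, slots):
        out[f * s + (1 - f) * total] = x
    return out[:total]


if __name__ == "__main__":
    data = [5, -1, 3, -2, 8]
    print(branchy_filter(data, lambda v: v > 0))  # [5, 3, 8]
    print(compact(data, lambda v: v > 0))         # [5, 3, 8]
```

On a GPU each pass would be one kernel launch, and the scan itself would run in parallel. The Python loops here only mirror the data flow, not the performance. The `initial` argument to `itertools.accumulate` needs Python 3.8 or later.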
> On Dec 27, 2024, at 8:32 AM, Paul Lalonde <paul.a.lalo...@gmail.com> wrote:
>
> GPUs have been my bread and butter for 20+ years.
>
> The best introductory source continues to be Kayvon Fatahalian and Mike
> Houston's 2008 CACM paper: https://dl.acm.org/doi/10.1145/1400181.1400197
>
> It says little about the software interface to the GPU, but does a very good
> job of motivating and describing the architecture.
>
> The in-depth resource for modern GPU architecture is the Nvidia A100 tensor
> architecture paper:
> https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf
> It's a slog, but it clearly shows how compute has changed. In particular, much
> of the success is in turning branchy workloads with scattered memory accesses
> into much more bulk-oriented data streams that match well to the "natural"
> structure of the tensor cores. The performance gains can be astronomical.
> I've personally made > 1000x (yes, that's *times*, not percentages) speedups
> with some workloads. There is very little compute that's "cpu-limited" at
> multi-second scales that can't benefit from these approaches, hence the death
> of non-GPU supercomputing.
>
> Paul
>
> On Fri, Dec 27, 2024 at 7:13 AM <tlaro...@kergis.com> wrote:
>> On Thu, Dec 26, 2024 at 10:24:23PM -0800, Ron Minnich wrote:
>> [very interesting stuff]
>> >
>> > Finally, why did something like this not ever happen? Because GPUs
>> > came along a few years later and that's where all the parallelism in
>> > HPC is nowadays. NIX was a nice idea, but it did not survive in the
>> > GPU era.
>> >
>>
>> GPUs are actually wreaking havoc on other kernels, with, in the Unix
>> world, X11 being in bad shape for several reasons, one being that
>> GPUs are not limited to graphical display---this tends to be
>> anecdotal in some sense.
>>
>> Can you elaborate on the GPU paradigm break?
>> I tend to think that there is a main difference between "equals"
>> sharing the same address space via an MMU, and auxiliary processors
>> that use another address space. A GPU, as far as I know (which is not
>> much), is an auxiliary processor when it is discrete, and a
>> specialized processor sharing the same address space when integrated
>> (but I guess that HPC systems have discrete GPUs, perhaps with a
>> specialized interconnect).
>>
>> Do you know of good references about:
>>
>> - organizing processors according to their memory connection---I
>>   found mainly M. J. Flynn's paper(s) about this, but nothing more
>>   recent---and the impact on OS design;
>>
>> - IPC vs threads---from your description, it seems that your solution
>>   was to multiply processes, and hence IPC, rather than threads---but
>>   the sharing of differing memories nonetheless remains, and is easier
>>   to solve with IPC than with threads;
>>
>> - present-day GPU architecture (supposing it is documented; judging
>>   from "General-Purpose Graphics Processor Architectures", Aamodt,
>>   Fung, Rogers, Springer-Verlag, it seems not entirely) and the RISC-V
>>   approach, composing hardware by connecting dedicated elements, and
>>   vectors vs SIMT.
>>
>> Thanks for sharing (what can be shared)!
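The "vectors vs SIMT" point in the list above connects to Paul's emphasis on removing branches. Here is a toy model of one SIMT warp executing an `if`/`else`. All lanes of the warp share one instruction stream. When lanes disagree on the condition, the warp runs both paths in turn, and an active mask selects which lanes commit results. The function `simt_if` and the pass counting are illustrative assumptions, not a description of how any particular GPU schedules divergent code; details vary by vendor and generation. An explicit-vector design such as the RISC-V vector extension exposes the same kind of masking to the compiler instead of handling it in hardware.

```python
def simt_if(lanes, cond, then_fn, else_fn):
    """Toy model of one warp running: y = then_fn(x) if cond(x) else else_fn(x).

    Returns (results, passes), where `passes` counts how many of the two
    code paths the warp had to step through (1 if uniform, 2 if divergent).
    """
    mask = [bool(cond(x)) for x in lanes]  # per-lane predicate
    out = [None] * len(lanes)
    passes = 0

    # "then" path: issued only if at least one lane needs it; inactive
    # lanes sit idle and do not commit a result.
    if any(mask):
        passes += 1
        for i, x in enumerate(lanes):
            if mask[i]:
                out[i] = then_fn(x)

    # "else" path: issued only if at least one lane took the other side.
    if not all(mask):
        passes += 1
        for i, x in enumerate(lanes):
            if not mask[i]:
                out[i] = else_fn(x)

    return out, passes


if __name__ == "__main__":
    warp = list(range(8))
    double, negate = (lambda x: 2 * x), (lambda x: -x)
    # Uniform condition: every lane agrees, one pass.
    print(simt_if(warp, lambda x: x >= 0, double, negate))
    # ([0, 2, 4, 6, 8, 10, 12, 14], 1)
    # Divergent condition: lanes alternate, both paths are issued.
    print(simt_if(warp, lambda x: x % 2 == 0, double, negate))
    # ([0, -1, 4, -3, 8, -5, 12, -7], 2)
```

In the divergent case the warp issues two passes, but each lane does useful work in only one of them. This is one reason it pays to restructure data so that neighboring lanes take the same path, for example by sorting or compacting it first.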
>> --
>> Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
>> http://www.kergis.com/
>> http://kertex.kergis.com/
>> Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T7692a612f26c8ec5-M65cc13f1936d4f4401738ce6
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription