Andrew Stubbs wrote:

PS: I would love to do some comparisons [...]
Actually, I think testing only data transfer is fine for this, but we
might like to try some different access patterns, besides straight
linear copies.

I have now tried it on my laptop with 
BabelStream,https://github.com/UoB-HPC/BabelStream

Compiling with:
echo "#pragma omp requires unified_shared_memory" > omp-usm.h
cmake -DMODEL=omp -DCMAKE_CXX_COMPILER=$HOME/projects/gcc-trunk-offload/bin/g++ 
\
      -DCXX_EXTRA_FLAGS="-g -include ../omp-usm.h -foffload=nvptx-none 
-fopenmp" -DOFFLOAD=ON ..

(and the variants: no -include (→ map) + -DOFFLOAD=OFF (= host), and with 
hostfallback,
via env var (or usm-14 by due to lacking support.)

For mainline, I get (either with libgomp.so of mainline or GCC 14, i.e. w/o USM 
support):

host-14.log                     195.84user 0.94system 0 11.20elapsed 1755%CPU 
(0avgtext+0avgdata 1583268maxresident)k
host-mainline.log               200.16user 1.00system 0 11.89elapsed 1691%CPU 
(0avgtext+0avgdata 1583272maxresident)k
hostfallback-mainline.log       288.99user 4.57system 0 19.39elapsed 1513%CPU 
(0avgtext+0avgdata 1583972maxresident)k
usm-14.log                      279.91user 5.38system 0 19.57elapsed 1457%CPU 
(0avgtext+0avgdata 1590168maxresident)k
map-14.log                      4.17user 0.45system 0   03.58elapsed 129%CPU 
(0avgtext+0avgdata 1691152maxresident)k
map-mainline.log                4.15user 0.44system 0   03.58elapsed 128%CPU 
(0avgtext+0avgdata 1691260maxresident)k
usm-mainline.log                3.63user 1.96system 0   03.88elapsed 144%CPU 
(0avgtext+0avgdata 1692068maxresident)k

Thus: GPU is faster than host, host fallback takes 40% longer than doing host 
compilation.
USM is 15% faster than mapping.


With OG13, the pattern is similar, except that USM is only 3% faster. Thus, HMM 
seems to win my my laptop.

host-og13.log                   191.51user 0.70system 0 09.80elapsed 1960%CPU 
(0avgtext+0avgdata 1583280maxresident)k
map-hostfallback-og13.log       205.12user 1.09system 0 10.82elapsed 1905%CPU 
(0avgtext+0avgdata 1585092maxresident)k
usm-hostfallback-og13.log       338.82user 4.60system 0 19.34elapsed 1775%CPU 
(0avgtext+0avgdata 1584580maxresident)k
map-og13.log                    4.43user 0.42system 0   03.59elapsed 135%CPU 
(0avgtext+0avgdata 1692692maxresident)k
usm-og13.log                    4.31user 1.18system 0   03.68elapsed 149%CPU 
(0avgtext+0avgdata 1686256maxresident)k

* * *

I planned to try an AMD Instinct MI200 device, but due to two IT issues, I 
cannot.
(Shutdown for maintenance of the MI250X system and an NFS issues for the MI210 
run,
but being unable to reboot due to the absence of a colleague having tons of 
editors
still open).

Tobias

Reply via email to