Andrew Stubbs wrote:
PS: I would love to do some comparisons [...]
Actually, I think testing only data transfer is fine for this, but we
might like to try some different access patterns, besides straight
linear copies.
I have now tried it on my laptop with BabelStream,
https://github.com/UoB-HPC/BabelStream
Compiling with:
echo "#pragma omp requires unified_shared_memory" > omp-usm.h
cmake -DMODEL=omp \
  -DCMAKE_CXX_COMPILER=$HOME/projects/gcc-trunk-offload/bin/g++ \
  -DCXX_EXTRA_FLAGS="-g -include ../omp-usm.h -foffload=nvptx-none -fopenmp" \
  -DOFFLOAD=ON ..
(and the variants: without -include (→ map), with -DOFFLOAD=OFF (= host), and
with host fallback, either forced via env var or, for usm-14, happening
because USM support is lacking.)
For mainline, I get (either with libgomp.so of mainline or GCC 14, i.e. w/o USM
support):
host-14.log                195.84user 0.94system 0:11.20elapsed 1755%CPU (0avgtext+0avgdata 1583268maxresident)k
host-mainline.log          200.16user 1.00system 0:11.89elapsed 1691%CPU (0avgtext+0avgdata 1583272maxresident)k
hostfallback-mainline.log  288.99user 4.57system 0:19.39elapsed 1513%CPU (0avgtext+0avgdata 1583972maxresident)k
usm-14.log                 279.91user 5.38system 0:19.57elapsed 1457%CPU (0avgtext+0avgdata 1590168maxresident)k
map-14.log                 4.17user 0.45system 0:03.58elapsed 129%CPU (0avgtext+0avgdata 1691152maxresident)k
map-mainline.log           4.15user 0.44system 0:03.58elapsed 128%CPU (0avgtext+0avgdata 1691260maxresident)k
usm-mainline.log           3.63user 1.96system 0:03.88elapsed 144%CPU (0avgtext+0avgdata 1692068maxresident)k
Thus: the GPU is faster than the host, and host fallback takes 40% longer than
compiling for the host only.
USM is 15% faster than mapping.
With OG13, the pattern is similar, except that USM is only 3% faster. Thus, HMM
seems to win on my laptop.
host-og13.log              191.51user 0.70system 0:09.80elapsed 1960%CPU (0avgtext+0avgdata 1583280maxresident)k
map-hostfallback-og13.log  205.12user 1.09system 0:10.82elapsed 1905%CPU (0avgtext+0avgdata 1585092maxresident)k
usm-hostfallback-og13.log  338.82user 4.60system 0:19.34elapsed 1775%CPU (0avgtext+0avgdata 1584580maxresident)k
map-og13.log               4.43user 0.42system 0:03.59elapsed 135%CPU (0avgtext+0avgdata 1692692maxresident)k
usm-og13.log               4.31user 1.18system 0:03.68elapsed 149%CPU (0avgtext+0avgdata 1686256maxresident)k
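
For anyone who wants to redo the arithmetic, here is a minimal Python sketch
that compares pairs of runs using the user and elapsed seconds copied from the
time lines above. The helper names (runs, percent_more, compare) are made up
for this illustration. It is an assumption on my side that the percentages
quoted above refer to the user-time column; those ratios come out at roughly
44%, 14% and 3%, while the elapsed-time ratios look different.

```python
# (user seconds, elapsed seconds), copied by hand from the time lines above.
# Only the runs needed for the three comparisons are listed.
runs = {
    "host-mainline": (200.16, 11.89),
    "hostfallback-mainline": (288.99, 19.39),
    "map-mainline": (4.15, 3.58),
    "usm-mainline": (3.63, 3.88),
    "map-og13": (4.43, 3.59),
    "usm-og13": (4.31, 3.68),
}


def percent_more(a, b):
    """Return by how many percent a exceeds b (negative if a is smaller)."""
    return (a / b - 1.0) * 100.0


def compare(first, second):
    """Return (user %, elapsed %) by which run `first` exceeds run `second`."""
    user_a, elapsed_a = runs[first]
    user_b, elapsed_b = runs[second]
    return percent_more(user_a, user_b), percent_more(elapsed_a, elapsed_b)


if __name__ == "__main__":
    pairs = [
        ("hostfallback-mainline", "host-mainline"),
        ("map-mainline", "usm-mainline"),
        ("map-og13", "usm-og13"),
    ]
    for first, second in pairs:
        user, elapsed = compare(first, second)
        print(f"{first} vs {second}: user {user:+.1f}%, elapsed {elapsed:+.1f}%")
```

The sign tells which run of a pair needed more time: a positive value means the
first run took longer than the second in that column.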
* * *
I planned to try an AMD Instinct MI200 device, but due to two IT issues, I
cannot.
(The MI250X system is shut down for maintenance, and the MI210 system has an
NFS issue; I am unable to reboot it because an absent colleague still has tons
of editors open there.)
Tobias