On 10/20/22 08:07, Jakub Jelinek wrote:
> Thus, IMHO it is exactly the pass_omp_simd_clone pass where you want to
> implement this auto-simdization discovery, guarded with
> #ifdef ACCEL_COMPILER and the new option (which means it will be done
> only for gcn and not on the host right now).
I'm running into a practical difficulty with making this controlled by a
static #ifdef: namely, testing.
One of my test cases examines the .s output to make sure that the clones
are emitted as local symbols and not global. I have not been able to
find the symbol linkage information in any of the dump files, and I have
also not been able to figure out how to get a .s file from the offload
compiler even outside of the DejaGnu test harness. (It's possible I am
just an extreme dummy about the latter problem, but so far none of my
colleagues here has been able to give me a recipe either.)
On top of that, I worry that this should be tested more broadly than for
the one target we're presently focusing on (AMD GCN), and we'll get much
more regular test coverage if it's also enabled for the x86_64 target,
which has the necessary compute_vecsize_and_simdlen target hook.
I remember Carlos O'Donell used to have a favorite mantra, "design for
test". So, maybe generalize the new -fopenmp-target-simd-clone option
to take a parameter to force clones to be generated on the OpenMP host
for test purposes? The "declare target" directive already has a clause
    device_type(host|nohost|any)
that defaults to "any"; maybe we could use that syntax like
    -fopenmp-target-simd-clone=any
and use the intersection of the two sets to determine what to
auto-generate clones for?
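To make the intersection idea concrete, here is a rough sketch in C of
how the decision might look. This is not existing GCC code: the enum,
the function name want_auto_simd_clones, and the is_accel_compiler
parameter (standing in for the ACCEL_COMPILER #ifdef) are all
hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitmask encoding of device_type(host|nohost|any).  */
enum device_set
{
  DEV_NONE = 0,
  DEV_HOST = 1 << 0,
  DEV_NOHOST = 1 << 1,
  DEV_ANY = DEV_HOST | DEV_NOHOST
};

/* Decide whether the compiler that is currently running should
   auto-generate SIMD clones for a function.

   OPTION_SET is the argument of the proposed
   -fopenmp-target-simd-clone= option.  DECL_SET is the device_type
   from the function's "declare target" directive (DEV_ANY when the
   clause is absent).  IS_ACCEL_COMPILER is an assumed stand-in for
   the static ACCEL_COMPILER check.  */
static bool
want_auto_simd_clones (enum device_set option_set,
                       enum device_set decl_set,
                       bool is_accel_compiler)
{
  /* Both inputs must be subsets of {host, nohost}.  */
  assert ((option_set & ~DEV_ANY) == 0);
  assert ((decl_set & ~DEV_ANY) == 0);

  /* Clones are wanted only where the option and the directive agree.  */
  unsigned allowed = option_set & decl_set;

  /* The host compiler handles "host"; the offload compiler handles
     "nohost".  */
  unsigned self = is_accel_compiler ? DEV_NOHOST : DEV_HOST;

  return (allowed & self) != 0;
}
```

With something like this, a test could pass the "any" (or "host") form of
the option so that the host compiler also emits the clones, where the .s
output is easy to inspect, while a "nohost" setting would keep the
accelerator-only behavior.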
-Sandra