Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"

Jakub Jelinek via Gcc-patches Thu, 20 Oct 2022 07:08:58 -0700

On Sun, Oct 16, 2022 at 07:23:05PM -0600, Sandra Loosemore wrote:
> My sense is that the first approach would be more straightforward than the
> second one, and I am willing to continue to work on that.  However, I think
> I need some direction to get started, as I presently know nothing about
> cgraph and I was unable to find any useful overview or interface
> documentation in the GCC internals manual.  Is this as simple as inserting
> an existing pass into the passlist to clean up after vectorization, or does
> it involve writing something more or less from scratch?


We (as I've discovered during the work on assumptions) have
TODO_discard_function which when returned from an execute pass throws away
a function completely (except now assumption functions for which it doesn't
release body; this could be done in some pass shortly after IPA, or
alternatively before expansion).  But another thing that needs to be done is 
for the
non-public declare simd clones (both explicit and implicit from your patch)
to be ordered in cgraph after anything that has a cgraph edge to its
original function.  I don't know how to do that, you should talk to Honza,
Richi or Martin about that.
I think the current behavior is that callees are processed before callers
if possible (unless there are cycles), which is certainly what we want for
say assume functions, or IPA RA etc.  But in case of non-public simd clones
we want to do it the other way around (at the expense of IPA RA), so that
we can throw away functions which aren't needed.

> > I admit I don't remember where exactly the simd clone happens wrt. other
> > IPA passes, but I think it is late pass; so, does it happen for GCN
> > offloading only in the lto1 offloading compiler?
> > Shouldn't the auto optimization be then done only in the offloading
> > lto1 for GCN then (say guard on targetm boolean)?
> 
> I'm afraid I don't know much about offloading, but I was under the
> impression it all goes through the same compilation process, just with a
> different target?

I've looked at it today and it seems late ipa passes are executed after LTO
bytecode is streamed back in.
If you say try:
#pragma omp declare simd
int foo (int x) { return x; }

int
main ()
{
  int a[64] = {};
  #pragma omp target map(a)
  #pragma omp simd
  for (int i = 0; i < 64; i++)
    a[i] = foo (a[i]);
}
with
gcc -foffload-options='-fdump-tree-all -fdump-ipa-all' -fdump-tree-all 
-fdump-ipa-all -O2 -fopenmp a.c -o a
you ought to see the simdclone dump both as a.c.*i.simdclone and 
a.x*.mkoffload.*i.simdclone
where the former is what is done for the host code (and host fallback),
while the latter is what is done in the offloading lto.
Can't verify it 100% because I have only nvptx-none offloading configured
and in that case pass_omp_simd_clone::gate is disabled in offloading lto
because targetm.simd_clone.compute_vecsize_and_simdlen is NULL for nvptx.
But it is non-NULL for gcn.

Thus, IMHO it is exactly the pass_omp_simd_clone pass where you want to
implement this auto-simdization discovery, guarded with
#ifdef ACCEL_COMPILER and the new option (which means it will be done
only for gcn and not on the host right now).  And do it at the start of
ipa_omp_simd_clone, before the
  FOR_EACH_FUNCTION (node)
    expand_simd_clones (node);
loop, or, if it is purely local decision for each function, at the
start of expand_simd_clones with similar guarding, punt on functions
with "noclone" attribute, or !node->definition.  You need to repeat the
  if (node->has_gimple_body_p ())
    node->get_body ();
to get body before you analyze it.

And please put the new functions for such analysis into omp-simd-clone.cc
where they belong.

        Jakub

Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"

Reply via email to