Re: [PATCH] OpenMP #pragma omp declare simd support

Jakub Jelinek Fri, 22 Nov 2013 03:20:16 -0800

On Fri, Nov 22, 2013 at 11:08:41AM +0100, Richard Biener wrote:
> > @@ -284,6 +382,12 @@ public:
> >    /* Declaration node used to be clone of. */
> >    tree former_clone_of;
> >  
> > +  /* If this is a SIMD clone, this points to the SIMD specific
> > +     information for it.  */
> > +  struct cgraph_simd_clone *simdclone;
> > +  /* If this function has SIMD clones, this points to the first clone.  */
> > +  struct cgraph_node *simd_clones;
> > +
> 
> I wonder how you run all of this through LTO (I'll see below I guess ;))


It doesn't work, as in, all the added testcases work just fine without -flto
and all of them ICE with -flto, but there are multiple known issues with LTO
before that (internal fns, etc.).  More below.

> The expr.c hunk is also ok independently of the patch.

Ok, thanks (though without the rest of the patch probably nothing emits it).

> > @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph
> >    free_dominance_info (CDI_DOMINATORS);
> >  }
> 
> You've run the above through Martin IIRC, but ...

Aldy did.

> > +/* If the expression *EXPR should be replaced by a reduction of a 
> > parameter, do
> > +   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> > +   specifies whether the function should care about type incompatibility 
> > the
> > +   current and new expressions.  If it is false, the function will leave
> > +   incompatibility issues to the caller.  Return true iff the expression
> > +   was modified. */
> > +
> > +bool
> > +ipa_modify_expr (tree *expr, bool convert,
> > +            ipa_parm_adjustment_vec adjustments)
> > +{
> > +  struct ipa_parm_adjustment *cand
> > +    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
> > +  if (!cand)
> > +    return false;
> > +
> > +  tree src;
> > +  if (cand->by_ref)
> > +    src = build_simple_mem_ref (cand->new_decl);
> 
> is this function mostly copied from elsewhere?  Because
> using build_simple_mem_ref always smells like possible TBAA problems.

Perhaps, but this is just code reorg, the same

-  if (cand->by_ref)
-    src = build_simple_mem_ref (cand->reduction);
-  else
-    src = cand->reduction;

used to sit in sra_ipa_modify_expr before.

> 
> > +  else
> > +    src = cand->new_decl;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    {
> > +      fprintf (dump_file, "About to replace expr ");
> > +      print_generic_expr (dump_file, *expr, 0);
> > +      fprintf (dump_file, " with ");
> > +      print_generic_expr (dump_file, src, 0);
> > +      fprintf (dump_file, "\n");
> > +    }
> > +
> > +  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), 
> > cand->type))
> > +    {
> > +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> > +      *expr = vce;
> 
> Why build1 and not fold it?  I assume from above you either have a plain 
> decl (cand->new_decl) or a MEM_REF.  For both cases simply folding
> the VCE into a MEM_REF works.

Again, preexisting code from sra_ipa_modify_expr.  Can it be changed
incrementally/independently of this?

> > +    }
> > +  else
> > +    *expr = src;
> > +  return true;
> > +}
> > +
> > +/* If T is an SSA_NAME, return NULL if it is not a default def or
> > +   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
> > +   the base variable is always returned, regardless if it is a default
> > +   def.  Return T if it is not an SSA_NAME.  */
> > +
> > +static tree
> > +get_ssa_base_param (tree t, bool ignore_default_def)
> > +{
> > +  if (TREE_CODE (t) == SSA_NAME)
> > +    {
> > +      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
> > +   return SSA_NAME_VAR (t);
> > +      else
> > +   return NULL_TREE;
> > +    }
> > +  return t;
> > +}
> 
> This function will return non-NULL for non-PARMs - is that intended?

Again, seems to be preexisting code from tree-sra.c.  Aldy/Martin?

> > +  /* Ignore
> > +     #pragma omp declare simd
> > +     extern int foo ();
> > +     in C, there we don't know the argument types at all.  */
> > +  if (!node->definition
> > +      && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
> > +    return;
> 
> I wonder if you want to diagnose this case (but where?  best during
> parsing if that is allowed).

It isn't invalid per the standard, though of course if you have
#pragma omp declare simd
int foo ();
you can't supply any clauses that refer to parameters (thus, all are assumed
to be vector arguments.  If the function is defined locally and supplies
arguments there, it will have DECL_ARGUMENTS and can be handled easily,
otherwise I just chose to punt, it is too hard for too little gain.
Perhaps could warn with -Wopenmp-simd about it.  I mean to guard also
the other warnings about inability to emit simd clones with -Wopenmp-simd.

> > +      if (count == 0)
> > +   continue;
> > +
> > +      for (int i = 0; i < count * 2; i++)
> 
> Here (and also elsewhere) the patch could do with a few extra
> comments what is happening.

Ok.

> > --- gcc/passes.def  (.../trunk)     (revision 205223)
> > +++ gcc/passes.def  (.../branches/gomp-4_0-branch)  (revision 205231)
> > @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3.
> >        NEXT_PASS (pass_feedback_split_functions);
> >    POP_INSERT_PASSES ()
> >    NEXT_PASS (pass_ipa_increase_alignment);
> > +  NEXT_PASS (pass_omp_simd_clone);
> >    NEXT_PASS (pass_ipa_tm);
> >    NEXT_PASS (pass_ipa_lower_emutls);
> >    TERMINATE_PASS_LIST ()
> 
> So clones are created before streaming LTO.  You do have vect.exp
> testcases that are also run through -flto but does it actually
> "work" there?  I remember seeing changes to cgraph unreachable
> node removal based on some flag that isn't streamed, no?

Aldy has done the pass placement, I wonder also whether it wouldn't be
best to put the OpenMP cloning as the very last IPA pass where all the other
cloning etc. is already done.
Right now we want to punt on IPA-CP/IPA-SRA etc. cloning of
#pragma omp declare simd functions, because if the simd clones are created
first, then cloning the origins and adjusting calls to them would lead to
the simd clones not actually being used, and if simd clones are created
late, on the other side the code isn't able to adjust "omp declare simd"
attribute (hopefully it could be taught at least e.g. about removing
arguments, either because they are unused or because they can be assumed
to be constant, we perhaps could punt only if IPA cloning wants to replace
an argument with something else).

> > +                 tree fndecl = gimple_call_fndecl (stmt), op;
> > +                 if (fndecl != NULL_TREE)
> > +                   {
> > +                     struct cgraph_node *node = cgraph_get_node (fndecl);
> > +                     if (node != NULL && node->simd_clones != NULL)
> 
> So you use node->simd_clones which also need LTO streaming.
> 
> What's the reason you cannot defer SIMD cloning to LTRANS stage
> as simple IPA pass next to IPA-PTA?

Yeah, see above.
> 
> > +                       {
> > +                         unsigned int j, n = gimple_call_num_args (stmt);
> > +                         for (j = 0; j < n; j++)
> > +                           {
> > +                             op = gimple_call_arg (stmt, j);
> > +                             if (DECL_P (op)
> > +                                 || (REFERENCE_CLASS_P (op)
> > +                                     && get_base_address (op)))
> > +                               break;
> > +                           }
> > +                         op = gimple_call_lhs (stmt);
> > +                         /* Ignore #pragma omp declare simd functions
> > +                            if they don't have data references in the
> > +                            call stmt itself.  */
> > +                         if (j == n
> > +                             && !(op
> > +                                  && (DECL_P (op)
> > +                                      || (REFERENCE_CLASS_P (op)
> > +                                          && get_base_address (op)))))
> > +                           continue;
> 
> Hmm.  I guess I have an idea now how to "better" support calls in
> data-ref/dependence analysis.  The above is fine for now - you
> might want to dump sth here if you fail because datarefs in a declare
> simd fn call.

Okay.

> > +         if (is_gimple_call (stmt))
> > +           {
> > +             /* Ignore calls with no lhs.  These must be calls to
> > +                #pragma omp simd functions, and what vectorization factor
> > +                it really needs can't be determined until
> > +                vectorizable_simd_clone_call.  */
> 
> Ick - that's bad.  Well, or rather it doesn't participate in
> vectorization factor determining then, resulting in missed
> vectorizations eventually.  You basically say "any vect factor is ok"
> here?

Right.  The thing is, if there is no lhs, I really don't know how it will
participate in the vectorization factor decision, and won't know it until
the vectorizable_simd_clone_call call, because whether a particular
clone is usable depends on which of the arguments are uniform, linear (with
what linear step) and tons of other things.
Perhaps if there is just one simd clone or all simd clones have some
non-empty set of arguments all without uniform/linear clauses, then we could
pick the smallest of those surely vector args as the one for determining
vectorization factor.  If those arguments have internal def, then the type
will be used already somewhere else in the loop to determine vf, so it is
only about parameters that are passed constant/external def values, but are
required to be in vector parameters.  But I believe
vectorizable_simd_clone_call can handle those just fine, say if you have
all types in the loop long and thus vf decisions are only for long,
so for AVX2 say vf = 4, then if you have
#pragma omp declare simd uniform (a) aligned (a : 32) linear (b)
void foo (long *a, long b, int c);
and pass constant 23 to it, then if there is a simdlen(4) clone (will be
on i?86/x86_64), then the last argument is passed in V4SImode parameter
and the code should handle it fine.  Similarly if all types are int
and there is a vector long argument passed a constant (or external def),
it will be passed in two parameters, each one containing half, and the
function should handle that too.
> 
> > +             if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE)
> > +               {
> > +                 unsigned int j, n = gimple_call_num_args (stmt);
> > +                 for (j = 0; j < n; j++)
> > +                   {
> > +                     scalar_type = TREE_TYPE (gimple_call_arg (stmt, j));
> > +                     vectype = get_vectype_for_scalar_type (scalar_type);
> > +                     if (vectype)
> > +                       {
> > +                         STMT_VINFO_VECTYPE (stmt_info) = vectype;
> > +                         break;
> > +                       }
> > +                   }
> > +               }
> > +             if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE)
> > +               {
> > +                 if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> > +                   {
> > +                     pattern_def_seq = NULL;
> > +                     gsi_next (&si);
> > +                   }
> > +                 continue;
> > +               }
> 
> Both cases above need comments - why do you chose the first param
> for determining STMT_VINFO_VECTYPE?  Isn't STMT_VINFO_VECTYPE
> completely irrelevant for calls w/o LHS?  Answer: yes it is!

It is completely irrelevant, yes.

> I'd have expected an unconditional continue here (and leave
> STMT_VINFO_VECTYPE == NULL - fact is that the vector type of
> the argument is determined by its definition and thus may
> be different from what you record here anyway).

Unfortunately it doesn't work (tried that).  The way all the
vectorizable_* functions are called in sequence, most of them
actually look at STMT_VINFO_VECTYPE before bailing out because
they are for stmts that aren't simd clone calls and thus ICE/segfault.
It was much easier to pass some non-NULL value than to change all of them.

> > +  if (stmt_can_throw_internal (stmt))
> > +    return false;
> 
> Can't happen (loop form checks).

But vectorizable_call has the same call.  So shall both be removed?

> > +  vectype = STMT_VINFO_VECTYPE (stmt_info);
> 
> See above - questionable if this doesn't result from looking at
> the LHS.

This particular function just loads it into a variable and uses
only if it has lhs.

> > +      if (thisarginfo.vectype != NULL_TREE
> > +     && loop_vinfo
> > +     && TREE_CODE (op) == SSA_NAME
> > +     && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false)
> > +     && tree_fits_shwi_p (iv.step))
> > +   {
> > +     thisarginfo.linear_step = tree_to_shwi (iv.step);
> 
> Hmm, you should check thisarginfo.dt instead (I assume this case
> is for induction/reduction defs)?  In this case you also should
> use STMT_VINFO_LOOP_PHI_EVOLUTION_PART and not re-analyze via simple_iv.

I can try that.
> 
> > +     thisarginfo.op = iv.base;
> > +   }
> > +      else if (thisarginfo.vectype == NULL_TREE
> > +          && POINTER_TYPE_P (TREE_TYPE (op)))
> > +   thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
> 
> So this is for dt_external defs?

I guess even both vect_constant_def and vect_external_def, simply something
that is uniform.

> Please switch on thisarginfo.dt here - that more naturally explains
> what you are doing (otherwise this definitely misses a comment).

> > +      this_badness += target_badness * 512;
> > +      /* FORNOW: Have to add code to add the mask argument.  */
> > +      if (n->simdclone->inbranch)
> > +   continue;
> 
> We don't support if-converting calls anyway, no?

Not yet.  Supporting them I guess depends on the
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01268.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01437.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01550.html
series.  With that infrastructure, I think we could e.g. represent
the conditional calls as MASK_CALL internal call that would have
a mask argument (like MASK_LOAD/STORE), then ADDR_EXPR of the
function decl that has simd clones, plus the original arguments,
or something similar, then we'd just extract the function decl
from it in this function and just vectorize the mask argument
too and pass it through as the last argument (or set of arguments)
to the inbranch simd clone.

> > +      for (i = 0; i < nargs; i++)
> > +   {
> > +     switch (n->simdclone->args[i].arg_type)
> > +       {
> > +       case SIMD_CLONE_ARG_TYPE_VECTOR:
> > +         if (!useless_type_conversion_p
> > +                (n->simdclone->args[i].orig_type,
> > +                 TREE_TYPE (gimple_call_arg (stmt, i))))
> > +           i = -1;
> 
> But you don't verify the vectype against the clone vectype?

The code can handle vector narrowing or widening, splitting
into multiple arguments etc.  If the clone exist, we know the
corresponding vector type exists, so does the arginfo[i].vectype
that the vectorizer gives us the argument in.
The above only handles the case where arguments are promoted
from the types in TYPE_ARG_TYPES of the call/DECL_ARGUMENTS
to something wider in the GIMPLE_CALL (happens for short/char
arguments apparently).  The above code just punts on it, I don't
want to have in that function yet another full copy of narrowing/widening
conversions.  The plan was (so far unimplemented) to handle this
in tree-vect-patterns.c, if we have say char argument and pass an
int to it, if the argument is constant, we'd just fold_convert it
to the right type, if there is widening right before it, we'd use
the unwidened SSA_NAME instead, otherwise narrow.  Then vf
determination etc. would handle it right.  Does that look reasonable to you?

> > +         else if (arginfo[i].vectype == NULL_TREE
> 
> I'd like to see checks based on the def type, not vectype.

Ok.
> 
> > +                  || arginfo[i].linear_step)
> > +           this_badness += 64;
> > +         break;
> > +       case SIMD_CLONE_ARG_TYPE_UNIFORM:
> > +         if (arginfo[i].vectype != NULL_TREE)
> 
> Likewise (and below, too).

> > +  if (!vec_stmt) /* transformation not required.  */
> > +    {
> > +      STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
> > +      if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +                    "=== vectorizable_simd_clone_call ===\n");
> > +/*      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */
> > +      arginfo.release ();
> 
> Please save the result from the analysis (selecting the simd clone)
> in the stmt_vinfo and skip the analysis during transform phase.

Just stick there the selected cgraph_node?

As for the cost computation commented out above, it is hard to predict it
right, probably we should at least add the cost of the scalar call, so
the vectorizable function isn't considered cheaper.  But more than that?

> > +                 vec_oprnd0
> > +                   = build3 (BIT_FIELD_REF, atype, vec_oprnd0,
> > +                             build_int_cst (integer_type_node, prec),
> > +                             build_int_cst (integer_type_node,
> > +                                            (m & (k - 1)) * prec));
> 
> Some helpers to build the tree to select a sub-vector would be nice
> (I remember seeing this kind of pattern elsewhere).

Ok, I'll try something.

> > +             new_stmt
> > +               = gimple_build_assign_with_ops (TREE_CODE (t),
> > +                                               make_ssa_name (vectype,
> > +                                                              NULL),
> > +                                               t, NULL_TREE);
> 
> For SINGLE_RHS assigns I prefer gimple_build_assign.

Okay.

> > +
> > +  /* Update the exception handling table with the vector stmt if 
> > necessary.  */
> > +  if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt))
> > +    gimple_purge_dead_eh_edges (gimple_bb (stmt));
> 
> But you've early-outed on throwing stmts?  Generally this shouldn't 
> happen.

This is again a copy from vectorizable_call.  So, do you think it can
be dropped there too?

> Overall it looks good - it would be nice to split out and commit
> separately the IPA cloning infrastructure re-org (and the expr.c hunk).
> 
> The LTO issue needs to be addressed - the simplest thing to me looks
> to defer cloning to LTRANS stage.

Yeah, but the start should be to handle the internal calls that are used
everywhere now by #pragma omp simd too, and ubsan etc.

        Jakub

Re: [PATCH] OpenMP #pragma omp declare simd support

Reply via email to