On Mon, 2018-01-15 at 09:08 -0600, Segher Boessenkool wrote:
> Hi Will,
> 
> On Fri, Jan 12, 2018 at 03:22:06PM -0600, Will Schmidt wrote:
> >   Add support for gimple folding of the mergeh, mergel intrinsics.
> > Since the merge low and merge high variants are almost identical, a
> > new helper function has been added so that code can be shared.
> > 
> > This also adds define_insn for xxmrghw, xxmrglw instructions, allowing us
> > to generate xxmrglw instead of vmrglw after folding.  A few whitespace
> > fixes have been made to the existing vmrg?w defines.
> >     
> > The changes introduced here affect the existing target testcases
> > gcc.target/powerpc/builtins-1-be.c and builtins-1-le.c, such that
> > a number of the scan-assembler tests would fail due to instruction counts
> > changing.  Since the purpose of that test is to primarily ensure those
> > intrinsics are accepted by the compiler, I have disabled gimple-folding for
> > the existing tests that count instructions, and created new variants of 
> > those
> > tests with folding enabled and a higher optimization level, that do not 
> > count
> > instructions.
> 
> > 2018-01-12  Will Schmidt  <will_schm...@vnet.ibm.com>
> >     
> >     * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding
> >     support for merge[hl].
> 
> New line here.
> 
> >     (fold_mergehl_helper): New helper function.
> >     * config/rs6000/altivec.md (altivec_xxmrghw_direct): New.
> >     (altivec_xxmrglw_direct): New.
> 
> > +(define_insn "altivec_xxmrghw_direct"
> > +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> > +   (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> > +                 (match_operand:V4SI 2 "register_operand" "v")]
> > +                UNSPEC_VMRGH_DIRECT))]
> > +  "TARGET_P8_VECTOR"
> > +  "xxmrghw %x0,%x1,%x2"
> > +  [(set_attr "type" "vecperm")])
> > +
> >  (define_insn "altivec_vmrghw_direct"
> >    [(set (match_operand:V4SI 0 "register_operand" "=v")
> >          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> >                        (match_operand:V4SI 2 "register_operand" "v")]
> >                       UNSPEC_VMRGH_DIRECT))]
> 
> How do these two differ?  The xx variant can write all 64 VSR registers,
> it needs different constraints (wa?).  Can the two patterns be merged?
> It doesn't need the TARGET_P8_VECTOR condition then: the constraints
> will handle that.  And actually it is a v2.06 insn (p7)?


They differ in.. 
  TARGET_P8_VECTOR, versus TARGET_ALTIVEC
  xxmrghw %x0,%x1,%x2,  versus vmrghw %0,%1,%2

Not clear to me if they can be merged.   I'm weak in my grasp of the
constraints.  I can dig into that, (and would accept additional hints
too :-)  )

xxmrghw does show up in my book as V2.06. 


In full, for reference:


(define_insn "altivec_xxmrghw_direct"
  [(set (match_operand:V4SI 0 "register_operand" "=v")
        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
                      (match_operand:V4SI 2 "register_operand" "v")]
                     UNSPEC_VMRGH_DIRECT))]
  "TARGET_P8_VECTOR"
  "xxmrghw %x0,%x1,%x2"
  [(set_attr "type" "vecperm")])

(define_insn "altivec_vmrghw_direct"
  [(set (match_operand:V4SI 0 "register_operand" "=v")
        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
                      (match_operand:V4SI 2 "register_operand" "v")]
                     UNSPEC_VMRGH_DIRECT))]
  "TARGET_ALTIVEC"
  "vmrghw %0,%1,%2"
  [(set_attr "type" "vecperm")])



> 
> >  (define_insn "altivec_vmrglb_direct"
> >    [(set (match_operand:V16QI 0 "register_operand" "=v")
> >          (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
> 
> This line should start with a tab as well?
> 
> > -                          (match_operand:V16QI 2 "register_operand" "v")]
> > -                      UNSPEC_VMRGL_DIRECT))]
> > +                  (match_operand:V16QI 2 "register_operand" "v")]
> > +                 UNSPEC_VMRGL_DIRECT))]
> >    "TARGET_ALTIVEC"
> >    "vmrglb %0,%1,%2"
> >    [(set_attr "type" "vecperm")])
> 
> 
> >  (define_insn "altivec_vmrglh_direct"
> >    [(set (match_operand:V8HI 0 "register_operand" "=v")
> >          (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")
> 
> Same here.
> 
> >                   (match_operand:V8HI 2 "register_operand" "v")]
> > -                     UNSPEC_VMRGL_DIRECT))]
> > +                UNSPEC_VMRGL_DIRECT))]
> >    "TARGET_ALTIVEC"
> >    "vmrglh %0,%1,%2"
> >    [(set_attr "type" "vecperm")])
> 
> 
> > +(define_insn "altivec_xxmrglw_direct"
> > +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> > +   (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> > +                 (match_operand:V4SI 2 "register_operand" "v")]
> > +                UNSPEC_VMRGL_DIRECT))]
> > +  "TARGET_P8_VECTOR"
> > +  "xxmrglw %x0,%x1,%x2"
> > +  [(set_attr "type" "vecperm")])
> 
> Exactly analogous to mrghw comments.
> 
> > +/* Helper function to handle the vector merge[hl] built-ins.  The
> > + implementation difference between h and l versions for this code are in
> > + the values used when building of the permute vector for high word versus
> > + low word merge.  The variance is keyed off the use_high parameter.  */
> 
> The continuation lines should be indented by three spaces, so that the
> text lines up.
> 
> > +static void
> > +fold_mergehl_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_high)
> > +{
> > +  tree arg0 = gimple_call_arg (stmt, 0);
> > +  tree arg1 = gimple_call_arg (stmt, 1);
> > +  tree lhs = gimple_call_lhs (stmt);
> > +  tree lhs_type = TREE_TYPE (lhs);
> > +  tree lhs_type_type = TREE_TYPE (lhs_type);
> > +  gimple *g;
> > +  int n_elts = TYPE_VECTOR_SUBPARTS (lhs_type);
> > +  vec<constructor_elt, va_gc> *ctor_elts = NULL;
> > +  int midpoint = n_elts / 2;
> > +  int offset = 0;
> > +  if (use_high == 1)
> > +    offset = midpoint;
> > +  for (int i = 0; i < midpoint; i++)
> > +    {
> > +      tree tmp1 = build_int_cst (lhs_type_type, offset + i);
> > +      tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i);
> > +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1);
> > +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2);
> > +    }
> > +  tree permute = create_tmp_reg_or_ssa_name (lhs_type);
> > +  g = gimple_build_assign (permute, build_constructor (lhs_type, 
> > ctor_elts));
> > +  gimple_set_location (g, gimple_location (stmt));
> > +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > +  g = gimple_build_assign (lhs, VEC_PERM_EXPR, arg0, arg1, permute);
> > +  gimple_set_location (g, gimple_location (stmt));
> > +  gsi_replace (gsi, g, true);
> > +}
> 
> Maybe add a few empty lines, to aid reading?
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile { target { powerpc64-*-* } } } */
> 
> powerpc*-*-*, and if you really want to run on 64-bit only test lp64 (but
> is there any reason to?)  (Same in the other tests).

I think this was a direct copy/paste from the other tests.  I'll give it
another look and see. 

> 
> We really could use some big_endian / little_endian selectors, too; I'll
> see what I can do.  No need to hold up this patch though.
> 
> 
> Segher
> 


Reply via email to