Hi Will,

On Fri, Jan 12, 2018 at 03:22:06PM -0600, Will Schmidt wrote:
>   Add support for gimple folding of the mergeh, mergel intrinsics.
> Since the merge low and merge high variants are almost identical, a
> new helper function has been added so that code can be shared.
> 
> This also adds define_insn for xxmrghw, xxmrglw instructions, allowing us
> to generate xxmrglw instead of vmrglw after folding.  A few whitespace
> fixes have been made to the existing vmrg?w defines.
>     
> The changes introduced here affect the existing target testcases
> gcc.target/powerpc/builtins-1-be.c and builtins-1-le.c, such that
> a number of the scan-assembler tests would fail due to instruction counts
> changing.  Since the purpose of that test is to primarily ensure those
> intrinsics are accepted by the compiler, I have disabled gimple-folding for
> the existing tests that count instructions, and created new variants of those
> tests with folding enabled and a higher optimization level, that do not count
> instructions.

> 2018-01-12  Will Schmidt  <will_schm...@vnet.ibm.com>
>     
>       * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding
>       support for merge[hl].

New line here.

>       (fold_mergehl_helper): New helper function.
>       * config/rs6000/altivec.md (altivec_xxmrghw_direct): New.
>       (altivec_xxmrglw_direct): New.

> +(define_insn "altivec_xxmrghw_direct"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> +     (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> +                   (match_operand:V4SI 2 "register_operand" "v")]
> +                  UNSPEC_VMRGH_DIRECT))]
> +  "TARGET_P8_VECTOR"
> +  "xxmrghw %x0,%x1,%x2"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_insn "altivec_vmrghw_direct"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
>                        (match_operand:V4SI 2 "register_operand" "v")]
>                       UNSPEC_VMRGH_DIRECT))]

How do these two differ?  The xx variant can write all 64 VSR registers,
so it needs different constraints (wa?).  Can the two patterns be merged?
The merged pattern wouldn't need the TARGET_P8_VECTOR condition then: the
constraints will handle that.  And actually, isn't xxmrghw a v2.06 insn (p7)?
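
Something like this, maybe (an untested sketch; it relies on the wa
constraint collapsing to NO_REGS when VSX is disabled, so the constraints
alone gate the xx alternative):

(define_insn "altivec_vmrghw_direct"
  [(set (match_operand:V4SI 0 "register_operand" "=v,wa")
        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v,wa")
                      (match_operand:V4SI 2 "register_operand" "v,wa")]
                     UNSPEC_VMRGH_DIRECT))]
  "TARGET_ALTIVEC"
  "@
   vmrghw %0,%1,%2
   xxmrghw %x0,%x1,%x2"
  [(set_attr "type" "vecperm")])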

>  (define_insn "altivec_vmrglb_direct"
>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>          (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")

This line should start with a tab as well?

> -                    (match_operand:V16QI 2 "register_operand" "v")]
> -                      UNSPEC_VMRGL_DIRECT))]
> +                    (match_operand:V16QI 2 "register_operand" "v")]
> +                   UNSPEC_VMRGL_DIRECT))]
>    "TARGET_ALTIVEC"
>    "vmrglb %0,%1,%2"
>    [(set_attr "type" "vecperm")])


>  (define_insn "altivec_vmrglh_direct"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")

Same here.

>                     (match_operand:V8HI 2 "register_operand" "v")]
> -                     UNSPEC_VMRGL_DIRECT))]
> +                  UNSPEC_VMRGL_DIRECT))]
>    "TARGET_ALTIVEC"
>    "vmrglh %0,%1,%2"
>    [(set_attr "type" "vecperm")])


> +(define_insn "altivec_xxmrglw_direct"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> +     (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> +                   (match_operand:V4SI 2 "register_operand" "v")]
> +                  UNSPEC_VMRGL_DIRECT))]
> +  "TARGET_P8_VECTOR"
> +  "xxmrglw %x0,%x1,%x2"
> +  [(set_attr "type" "vecperm")])

Exactly analogous to the xxmrghw comments above.

> +/* Helper function to handle the vector merge[hl] built-ins.  The
> + implementation difference between h and l versions for this code are in
> + the values used when building of the permute vector for high word versus
> + low word merge.  The variance is keyed off the use_high parameter.  */

The continuation lines should be indented by three spaces, so that the
text lines up.
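
That is:

/* Helper function to handle the vector merge[hl] built-ins.  The
   implementation difference between h and l versions for this code are in
   the values used when building of the permute vector for high word versus
   low word merge.  The variance is keyed off the use_high parameter.  */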

> +static void
> +fold_mergehl_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_high)
> +{
> +  tree arg0 = gimple_call_arg (stmt, 0);
> +  tree arg1 = gimple_call_arg (stmt, 1);
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  tree lhs_type_type = TREE_TYPE (lhs_type);
> +  gimple *g;
> +  int n_elts = TYPE_VECTOR_SUBPARTS (lhs_type);
> +  vec<constructor_elt, va_gc> *ctor_elts = NULL;
> +  int midpoint = n_elts / 2;
> +  int offset = 0;
> +  if (use_high == 1)
> +    offset = midpoint;
> +  for (int i = 0; i < midpoint; i++)
> +    {
> +      tree tmp1 = build_int_cst (lhs_type_type, offset + i);
> +      tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i);
> +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1);
> +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2);
> +    }
> +  tree permute = create_tmp_reg_or_ssa_name (lhs_type);
> +  g = gimple_build_assign (permute, build_constructor (lhs_type, ctor_elts));
> +  gimple_set_location (g, gimple_location (stmt));
> +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  g = gimple_build_assign (lhs, VEC_PERM_EXPR, arg0, arg1, permute);
> +  gimple_set_location (g, gimple_location (stmt));
> +  gsi_replace (gsi, g, true);
> +}

Maybe add a few empty lines, to aid reading?
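
For example, something like this (the same statements, just with blank
lines separating the permute mask construction from the two emitted
statements):

  for (int i = 0; i < midpoint; i++)
    {
      tree tmp1 = build_int_cst (lhs_type_type, offset + i);
      tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i);
      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1);
      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2);
    }

  tree permute = create_tmp_reg_or_ssa_name (lhs_type);
  g = gimple_build_assign (permute, build_constructor (lhs_type, ctor_elts));
  gimple_set_location (g, gimple_location (stmt));
  gsi_insert_before (gsi, g, GSI_SAME_STMT);

  g = gimple_build_assign (lhs, VEC_PERM_EXPR, arg0, arg1, permute);
  gimple_set_location (g, gimple_location (stmt));
  gsi_replace (gsi, g, true);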

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { powerpc64-*-* } } } */

This should be powerpc*-*-*; if you really want to run on 64-bit only,
test lp64 instead (but is there any reason to?).  (Same in the other
tests.)
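
I.e.:

/* { dg-do compile { target powerpc*-*-* } } */

or, if it really needs to be 64-bit only:

/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */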

We really could use some big_endian / little_endian selectors, too; I'll
see what I can do.  No need to hold up this patch though.


Segher
