Hi Will, On Fri, Jan 12, 2018 at 03:22:06PM -0600, Will Schmidt wrote: > Add support for gimple folding of the mergeh, mergel intrinsics. > Since the merge low and merge high variants are almost identical, a > new helper function has been added so that code can be shared. > > This also adds define_insn for xxmrghw, xxmrglw instructions, allowing us > to generate xxmrglw instead of vmrglw after folding. A few whitespace > fixes have been made to the existing vmrg?w defines. > > The changes introduced here affect the existing target testcases > gcc.target/powerpc/builtins-1-be.c and builtins-1-le.c, such that > a number of the scan-assembler tests would fail due to instruction counts > changing. Since the purpose of that test is to primarily ensure those > intrinsics are accepted by the compiler, I have disabled gimple-folding for > the existing tests that count instructions, and created new variants of those > tests with folding enabled and a higher optimization level, that do not count > instructions.
> 2018-01-12 Will Schmidt <will_schm...@vnet.ibm.com> > > * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding > support for merge[hl]. New line here. > (fold_mergehl_helper): New helper function. > * config/rs6000/altivec.md (altivec_xxmrghw_direct): New. > (altivec_xxmrglw_direct): New. > +(define_insn "altivec_xxmrghw_direct" > + [(set (match_operand:V4SI 0 "register_operand" "=v") > + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > + (match_operand:V4SI 2 "register_operand" "v")] > + UNSPEC_VMRGH_DIRECT))] > + "TARGET_P8_VECTOR" > + "xxmrghw %x0,%x1,%x2" > + [(set_attr "type" "vecperm")]) > + > (define_insn "altivec_vmrghw_direct" > [(set (match_operand:V4SI 0 "register_operand" "=v") > (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > (match_operand:V4SI 2 "register_operand" "v")] > UNSPEC_VMRGH_DIRECT))] How do these two differ? The xx variant can write all 64 VSR registers, it needs different constraints (wa?). Can the two patterns be merged? It doesn't need the TARGET_P8_VECTOR condition then: the constraints will handle that. And actually it is a v2.06 insn (p7)? > (define_insn "altivec_vmrglb_direct" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v") This line should start with a tab as well? > - (match_operand:V16QI 2 "register_operand" "v")] > - UNSPEC_VMRGL_DIRECT))] > + (match_operand:V16QI 2 "register_operand" "v")] > + UNSPEC_VMRGL_DIRECT))] > "TARGET_ALTIVEC" > "vmrglb %0,%1,%2" > [(set_attr "type" "vecperm")]) > (define_insn "altivec_vmrglh_direct" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v") Same here. > (match_operand:V8HI 2 "register_operand" "v")] > - UNSPEC_VMRGL_DIRECT))] > + UNSPEC_VMRGL_DIRECT))] > "TARGET_ALTIVEC" > "vmrglh %0,%1,%2" > [(set_attr "type" "vecperm")]) > +(define_insn "altivec_xxmrglw_direct" > + [(set (match_operand:V4SI 0 "register_operand" "=v") > + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > + (match_operand:V4SI 2 "register_operand" "v")] > + UNSPEC_VMRGL_DIRECT))] > + "TARGET_P8_VECTOR" > + "xxmrglw %x0,%x1,%x2" > + [(set_attr "type" "vecperm")]) Exactly analogous to mrghw comments. > +/* Helper function to handle the vector merge[hl] built-ins. The > + implementation difference between h and l versions for this code are in > + the values used when building of the permute vector for high word versus > + low word merge. The variance is keyed off the use_high parameter. */ The continuation lines should be indented by three spaces, so that the text lines up. > +static void > +fold_mergehl_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_high) > +{ > + tree arg0 = gimple_call_arg (stmt, 0); > + tree arg1 = gimple_call_arg (stmt, 1); > + tree lhs = gimple_call_lhs (stmt); > + tree lhs_type = TREE_TYPE (lhs); > + tree lhs_type_type = TREE_TYPE (lhs_type); > + gimple *g; > + int n_elts = TYPE_VECTOR_SUBPARTS (lhs_type); > + vec<constructor_elt, va_gc> *ctor_elts = NULL; > + int midpoint = n_elts / 2; > + int offset = 0; > + if (use_high == 1) > + offset = midpoint; > + for (int i = 0; i < midpoint; i++) > + { > + tree tmp1 = build_int_cst (lhs_type_type, offset + i); > + tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i); > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1); > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2); > + } > + tree permute = create_tmp_reg_or_ssa_name (lhs_type); > + g = gimple_build_assign (permute, build_constructor (lhs_type, ctor_elts)); > + gimple_set_location (g, gimple_location (stmt)); > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > + g = gimple_build_assign (lhs, VEC_PERM_EXPR, arg0, arg1, permute); > + gimple_set_location (g, gimple_location (stmt)); > + gsi_replace (gsi, g, true); > +} Maybe add a few empty lines, to aid reading? > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile { target { powerpc64-*-* } } } */ powerpc*-*-*, and if you really want to run on 64-bit only test lp64 (but is there any reason to?) (Same in the other tests). We really could use some big_endian / little_endian selectors, too; I'll see what I can do. No need to hold up this patch though. Segher