On Mon, 2018-01-15 at 09:08 -0600, Segher Boessenkool wrote: > Hi Will, > > On Fri, Jan 12, 2018 at 03:22:06PM -0600, Will Schmidt wrote: > > Add support for gimple folding of the mergeh, mergel intrinsics. > > Since the merge low and merge high variants are almost identical, a > > new helper function has been added so that code can be shared. > > > > This also adds define_insn for xxmrghw, xxmrglw instructions, allowing us > > to generate xxmrglw instead of vmrglw after folding. A few whitespace > > fixes have been made to the existing vmrg?w defines. > > > > The changes introduced here affect the existing target testcases > > gcc.target/powerpc/builtins-1-be.c and builtins-1-le.c, such that > > a number of the scan-assembler tests would fail due to instruction counts > > changing. Since the purpose of that test is to primarily ensure those > > intrinsics are accepted by the compiler, I have disabled gimple-folding for > > the existing tests that count instructions, and created new variants of > > those > > tests with folding enabled and a higher optimization level, that do not > > count > > instructions. > > > 2018-01-12 Will Schmidt <will_schm...@vnet.ibm.com> > > > > * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding > > support for merge[hl]. > > New line here. > > > (fold_mergehl_helper): New helper function. > > * config/rs6000/altivec.md (altivec_xxmrghw_direct): New. > > (altivec_xxmrglw_direct): New. > > > +(define_insn "altivec_xxmrghw_direct" > > + [(set (match_operand:V4SI 0 "register_operand" "=v") > > + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > > + (match_operand:V4SI 2 "register_operand" "v")] > > + UNSPEC_VMRGH_DIRECT))] > > + "TARGET_P8_VECTOR" > > + "xxmrghw %x0,%x1,%x2" > > + [(set_attr "type" "vecperm")]) > > + > > (define_insn "altivec_vmrghw_direct" > > [(set (match_operand:V4SI 0 "register_operand" "=v") > > (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > > (match_operand:V4SI 2 "register_operand" "v")] > > UNSPEC_VMRGH_DIRECT))] > > How do these two differ? The xx variant can write all 64 VSR registers, > it needs different constraints (wa?). Can the two patterns be merged? > It doesn't need the TARGET_P8_VECTOR condition then: the constraints > will handle that. And actually it is a v2.06 insn (p7)?
They differ in.. TARGET_P8_VECTOR, versus TARGET_ALTIVEC xxmrghw %x0,%x1,%x2, versus vmrghw %0,%1,%2 Not clear to me if they can be merged. I'm weak in my grasp of the constraints. I can dig into that, (and would accept additional hints too :-) ) xxmrghw does show up in my book as V2.06. In full, for reference: (define_insn "altivec_xxmrghw_direct" [(set (match_operand:V4SI 0 "register_operand" "=v") (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") (match_operand:V4SI 2 "register_operand" "v")] UNSPEC_VMRGH_DIRECT))] "TARGET_P8_VECTOR" "xxmrghw %x0,%x1,%x2" [(set_attr "type" "vecperm")]) (define_insn "altivec_vmrghw_direct" [(set (match_operand:V4SI 0 "register_operand" "=v") (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") (match_operand:V4SI 2 "register_operand" "v")] UNSPEC_VMRGH_DIRECT))] "TARGET_ALTIVEC" "vmrghw %0,%1,%2" [(set_attr "type" "vecperm")]) > > > (define_insn "altivec_vmrglb_direct" > > [(set (match_operand:V16QI 0 "register_operand" "=v") > > (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v") > > This line should start with a tab as well? > > > - (match_operand:V16QI 2 "register_operand" "v")] > > - UNSPEC_VMRGL_DIRECT))] > > + (match_operand:V16QI 2 "register_operand" "v")] > > + UNSPEC_VMRGL_DIRECT))] > > "TARGET_ALTIVEC" > > "vmrglb %0,%1,%2" > > [(set_attr "type" "vecperm")]) > > > > (define_insn "altivec_vmrglh_direct" > > [(set (match_operand:V8HI 0 "register_operand" "=v") > > (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v") > > Same here. > > > (match_operand:V8HI 2 "register_operand" "v")] > > - UNSPEC_VMRGL_DIRECT))] > > + UNSPEC_VMRGL_DIRECT))] > > "TARGET_ALTIVEC" > > "vmrglh %0,%1,%2" > > [(set_attr "type" "vecperm")]) > > > > +(define_insn "altivec_xxmrglw_direct" > > + [(set (match_operand:V4SI 0 "register_operand" "=v") > > + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > > + (match_operand:V4SI 2 "register_operand" "v")] > > + UNSPEC_VMRGL_DIRECT))] > > + "TARGET_P8_VECTOR" > > + "xxmrglw %x0,%x1,%x2" > > + [(set_attr "type" "vecperm")]) > > Exactly analogous to mrghw comments. > > > +/* Helper function to handle the vector merge[hl] built-ins. The > > + implementation difference between h and l versions for this code are in > > + the values used when building of the permute vector for high word versus > > + low word merge. The variance is keyed off the use_high parameter. */ > > The continuation lines should be indented by three spaces, so that the > text lines up. > > > +static void > > +fold_mergehl_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_high) > > +{ > > + tree arg0 = gimple_call_arg (stmt, 0); > > + tree arg1 = gimple_call_arg (stmt, 1); > > + tree lhs = gimple_call_lhs (stmt); > > + tree lhs_type = TREE_TYPE (lhs); > > + tree lhs_type_type = TREE_TYPE (lhs_type); > > + gimple *g; > > + int n_elts = TYPE_VECTOR_SUBPARTS (lhs_type); > > + vec<constructor_elt, va_gc> *ctor_elts = NULL; > > + int midpoint = n_elts / 2; > > + int offset = 0; > > + if (use_high == 1) > > + offset = midpoint; > > + for (int i = 0; i < midpoint; i++) > > + { > > + tree tmp1 = build_int_cst (lhs_type_type, offset + i); > > + tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i); > > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1); > > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2); > > + } > > + tree permute = create_tmp_reg_or_ssa_name (lhs_type); > > + g = gimple_build_assign (permute, build_constructor (lhs_type, > > ctor_elts)); > > + gimple_set_location (g, gimple_location (stmt)); > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > > + g = gimple_build_assign (lhs, VEC_PERM_EXPR, arg0, arg1, permute); > > + gimple_set_location (g, gimple_location (stmt)); > > + gsi_replace (gsi, g, true); > > +} > > Maybe add a few empty lines, to aid reading? > > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c > > @@ -0,0 +1,11 @@ > > +/* { dg-do compile { target { powerpc64-*-* } } } */ > > powerpc*-*-*, and if you really want to run on 64-bit only test lp64 (but > is there any reason to?) (Same in the other tests). I think this was a direct copy/paste from the other tests. I'll give it another look and see. > > We really could use some big_endian / little_endian selectors, too; I'll > see what I can do. No need to hold up this patch though. > > > Segher >