On Sat, Dec 2, 2017 at 12:08 AM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> Hi Will,
>
>> On Dec 1, 2017, at 3:43 PM, Will Schmidt <will_schm...@vnet.ibm.com> wrote:
>>
>> On Fri, 2017-12-01 at 18:46 +0100, Richard Biener wrote:
>>> On December 1, 2017 6:22:21 PM GMT+01:00, Will Schmidt
>>> <will_schm...@vnet.ibm.com> wrote:
>>>> Hi,
>>>> Add support for folding of vec_msum in GIMPLE.
>>>>
>>>> This uses the DOT_PROD_EXPR gimple op, which is sensitive to type
>>>> mismatches:
>>>>   error: type mismatch in dot product reduction
>>>>   __vector signed int
>>>>   __vector signed char
>>>>   __vector unsigned char
>>>>   D.2798 = DOT_PROD_EXPR <vsc2, vuc3, vsi2>;
>>>> So for those cases with a signed/unsigned mismatch in the arguments,
>>>> this converts those arguments to their signed type.
>>>>
>>>> This also adds a define_expand for sdot_prodv16qi.  This is based on
>>>> a similar existing entry.
>>>>
>>>> Testing coverage is handled by the existing
>>>> gcc.target/powerpc/fold-vec-msum*.c tests.
>>>>
>>>> Sniff-tests have passed on P8.  Full regtests are currently running
>>>> on other assorted power systems.
>>>> OK for trunk with successful results?
>>>
>>> Note DOT_PROD_EXPR is only useful when the result is reduced to a scalar
>>> later and the reduction order is irrelevant.
>>>
>>> This is because GIMPLE doesn't specify whether the reduction reduces
>>> odd/even or high/low lanes of the argument vectors.  Does vec_msum
>>> specify that?
>>
>> Not that I see, but there may be an implied intent here that just isn't
>> obvious to me.  I'll defer to ... someone. :-)
>>
>>> That said, it exists as a 'hack' for the vectorizer and isn't otherwise
>>> useful for GIMPLE.
>>
>> OK.  With that in mind, should I just try to split this out into
>> separate multiply and add steps?
>
> No.  The semantics of vec_msum are very specific and can't be accurately
> represented in GIMPLE.  This one should be left as a call until expand.
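For reference, a rough scalar model of what the mixed-byte variant
(vmsummbm, the case the new sdot_prodv16qi expander maps to) computes is
sketched below.  This is only my reading of the documented AltiVec
semantics, not code from the patch, and the helper name is made up; the
point is that vec_msum fully specifies which byte elements feed which
32-bit lane, while DOT_PROD_EXPR leaves that pairing unspecified.

/* Hypothetical reference model of vmsummbm, i.e. vec_msum with
   signed char, unsigned char and signed int operands: each 32-bit
   lane i of the result is the third operand's lane i plus the four
   byte products from the corresponding quadrant of the first two
   operands.  The hardware does this as 32-bit modular arithmetic
   with no saturation (ignored here for clarity).  */
static void
vmsummbm_ref (int dst[4], const signed char a[16],
              const unsigned char b[16], const int c[4])
{
  for (int i = 0; i < 4; i++)
    {
      int sum = c[i];
      for (int j = 0; j < 4; j++)
        sum += a[4 * i + j] * (int) b[4 * i + j];
      dst[i] = sum;
    }
}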
I think some of Richard's patches also remove those tree codes in favor of
IFNs.  More thoroughly specifying them would be nice, though - I do expect
most targets to do odd/even reduction for them.  One consequence of the
unspecifiedness is that, for example, we can't even constant-fold those on
GIMPLE!

Richard.

> Thanks!
> Bill
>
>>
>> Thanks,
>> -Will
>>
>>>
>>> Richard.
>>>
>>>> Thanks
>>>> -Will
>>>>
>>>> [gcc]
>>>>
>>>> 2017-12-01  Will Schmidt  <will_schm...@vnet.ibm.com>
>>>>
>>>> 	* config/rs6000/altivec.md (sdot_prodv16qi): New.
>>>> 	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>>>> 	gimple-folding of vec_msum.
>>>> 	(builtin_function_type): Add entries for VMSUMU[BH]M and VMSUMMBM.
>>>>
>>>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>>>> index 7122f99..fa9e121 100644
>>>> --- a/gcc/config/rs6000/altivec.md
>>>> +++ b/gcc/config/rs6000/altivec.md
>>>> @@ -3349,11 +3349,26 @@
>>>>                      (match_operand:V8HI 2 "register_operand" "v")]
>>>>                      UNSPEC_VMSUMSHM)))]
>>>>   "TARGET_ALTIVEC"
>>>>   "
>>>> {
>>>> -  emit_insn (gen_altivec_vmsumshm (operands[0], operands[1], operands[2], operands[3]));
>>>> +  emit_insn (gen_altivec_vmsumshm (operands[0], operands[1],
>>>> +                                   operands[2], operands[3]));
>>>> +  DONE;
>>>> +}")
>>>> +
>>>> +(define_expand "sdot_prodv16qi"
>>>> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
>>>> +        (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
>>>> +                   (unspec:V4SI [(match_operand:V16QI 1 "register_operand" "v")
>>>> +                                 (match_operand:V16QI 2 "register_operand" "v")]
>>>> +                                UNSPEC_VMSUMM)))]
>>>> +  "TARGET_ALTIVEC"
>>>> +  "
>>>> +{
>>>> +  emit_insn (gen_altivec_vmsummbm (operands[0], operands[1],
>>>> +                                   operands[2], operands[3]));
>>>>   DONE;
>>>> }")
>>>>
>>>> (define_expand "widen_usum<mode>3"
>>>>   [(set (match_operand:V4SI 0 "register_operand" "=v")
>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>> index 551d9c4..552fcdd 100644
>>>> --- a/gcc/config/rs6000/rs6000.c
>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>> @@ -16614,10 +16614,40 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>>>      case VSX_BUILTIN_CMPLE_2DI:
>>>>      case VSX_BUILTIN_CMPLE_U2DI:
>>>>        fold_compare_helper (gsi, LE_EXPR, stmt);
>>>>        return true;
>>>>
>>>> +    /* vec_msum.  */
>>>> +    case ALTIVEC_BUILTIN_VMSUMUHM:
>>>> +    case ALTIVEC_BUILTIN_VMSUMSHM:
>>>> +    case ALTIVEC_BUILTIN_VMSUMUBM:
>>>> +    case ALTIVEC_BUILTIN_VMSUMMBM:
>>>> +      {
>>>> +        arg0 = gimple_call_arg (stmt, 0);
>>>> +        arg1 = gimple_call_arg (stmt, 1);
>>>> +        tree arg2 = gimple_call_arg (stmt, 2);
>>>> +        lhs = gimple_call_lhs (stmt);
>>>> +        if ( TREE_TYPE (arg0) == TREE_TYPE (arg1))
>>>> +          g = gimple_build_assign (lhs, DOT_PROD_EXPR, arg0, arg1, arg2);
>>>> +        else
>>>> +          {
>>>> +            // For the case where we have a mix of signed/unsigned
>>>> +            // arguments, convert both multiply args to their signed type.
>>>> +            gimple_seq stmts = NULL;
>>>> +            location_t loc = gimple_location (stmt);
>>>> +            tree new_arg_type = signed_type_for (TREE_TYPE (arg0));
>>>> +            tree signed_arg0 = gimple_convert (&stmts, loc, new_arg_type, arg0);
>>>> +            tree signed_arg1 = gimple_convert (&stmts, loc, new_arg_type, arg1);
>>>> +            gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>> +            g = gimple_build_assign (lhs, DOT_PROD_EXPR,
>>>> +                                     signed_arg0, signed_arg1, arg2);
>>>> +          }
>>>> +        gimple_set_location (g, gimple_location (stmt));
>>>> +        gsi_replace (gsi, g, true);
>>>> +        return true;
>>>> +      }
>>>> +
>>>>      default:
>>>>        if (TARGET_DEBUG_BUILTIN)
>>>>          fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>>>>                   fn_code, fn_name1, fn_name2);
>>>>        break;
>>>> @@ -18080,16 +18110,23 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
>>>>      case CRYPTO_BUILTIN_VPERMXOR_V8HI:
>>>>      case CRYPTO_BUILTIN_VPERMXOR_V16QI:
>>>>      case CRYPTO_BUILTIN_VSHASIGMAW:
>>>>      case CRYPTO_BUILTIN_VSHASIGMAD:
>>>>      case CRYPTO_BUILTIN_VSHASIGMA:
>>>> +    case ALTIVEC_BUILTIN_VMSUMUHM:
>>>> +    case ALTIVEC_BUILTIN_VMSUMUBM:
>>>>        h.uns_p[0] = 1;
>>>>        h.uns_p[1] = 1;
>>>>        h.uns_p[2] = 1;
>>>>        h.uns_p[3] = 1;
>>>>        break;
>>>>
>>>> +    /* The second parm to this vec_msum variant is unsigned.  */
>>>> +    case ALTIVEC_BUILTIN_VMSUMMBM:
>>>> +      h.uns_p[2] = 1;
>>>> +      break;
>>>> +
>>>>      /* signed permute functions with unsigned char mask.  */
>>>>      case ALTIVEC_BUILTIN_VPERM_16QI:
>>>>      case ALTIVEC_BUILTIN_VPERM_8HI:
>>>>      case ALTIVEC_BUILTIN_VPERM_4SI:
>>>>      case ALTIVEC_BUILTIN_VPERM_4SF:
>>>
>>
>>
>
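As a usage illustration (a sketch along the lines of the existing
fold-vec-msum tests, not their actual contents), the mixed
signed/unsigned overload below is the case that takes the
signed-conversion path in the new gimple fold; the same-type overloads
go straight to DOT_PROD_EXPR.

#include <altivec.h>

vector signed int
test_msum_mixed (vector signed char vsc, vector unsigned char vuc,
                 vector signed int vsi)
{
  /* arg0 is signed char and arg1 is unsigned char, so the fold converts
     both multiply operands to the signed vector type before building
     the DOT_PROD_EXPR.  */
  return vec_msum (vsc, vuc, vsi);
}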