Hi, When working around the peculiar little-endian semantics of the vperm instruction, our usual fix is to complement the permute control vector and swap the order of the two vector input operands, so that we get a double-wide vector in the proper order. We don't want to swap the operands when we are expanding a mult_highpart operation, however, as the two input operands are not to be interpreted as a double-wide vector. Instead they represent odd and even elements, and swapping the operands gets the odd and even elements reversed in the final result.
The permute for this case is generated by target-neutral code in optabs.c: expand_mult_highpart (). We obviously can't change that code directly. However, we can redirect the logic from the "case 2" method to target-specific code by implementing expansions for the umul<mode>3_highpart and smul<mode>3_highpart operations. I've done this, with the expansions acting exactly as expand_mult_highpart does today, with the exception that it swaps the input operands to the call to expand_vec_perm when we are generating little-endian code. We will later swap them back to their original position in the code in rs6000.c: altivec_expand_vec_perm_const_le (). The change has no intended effect when generating big-endian code. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new regressions. This fixes the gcc.dg/vect/pr51581-4.c test failure for little endian. Ok for trunk? Thanks, Bill 2013-10-30 Bill Schmidt <wschm...@linux.vnet.ibm.com> * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New prototype. * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New. * config/rs6000/altivec.md (umul<mode>3_highpart): New. (smul_<mode>3_highpart): New. Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 204192) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -58,6 +58,7 @@ extern void rs6000_expand_vector_extract (rtx, rtx extern bool altivec_expand_vec_perm_const (rtx op[4]); extern void altivec_expand_vec_perm_le (rtx op[4]); extern bool rs6000_expand_vec_perm_const (rtx op[4]); +extern bool altivec_expand_mul_highpart (rtx op[3], bool); extern void rs6000_expand_extract_even (rtx, rtx, rtx); extern void rs6000_expand_interleave (rtx, rtx, rtx, bool); extern void build_mask64_2_operands (rtx, rtx *); Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 204192) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -29249,6 +29249,58 @@ rs6000_do_expand_vec_perm (rtx target, rtx op0, rt emit_move_insn (target, x); } +/* Expand an Altivec multiply high-part. The logic matches the + general logic in optabs.c:expand_mult_highpart, but swaps the + inputs for little endian. Note that we will swap them again + during the permute; this is the one case where we don't want + the operands swapped, as if we do we get the even and odd values + reversed. */ + +bool +altivec_expand_mul_highpart (rtx operands[3], bool uns_p) +{ + struct expand_operand eops[3]; + rtx m1, m2, perm, tmp; + int i; + + optab tab1 = uns_p ? vec_widen_umult_even_optab : vec_widen_smult_even_optab; + optab tab2 = uns_p ? vec_widen_umult_odd_optab : vec_widen_umult_odd_optab; + enum machine_mode mode = GET_MODE (operands[0]); + enum insn_code icode = optab_handler (tab1, mode); + int nunits = GET_MODE_NUNITS (mode); + enum machine_mode wmode = insn_data[icode].operand[0].mode; + rtvec v = rtvec_alloc (nunits); + + create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode); + create_input_operand (&eops[1], operands[1], mode); + create_input_operand (&eops[2], operands[2], mode); + expand_insn (icode, 3, eops); + m1 = gen_lowpart (mode, eops[0].value); + + create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode); + create_input_operand (&eops[1], operands[1], mode); + create_input_operand (&eops[2], operands[2], mode); + expand_insn (optab_handler (tab2, mode), 3, eops); + m2 = gen_lowpart (mode, eops[0].value); + + for (i = 0; i < nunits; ++i) + RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1) + + ((i & 1) ? nunits : 0)); + + perm = gen_rtx_CONST_VECTOR (mode, v); + + if (!BYTES_BIG_ENDIAN) { + tmp = m1; + m1 = m2; + m2 = tmp; + } + + perm = expand_vec_perm (mode, m1, m2, perm, NULL_RTX); + emit_move_insn (operands[0], perm); + + return true; +} + /* Expand an extract even operation. */ void Index: gcc/config/rs6000/altivec.md =================================================================== --- gcc/config/rs6000/altivec.md (revision 204192) +++ gcc/config/rs6000/altivec.md (working copy) @@ -1401,6 +1401,30 @@ FAIL; }) +(define_expand "umul<mode>3_highpart" + [(match_operand:VIshort 0 "register_operand" "") + (match_operand:VIshort 1 "register_operand" "") + (match_operand:VIshort 2 "register_operand" "")] + "TARGET_ALTIVEC" +{ + if (altivec_expand_mul_highpart (operands, /*uns_p*/true)) + DONE; + else + FAIL; +}) + +(define_expand "smul<mode>3_highpart" + [(match_operand:VIshort 0 "register_operand" "") + (match_operand:VIshort 1 "register_operand" "") + (match_operand:VIshort 2 "register_operand" "")] + "TARGET_ALTIVEC" +{ + if (altivec_expand_mul_highpart (operands, /*uns_p*/false)) + DONE; + else + FAIL; +}) + (define_insn "altivec_vrfip" ; ceil [(set (match_operand:V4SF 0 "register_operand" "=v") (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")]