On Thu, May 19, 2016 at 10:33:41AM -0500, Segher Boessenkool wrote:
> On Thu, May 19, 2016 at 10:53:41AM -0400, Michael Meissner wrote:
> > GCC 6.1 added support for the XXPERM instruction for the PowerPC ISA 3.0.
> > The
> > XXPERM instruction is essentially a 4 operand instruction, with only 3
> > operands
> > in the instruction (the target register overlaps with the first input
> > register). The Power9 hardware has fusion support where if the instruction
> > that precedes the XXPERM is a XXLOR move instruction to set the first input
> > argument, it is fused with the XXPERM. I added code to support this fusion.
> >
> > Unfortunately, in running the testsuite on the power9 simulator, we
> > discovered
> > that the test gcc.c-torture/execute/pr56866.c would fail because the fusion
> > alternatives confused the register allocator and/or the passes after the
> > register allocator. This patch removes the explicit fusion support from
> > XXPERM.
>
> Okay. Please keep the PR open until that problem is fixed. It also
> shouldn't be "target" category, if the problem is RA.
>
> > In addition, ISA 3.0 added XXPERMR and VPERMR instructions for little endian
> > support where the permute vector reverses the bytes. This patch adds
> > support
> > for XXPERMR/VPERMR.
>
> Please send that as a separate patch, it has nothing to do with the PR.
>
> > + x = gen_rtx_UNSPEC (mode,
> > + gen_rtvec (3, target, reg,
>
> Trailing space.
>
> > + if (TARGET_P9_VECTOR)
> > + {
> > + unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel),
>
> And another.
>
> > + The VNAND is preferred for future fusion opportunities. */
> > + notx = gen_rtx_NOT (V16QImode, sel);
> > + iorx = (TARGET_P8_VECTOR
> > + ? gen_rtx_IOR (V16QImode, notx, notx)
> > + : gen_rtx_AND (V16QImode, notx, notx));
> > + emit_insn (gen_rtx_SET (norreg, iorx));
> > +
>
> Some more.
>
> > +/* { dg-final { scan-assembler "vpermr\|xxpermr" } } */
>
> Tab in the middle of the line.
Here are the patches for xxpermr/vpermr support that are broken out from fixing
the xxperm fusion bug. I have built a compiler with these patches (and the
xxperm patches) and it bootstraps and does not cause a regression. Are they ok
to add to GCC 7 and eventually to GCC 6.2?
[gcc]
2016-05-23 Michael Meissner <[email protected]>
Kelvin Nilsen <[email protected]>
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate
vpermr/xxpermr on ISA 3.0.
(altivec_expand_vec_perm_le): Likewise.
* config/rs6000/altivec.md (UNSPEC_VPERMR): New unspec.
(altivec_vpermr_<mode>_internal): Add VPERMR/XXPERMR support for
ISA 3.0.
[gcc/testsuite]
2016-05-23 Michael Meissner <[email protected]>
Kelvin Nilsen <[email protected]>
* gcc.target/powerpc/p9-vpermr.c: New test for ISA 3.0 vpermr
support.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: [email protected], phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 236608)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -6863,21 +6863,29 @@ rs6000_expand_vector_set (rtx target, rt
gen_rtvec (3, target, reg,
force_reg (V16QImode, x)),
UNSPEC_VPERM);
- else
+ else
{
- /* Invert selector. We prefer to generate VNAND on P8 so
- that future fusion opportunities can kick in, but must
- generate VNOR elsewhere. */
- rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
- rtx iorx = (TARGET_P8_VECTOR
- ? gen_rtx_IOR (V16QImode, notx, notx)
- : gen_rtx_AND (V16QImode, notx, notx));
- rtx tmp = gen_reg_rtx (V16QImode);
- emit_insn (gen_rtx_SET (tmp, iorx));
-
- /* Permute with operands reversed and adjusted selector. */
- x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
- UNSPEC_VPERM);
+ if (TARGET_P9_VECTOR)
+ x = gen_rtx_UNSPEC (mode,
+ gen_rtvec (3, target, reg,
+ force_reg (V16QImode, x)),
+ UNSPEC_VPERMR);
+ else
+ {
+ /* Invert selector. We prefer to generate VNAND on P8 so
+ that future fusion opportunities can kick in, but must
+ generate VNOR elsewhere. */
+ rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
+ rtx iorx = (TARGET_P8_VECTOR
+ ? gen_rtx_IOR (V16QImode, notx, notx)
+ : gen_rtx_AND (V16QImode, notx, notx));
+ rtx tmp = gen_reg_rtx (V16QImode);
+ emit_insn (gen_rtx_SET (tmp, iorx));
+
+ /* Permute with operands reversed and adjusted selector. */
+ x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
+ UNSPEC_VPERM);
+ }
}
emit_insn (gen_rtx_SET (target, x));
@@ -34365,17 +34373,25 @@ altivec_expand_vec_perm_le (rtx operands
if (!REG_P (target))
tmp = gen_reg_rtx (mode);
- /* Invert the selector with a VNAND if available, else a VNOR.
- The VNAND is preferred for future fusion opportunities. */
- notx = gen_rtx_NOT (V16QImode, sel);
- iorx = (TARGET_P8_VECTOR
- ? gen_rtx_IOR (V16QImode, notx, notx)
- : gen_rtx_AND (V16QImode, notx, notx));
- emit_insn (gen_rtx_SET (norreg, iorx));
+ if (TARGET_P9_VECTOR)
+ {
+ unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel),
+ UNSPEC_VPERMR);
+ }
+ else
+ {
+ /* Invert the selector with a VNAND if available, else a VNOR.
+ The VNAND is preferred for future fusion opportunities. */
+ notx = gen_rtx_NOT (V16QImode, sel);
+ iorx = (TARGET_P8_VECTOR
+ ? gen_rtx_IOR (V16QImode, notx, notx)
+ : gen_rtx_AND (V16QImode, notx, notx));
+ emit_insn (gen_rtx_SET (norreg, iorx));
- /* Permute with operands reversed and adjusted selector. */
- unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg),
- UNSPEC_VPERM);
+ /* Permute with operands reversed and adjusted selector. */
+ unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg),
+ UNSPEC_VPERM);
+ }
/* Copy into target, possibly by way of a register. */
if (!REG_P (target))
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (revision 236608)
+++ gcc/config/rs6000/altivec.md (working copy)
@@ -58,6 +58,7 @@ (define_c_enum "unspec"
UNSPEC_VSUM2SWS
UNSPEC_VSUMSWS
UNSPEC_VPERM
+ UNSPEC_VPERMR
UNSPEC_VPERM_UNS
UNSPEC_VRFIN
UNSPEC_VCFUX
@@ -2032,6 +2033,19 @@ (define_expand "vec_perm_constv16qi"
FAIL;
})
+(define_insn "*altivec_vpermr_<mode>_internal"
+ [(set (match_operand:VM 0 "register_operand" "=v,?wo")
+ (unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+ (match_operand:VM 2 "register_operand" "v,wo")
+ (match_operand:V16QI 3 "register_operand" "v,wo")]
+ UNSPEC_VPERMR))]
+ "TARGET_P9_VECTOR"
+ "@
+ vpermr %0,%1,%2,%3
+ xxpermr %x0,%x2,%x3"
+ [(set_attr "type" "vecperm")
+ (set_attr "length" "4")])
+
(define_insn "altivec_vrfip" ; ceil
[(set (match_operand:V4SF 0 "register_operand" "=v")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")]