Hi,

This patch adds a small RTL simplification for the case where the first
operand to a VEC_SELECT is another VEC_SELECT with an inverse selection
function.  E.g.,

  (vec_select:V4SF (vec_select:V4SF (OP:V4SF) (parallel [2 3 0 1]))
                   (parallel [2 3 0 1]))

may be simplified to (OP:V4SF).

This comes up in practice on powerpc64le-linux-gnu because of the
characteristics of certain loads and stores in the Power7 and later
ISAs.  When running in little-endian modes, one of these loads must be
followed by a permute to reorder the vector elements, and one of these
stores must be preceded by a permute to once again reorder them.  Thus
a simple copy may end up with two permutes whose effects cancel each
other out.  The patch ensures those redundancies are detected and
removed.

Note that it would be possible to do a more general transformation
here, in which any vec_select feeding another could be replaced by a
single vec_select performing the composite selection function of the
two.  I have not done this because I am unaware of this situation
arising in practice.  If it's desirable, I can extend the patch in
this direction.

Bootstrapped and tested on powerpc64[,le]-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2014-04-21  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

	* simplify-rtx.c (simplify_binary_operation_1): Optimize case of
	nested VEC_SELECTs that are inverses of each other.

[gcc/testsuite]

2014-04-21  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsxcopy.c: New test.


Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(revision 209516)
+++ gcc/simplify-rtx.c	(working copy)
@@ -3673,6 +3673,34 @@ simplify_binary_operation_1 (enum rtx_code code, e
 	    }
 	}
 
+      /* If we have two nested selects that are inverses of each
+	 other, replace them with the source operand.  */
+      if (GET_CODE (trueop0) == VEC_SELECT)
+	{
+	  enum machine_mode reg_mode = GET_MODE (XEXP (trueop0, 0));
+	  rtx op0_subop1 = XEXP (trueop0, 1);
+	  gcc_assert (VECTOR_MODE_P (reg_mode));
+	  gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER (reg_mode));
+	  gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
+
+	  if (XVECLEN (trueop1, 0) == XVECLEN (op0_subop1, 0))
+	    {
+	      /* Apply the second ordering vector to the first.
+		 If the result is { 0, 1, ..., n-1 } then the
+		 two VEC_SELECTs cancel.  */
+	      for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
+		{
+		  rtx x = XVECEXP (trueop1, 0, i);
+		  gcc_assert (CONST_INT_P (x));
+		  rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
+		  gcc_assert (CONST_INT_P (y));
+		  if (i != INTVAL (y))
+		    return 0;
+		}
+	      return XEXP (trueop0, 0);
+	    }
+	}
+
       return 0;
     case VEC_CONCAT:
       {
Index: gcc/testsuite/gcc.target/powerpc/vsxcopy.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsxcopy.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsxcopy.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+typedef float vecf __attribute__ ((vector_size (16)));
+extern vecf j, k;
+
+void fun (void)
+{
+  j = k;
+}