The following fixes an issue in the RTL combiner where we correctly combine two vector sign-extends with a vector load

Trying 7, 9 -> 10:
    7: r106:V4QI=[r119:DI]
      REG_DEAD r119:DI
    9: r108:V4HI=sign_extend(vec_select(r106:V4QI#0,parallel))
   10: r109:V4SI=sign_extend(vec_select(r108:V4HI#0,parallel))
      REG_DEAD r108:V4HI

to

modifying insn i2     9: r109:V4SI=sign_extend([r119:DI])

but since r106 is used we wrongly materialize it using a subreg:

modifying insn i3    10: r106:V4QI=r109:V4SI#0

which of course does not work for modes with more than one component.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Note the check allows subreg:V1QI reg:V1SI (which I think is OK).
There's no SCALAR_MODE_P, maybe the other checks guarantee it's
an integer mode so eventually SCALAR_INT_MODE_P covers everything
important (it wouldn't cover V1QI, not that that's important).

OK?  Or do you prefer a different check - which?

Thanks,
Richard.

	PR rtl-optimization/118662
	* combine.cc (try_combine): When re-materializing a load
	from an extended reg by a lowpart subreg make sure we're
	dealing with single-component modes.

	* gcc.dg/torture/pr118662.c: New testcase.
---
 gcc/combine.cc                          |  5 +++++
 gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a2d4387cebe..4849603ba5e 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
      copy.  This saves at least one insn, more if register allocation can
      eliminate the copy.
 
+     We cannot do this if the involved modes have more than one elements,
+     like for vector or complex modes.
+
      We cannot do this if the destination of the first assignment is a
      condition code register.  We eliminate this case by making sure
      the SET_DEST and SET_SRC have the same mode.
@@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
       && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND
       && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))
           == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0))))
+      && known_eq (GET_MODE_NUNITS
+                     (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))), 1)
       && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
       && rtx_equal_p (SET_SRC (XVECEXP (newpat, 0, 1)),
                       XEXP (SET_SRC (XVECEXP (newpat, 0, 0)), 0))
diff --git a/gcc/testsuite/gcc.dg/torture/pr118662.c b/gcc/testsuite/gcc.dg/torture/pr118662.c
new file mode 100644
index 00000000000..b9e8cca0aeb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118662.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */
+/* { dg-additional-options "-msse4" { target sse4_runtime} } */
+
+int __attribute__((noipa)) addup(signed char *num) {
+  int val = num[0] + num[1] + num[2] + num[3];
+  if (num[3] >= 0)
+    val++;
+  return val;
+}
+
+int main(int, char *[])
+{
+  signed char num[4] = {1, 1, 1, -1};
+  if (addup(num) != 2)
+    __builtin_abort();
+  return 0;
+}
-- 
2.43.0