"Robin Dapp" <[email protected]> writes:
>> I'm really struggling to reproduce this locally, even I go back to
>> r16-3343 or r16-2382.  I see RTL like the above, and like Jakub quotes
>> in the PR, but don't see CSE mishandling it.  Not sure what I'm doing
>> wrong.
>
> For me what worked is to build from trunk on cfarm424 and test there.

Hah.  *facepalm*  I'd been using the list on the old gcc.gnu.org wiki page,
ignoring the links to cfarm.net, and didn't realise that those machines were
available.  Thanks.  And yeah, I can reproduce it on cfarm425.

(FTR, HAVE_GAS_SHF_MERGE was the difference.)

> Attached is v2, doing roughly what I described.  Further context is in the 
> commit message below.  Does that sound more reasonable?
>
> (simplify_gen_vec_select looks unused with those changes, if the patch is 
> reasonable I should probably remove it as well)
>
> Bootstrapped and regtested as before.

The subreg-of-a-pseudo case was supposed to be relatively important,
although I agree it's suspicious that you didn't see any testsuite
regressions.  Either the testsuite coverage isn't very good or that
case is also handled elsewhere.

I think the bug is specific to the hard-register case though,
where we end up recording two different values for two REGs that
share the same regno.  An artifical reduced testcase is:

unsigned char __RTL (startwith ("vregs")) foo (int *ptr)
{
  (function "foo"
    (param "ptr"
      (DECL_RTL (reg/v:DI <0> [ ptr ]))
      (DECL_RTL_INCOMING (reg:DI x0 [ ptr ]))
    )
    (insn-chain
      (block 2
        (edge-from entry (flags "FALLTHRU"))
        (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
        (cnote 2 NOTE_INSN_FUNCTION_BEG)
        (insn 3 (set (reg:DI <0>) (reg:DI x0)))
        (insn 4 (set (mem:DI (reg:DI <0>) [1 ptr+0 S8 A16]) (reg:DI <1>)))
        (insn 5 (set (mem:QI (plus:DI (reg:DI <0>) (const_int 8))
                             [1 ptr+8 S1 A16])
                     (subreg:QI (reg:DI <1>) 0)))
        (insn 6 (set (reg:V8QI v1)
                     (const_vector:V8QI [(const_int 3)
                                         (const_int -4)
                                         (const_int 0)
                                         (const_int 0)
                                         (const_int 0)
                                         (const_int 0)
                                         (const_int 0)
                                         (const_int 0)])))
        (insn 7 (set (reg:QI x1)
                     (subreg:QI (reg:DI <1>) 0))
                (expr_list:REG_EQUAL (const_int 3) (nil)))
        (insn 8 (set (reg:QI x0) (const_int -4)))
        (insn 9 (use (reg:QI x0)))
        (insn 10 (use (reg:QI x1)))
        (insn 11 (use (reg:V8QI v1)))
        (edge-to exit (flags "FALLTHRU"))
      ) ;; block 2
    ) ;; insn-chain
    (crtl (return_rtx (reg:QI x0)))
  ) ;; function
}

<1> is deliberately uninitialised.  The testcase works correctly
if <1> is initialised to a sensible value.

In the original testcase, the same effect is achieved through --param
max-cse-insns.  Using the register numbers in my build, (reg:DI 673) is
initialised to (const_int 3), but this is discarded by the time that:

   (set (reg:QI 7 x7)
        (subreg:QI (reg:DI 673) 0))
   (expr_list:REG_EQUAL (const_int 3 [0x3])

is reached.  However, we still retain the equivalence from:

   (set (mem:QI (reg/f:DI 31 sp) [0  S1 A64])
        (subreg:QI (reg:DI 673) 0))

The fact that these two values are equal to 3 is reestablished by
the REG_EQUAL note, so that:

    (subreg:QI (reg:DI 673) 0)
    (mem:QI (reg/f:DI 31 sp) [0  S1 A64])

is merged with:

    (reg:QI 33 v1)
    (const_int 3 [0x3])

Zdenek's testcase is very nicely reduced :)  Bumping the default
max-cse-insns value by just 1 (to 1001) is enough to make the test pass.

If we globally replace v1 with <2> in my RTL testcase above, so that we
get (subreg:QI (reg:V8QI <2>) 0) rather than (reg:QI v1) for element 0,
then everything seems to work as expected.  And AIUI, it's normal to have
subregs of regs in the tables.

So how about keeping:

              rtx y = simplify_gen_vec_select (SET_DEST (x), i);

but skipping the following code if "y" itself is a REG?

Thanks,
Richard

>
> Regards
>  Robin
>
>
> [PATCH v2] cse: Do not simplify vec_select. [PR121649]
>
> When merging classes, cse computes new equivalences for constants.
> In the PR we have
>
>   (insn 1173 1172 1174 2 (set (reg:V8QI 33 v1)
>          (const_vector:V8QI [
>                  (const_int 3 [0x3])
>                  (const_int -4 [0xfffffffffffffffc])
>                  (const_int 0 [0]) repeated x6
>              ])) "pr121649.c":63:3 1325 {*aarch64_simd_movv8qi}
>       (nil))
>
> of which the second element is selected:
>
>   (insn 1178 1177 1179 2 (set (reg:QI 4 x4)
>           (vec_select:QI (reg:V8QI 33 v1)
>               (parallel [
>                       (const_int 1 [0x1])
>                   ]))) "pr121649.c":63:3 2968 {aarch64_get_lanev8qi}
>        (expr_list:REG_EQUAL (const_int -4 [0xfffffffffffffffc])
>           (nil)))
>
> We find (const_int 3 [0x3]) and a few others to be equivalent, among
> them (reg:QI v1).  This is a "fake set" that we create to help CSE extract 
> const_vector elements and reuse them.  Element 0 is special, though. We 
> lowpart-subreg simplify it to (reg:QI v1) directly and, as the register stays 
> the same, consider it equivalent to (reg:V8QI v1).
>
> In merge_equiv_classes, the old (reg:V8QI) equiv is deleted and replaced by
> the new (reg:QI) one, forgetting that the old equiv had 7 other elements.
> Subsequently, extracting element 1 of a zero-extended QImode register results 
> in "0" instead of the correct "-4".
> As far as I can tell, this only happens in the vec_select case.
>
> Therefore, this patch refrains from optimizing
>   (vec_select (reg:V8QI) (parallel [0]))
> into a subreg/reg and keeps it in vec_select form.
>
> vec_select handling was added in r12-4827-g68b48f3f4c4913 and I would have 
> expected the need to adjust the vec_dup case that makes use of the extracted 
> const_vector elements.  However, I manually compiled this commit's example 
> and 
> observed no changes.  aarch64's test suite shows no regressions either.
>
>       PR rtl-optimization/121649
>
> gcc/ChangeLog:
>
>       * cse.cc (find_sets_in_insn):  Do not simplify vec_select for
>       CONST_VECTOR element templates.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.dg/torture/pr121649.c: New test.
> ---
>  gcc/cse.cc                              |   5 +-
>  gcc/testsuite/gcc.dg/torture/pr121649.c | 130 ++++++++++++++++++++++++
>  2 files changed, 134 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr121649.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index 4eaef602366..dc720cec0db 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4296,7 +4296,10 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> 
> *psets)
>           {
>             /* These are templates and don't actually get emitted but are
>                used to tell CSE how to get to a particular constant.  */
> -           rtx y = simplify_gen_vec_select (SET_DEST (x), i);
> +           rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> +                                       gen_rtvec (1, GEN_INT (i)));
> +           rtx y = gen_rtx_VEC_SELECT (GET_MODE_INNER (GET_MODE (src)),
> +                                       SET_DEST (x), tmp);
>             gcc_assert (y);
>             rtx set = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
>             add_to_set (psets, set, true);
> diff --git a/gcc/testsuite/gcc.dg/torture/pr121649.c 
> b/gcc/testsuite/gcc.dg/torture/pr121649.c
> new file mode 100644
> index 00000000000..8af7e9b60fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr121649.c
> @@ -0,0 +1,130 @@
> +/* { dg-do run { target bitint } } */
> +/* { dg-additional-options "-Wno-psabi" } */
> +
> +typedef __attribute__((__vector_size__ (8))) char v64s8;
> +typedef __attribute__((__vector_size__ (8))) short v64s16;
> +typedef __attribute__((__vector_size__ (8))) int v64s32;
> +typedef __attribute__((__vector_size__ (16))) char v128s8;
> +typedef __attribute__((__vector_size__ (16))) short v128s16;
> +typedef __attribute__((__vector_size__ (16))) int v128s32;
> +typedef __attribute__((__vector_size__ (16))) long v128s64;
> +typedef __attribute__((__vector_size__ (32))) char v256s8;
> +typedef __attribute__((__vector_size__ (32))) short v256s16;
> +typedef __attribute__((__vector_size__ (32))) int v256s32;
> +typedef __attribute__((__vector_size__ (32))) long v256s64;
> +typedef __attribute__((__vector_size__ (32))) __int128 v256s128;
> +typedef __attribute__((__vector_size__ (64))) char v512s8;
> +typedef __attribute__((__vector_size__ (64))) short v512s16;
> +typedef __attribute__((__vector_size__ (64))) int v512s32;
> +typedef __attribute__((__vector_size__ (64))) long v512s64;
> +typedef __attribute__((__vector_size__ (64))) __int128 v512s128;
> +
> +__attribute__((__noipa__)) __attribute__((__cold__)) void
> +foo0 (int, int, int, int,
> +             _BitInt (3) sb3_0, char, char, char,
> +             char, char, char, char, char,
> +             char,char,char,char,
> +             _BitInt (5), _BitInt (5), short, short, char,
> +             short, short, char, int,
> +             int, int, int, int, char,
> +             long, long, long, long, long,
> +             long, char, char, char, __int128,
> +             __int128, __int128, __int128, __int128,
> +             __int128, char, _BitInt (129), _BitInt (129),
> +             _BitInt (255), _BitInt (255), _BitInt (256), _BitInt (256),
> +             _BitInt (257), _BitInt (257), _BitInt (511), _BitInt (511),
> +             _BitInt (512), _BitInt (512), _BitInt (513), _BitInt (513),
> +             _BitInt (1023), _BitInt (1023), _BitInt (1024),
> +             _BitInt (1024), _BitInt (1025), _BitInt (1025), _BitInt (331),
> +             _BitInt (331), _BitInt (412), _BitInt (412), _BitInt (985),
> +             _BitInt (985), _BitInt (60692), _BitInt (60692), char, char,
> +             char, char, short, short, char, char, short, short, unsigned,
> +             int, v64s8, v64s8, v64s16, v64s16, v64s32, v64s32, long, long,
> +             v128s8, v128s8, short, v128s16, int, v128s32, long, v128s64,
> +             __int128, __int128, v256s8, v256s8, v256s16, v256s16, v256s32,
> +             v256s32, v256s64, v256s64, v256s128, v256s128, v512s8, v512s8,
> +             v512s16, v512s16, v512s32, v512s32, v512s64, v512s64,
> +             v512s128, v512s128, char *ret)
> +{
> +  *ret = sb3_0;
> +}
> +
> +void xprintf(const char *s, ...)
> +{
> +  __builtin_va_list arg;
> +  __builtin_va_start(arg, s);
> +  int x = __builtin_va_arg(arg, int);
> +  if (x != 0xfc)
> +    __builtin_abort();
> +  __builtin_va_end(arg);
> +}
> +
> +int
> +main ()
> +{
> +  char x;
> +  foo0 (0, 0, 0, 0, -4, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, -11, 9, 2, 95,
> +     67, -24064, 0, 5, 6, 7, 0, 8, 9, 20, 4, 3, 6018427387903, 53,
> +     505857273496, 255, 6, 0, 18446744073709551612uwb,
> +     70141183460469231731687303715884105726uwb,
> +     0x1ab7415df78c5d6355b5bf722d2c32wb, 18266202578652419339uwb,
> +     9594516560314871492uwb, 18446744073709551615uwb, 0,
> +     404678514476015038251338572701189uwb, 0,
> +     
> 6044618658097711785492504343953926634992332820282019728792003956564819964uwb,
> +     0, 0,
> +     
> 6044618658097711785492504343953926634992332820282019728792003956564819967uwb,
> +     0,
> +     0x10000000000000000000000000000000000000000000000000000000000000000wb, 
> 0, 1, 0, 606316039853739674053402528525595147334676uwb, 0,
> +     
> 407807929942597099574024998205846127479363981533648751059337645126787177315524892796678197877342278766808798337256961671307362116160553321514205699439425uwb,
> +     0,
> +     
> 40449424473557664318357520289433168951375240783177119330601884005280028469967848339414697442203604155532867238014401323013014363877264556147687333148794044698496604162935270824544179699376777403364901791967663653826658703420189949487003960239619175525152082242122474284732uwb,
> +     
> 90772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329623282435531uwb,
> +     0, 0,
> +     
> 30203714894603546035770925859109268843954143792619895153655326951406405759993601526034894524347802740350892957243539456uwb,
> +     0,
> +     
> 0x24c07bd236a355dac41c3e00fed83a2c01d47d7e7c155128f8474ab8acd075fe3be2661e9ba3b0498c2wb,
> +     0,
> +     
> -0x273d7a44735564a706be6f876e8c09a64c065028fc8719feace6b4ad22b722fc6628809c1bcfda902d563717b88876aa068496bwb,
> +     0, 5, 0, 4, 0, 5, 0, 0115, 3, 92, 0, 1, 840, 839, 35, 4095, (v64s8)
> +     { }, (v64s8)
> +     { 3, -0x4 }, (v64s16)
> +     { }, (v64s16)
> +     { }, (v64s32)
> +     { }, (v64s32)
> +     { }, 7, 40, (v128s8)
> +     { }, (v128s8)
> +     { }, 126, (v128s16)
> +     { }, 95, (v128s32)
> +     { }, 6, (v128s64)
> +     { }, 18446462603027742720uwb, 11842991817553718011uwb, (v256s8)
> +     { 227, 84, 73, 5, 2, 45, 107, 55, 16, 128, 23, 17, 1, 226, 15 },
> +     (v256s8)
> +     { 9, 111, 108, 19, 98, 27, 97, -0xe }, (v256s16)
> +     { 1, 167, 65280, 35, 15286, 9, 182, 13126, 15 }, (v256s16)
> +     { 72, 381, 185, 619, 978, 926, 42 }, (v256s32)
> +     { 20, 72, 905, 207, 63, 40 }, (v256s32)
> +     { 5, 503, 1, 152, 5519391 }, (v256s64)
> +     { 1280 }, (v256s64)
> +     { 94, 8851183291103289765uwb, 526 }, (v256s128)
> +     { 15013856701581711399uwb }, (v256s128)
> +     { 10389555769421265252uwb, 15766594437788011035uwb }, (v512s8)
> +     { 104, 131, 5, 21, 32, 127, 75, 54, 232, 1, 82, 255, 28, 178, 1,
> +     121, 1, 14, 210, 240, 140, 53, 4, 91, 204, 78, 130, 76, 212, 52, 1,
> +     200, 11, 25, 251, 41, 88, 12, 3, 59 }, (v512s8)
> +     { 13, 10, 48, -0x2f, 67, 85, 87, 123, 8, 26, 83, 114, 51, 6, 96,
> +     -0x1c, 115, 49, 86, 103, 93, 31, -0x6, 112, 1, 7, -0xc, 18, 47, 5,
> +     -0xf, 3 }, (v512s16)
> +     { 4094, 64492, 855, 61680, 27, 47569, 55, 65534, 76, -1024, 30,
> +     254, 280, 11, 10, 14, 2, 4095, 3, 111, 65535, 67, 24289, 32, 25604,
> +     26, 65535, 98, 65533, 5 }, (v512s16)
> +     { 615, 2032, 54, 52, 015, 45, 24, 6, 7, 68, 24015, 380, 3315, 5, 8,
> +     23, -0xf, 62, 82, 4 }, (v512s32)
> +     { 54, 4294967294, 6, 16, 9, 13, 300, 80000000, 700, 600, 93, 040,
> +     15 }, (v512s32)
> +     { 515, 3, 14, 10, 2, 80, 39, 215, 7, 3, 4 }, (v512s64)
> +     { 4, 9, 5, 3, 505 }, (v512s64)
> +     { 904293909041052234uwb, 4 }, (v512s128)
> +     { 9223372036854775808uwb, 1, 4073709551612 }, (v512s128)
> +     { 7, 30391054, 18446742974197923807uwb }, &x);
> +  for (unsigned i = 0; i < sizeof(x); i++) xprintf("%02x", (int)(unsigned 
> char)x);
> +}

Reply via email to