https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
           Priority|P3                          |P1

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Slightly cleaned up testcase:
static inline unsigned __int128
foo (unsigned long long x, unsigned long long y)
{
  return ((unsigned __int128) x << 64) | y;
}

static inline unsigned long long
bar (unsigned long long x, unsigned long long y, unsigned z)
{
  return foo (x, y) >> (z % 64);
}

void
baz (unsigned long long *x, const unsigned long long *y, unsigned z)
{
  x[0] = bar (y[0], y[1], z);
}
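
For reference, a minimal sketch of what the testcase computes at the word
level.  The names bar_ref, bar_words and check_equivalence are hypothetical
and not part of the PR.  It assumes a compiler with unsigned __int128 (GCC or
Clang on a 64-bit target) and a 64-bit unsigned long long.  The 128-bit
concat-then-shift idiom is just a double-word right shift, which is why a
single shrd is the code one would hope for:

```c
#include <assert.h>

/* Reference: the idiom from the testcase, written as one function.  */
static unsigned long long
bar_ref (unsigned long long x, unsigned long long y, unsigned z)
{
  unsigned __int128 v = ((unsigned __int128) x << 64) | y;
  return (unsigned long long) (v >> (z % 64));
}

/* Hypothetical word-level equivalent: a double-word right shift.
   The n == 0 case is handled separately because x << 64 is undefined
   in C for a 64-bit type.  */
static unsigned long long
bar_words (unsigned long long x, unsigned long long y, unsigned z)
{
  unsigned n = z % 64;
  return n ? (y >> n) | (x << (64 - n)) : y;
}

/* Compare both forms over every shift count, including wrapped ones.  */
static void
check_equivalence (void)
{
  const unsigned long long x = 0x0123456789abcdefULL;
  const unsigned long long y = 0xfedcba9876543210ULL;
  for (unsigned z = 0; z < 130; z++)
    assert (bar_ref (x, y, z) == bar_words (x, y, z));
}
```

Calling check_equivalence () from a small driver exercises all shift counts;
it says nothing about which instructions a given GCC version emits.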

This is a pretty serious regression.  Though for:
static inline unsigned long long
qux (unsigned int x, unsigned int y)
{
  return ((unsigned long long) x << 32) | y;
}

static inline unsigned int
corge (unsigned int x, unsigned int y, unsigned z)
{
  return qux (x, y) >> (z % 32);
}

void
garply (unsigned int *x, const unsigned int *y, unsigned z)
{
  x[0] = corge (y[0], y[1], z);
}
gcc has behaved that way with -O2 -m32 for quite some time (regressed
with r11-6188-g0b76990a9d75d97b84014e37519086b81824c307 from:
        pushl   %ebx
        movl    12(%esp), %ebx
        movl    16(%esp), %ecx
        movl    (%ebx), %edx
        movl    4(%ebx), %eax
        shrdl   %edx, %eax
        movl    8(%esp), %edx
        movl    %eax, (%edx)
        popl    %ebx
        ret
to:
        pushl   %edi
        xorl    %edi, %edi
        pushl   %esi
        pushl   %ebx
        movl    20(%esp), %ebx
        movl    24(%esp), %ecx
        movl    4(%ebx), %esi
        movl    (%ebx), %edx
        movl    %esi, %eax
        orl     %edi, %edx
        shrdl   %edx, %eax
        movl    16(%esp), %edx
        movl    %eax, (%edx)
        popl    %ebx
        popl    %esi
        popl    %edi
        ret
).  And with -O2 -m32 -msse2 this regressed with
r6-3562-g006ba5047cea15ce6f29b0847009ae901b874d50 (addition of STV).
While the r11-6188 case seems unrelated and I should probably file it
as a separate PR against RTL SSA, the addition of STV for ia32 and
the r13-1379 change are very similar: both turn the ior{di,ti}3 from
being split during expansion into being split after reload, and there is
nothing that can fix it up afterwards.  For -m64 on trunk we have:
(insn 8 5 23 2 (set (reg:DI 95 [ *y_3(D) ])
        (mem:DI (reg/v/f:DI 92 [ y ]) [1 *y_3(D)+0 S8 A64])) "pr107627.c":4:11 82 {*movdi_internal}
     (nil))
(insn 23 8 24 2 (clobber (reg:TI 96 [ *y_3(D) ])) "pr107627.c":4:33 -1
     (nil))
(insn 24 23 21 2 (set (reg:TI 96 [ *y_3(D) ])
        (const_int 0 [0])) "pr107627.c":4:33 -1
     (nil))
(insn 21 24 22 2 (set (subreg:DI (reg:TI 96 [ *y_3(D) ]) 8)
        (reg:DI 95 [ *y_3(D) ])) "pr107627.c":4:33 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 95 [ *y_3(D) ])
        (nil)))
(insn 22 21 12 2 (set (subreg:DI (reg:TI 96 [ *y_3(D) ]) 0)
        (const_int 0 [0])) "pr107627.c":4:33 82 {*movdi_internal}
     (nil))
(insn 12 22 25 2 (set (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (mem:DI (plus:DI (reg/v/f:DI 92 [ y ])
                (const_int 8 [0x8])) [1 MEM[(const long long unsigned int *)y_3(D) + 8B]+0 S8 A64])) "pr107627.c":4:40 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg/v/f:DI 92 [ y ])
        (nil)))
(insn 25 12 26 2 (clobber (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])) "pr107627.c":4:40 -1
     (nil))
(insn 26 25 13 2 (set (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (const_int 0 [0])) "pr107627.c":4:40 -1
     (nil))
(insn 13 26 14 2 (set (subreg:DI (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ]) 0)
        (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])) "pr107627.c":4:40 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (nil)))
(insn 14 13 15 2 (set (subreg:DI (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ]) 8)
        (const_int 0 [0])) "pr107627.c":4:40 82 {*movdi_internal}
     (nil))
(insn 15 14 16 2 (parallel [
            (set (reg:TI 99)
                (ior:TI (reg:TI 96 [ *y_3(D) ])
                    (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])))
            (clobber (reg:CC 17 flags))
        ]) "pr107627.c":4:40 571 {*iordi3_doubleword}
     (expr_list:REG_DEAD (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (expr_list:REG_DEAD (reg:TI 96 [ *y_3(D) ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
before combine, and the combiner punts on this altogether because parts of
the pseudos are set through subregs, so there is no hope of even recognizing
the idiom of building a two-word integer from two word-sized pieces as
special.
Then after RA we have postreload CSE, which perhaps could help, but at that
point the insn isn't split yet, and when we get to split2, there is nothing
that would propagate the const0_rtxs set to hard registers into the ior
instructions and figure out they are useless.
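
To illustrate why those iors are useless, here is a minimal sketch in plain C
rather than GCC internals.  struct dword and the helper names are
hypothetical stand-ins for the TImode pseudos 96 and 97 in the dump above,
assuming the little-endian subreg layout shown there (offset 0 is the low
word, offset 8 the high word).  Once the doubleword ior is split per word,
each half ors a real value with a known zero:

```c
#include <assert.h>

/* Hypothetical stand-in for a TImode pseudo made of two DImode words.  */
struct dword { unsigned long long lo, hi; };

/* Like reg 96 in the dump: zeroed, then only the high word is set.  */
static struct dword
from_high (unsigned long long x)
{
  struct dword r = { 0, x };
  return r;
}

/* Like reg 97 in the dump: zeroed, then only the low word is set.  */
static struct dword
from_low (unsigned long long y)
{
  struct dword r = { y, 0 };
  return r;
}

/* What a doubleword ior turns into once split: one ior per word.  */
static struct dword
ior_split (struct dword a, struct dword b)
{
  struct dword r = { a.lo | b.lo, a.hi | b.hi };
  return r;
}

/* What one would like to end up with: a plain concat, no ior at all.  */
static struct dword
concat (unsigned long long x, unsigned long long y)
{
  struct dword r = { y, x };
  return r;
}
```

Both word-level iors in ior_split (from_high (x), from_low (y)) have a zero
operand, so the result is always equal to concat (x, y); the missing piece is
a pass that sees the zeros at the point where the split has happened.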

So, shall we do something in the STV pass and recognize these concats of two
parts into one larger register, turn them into some define_insn_and_split
which is then split after reload and would allow for better code generation,
or do that in some generic pass somewhere before reload, and/or try to
forward propagate those 0s after RA (after split2, I mean)?
