https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
           Priority|P3                          |P1

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Slightly cleaned up testcase:

static inline unsigned __int128
foo (unsigned long long x, unsigned long long y)
{
  return ((unsigned __int128) x << 64) | y;
}

static inline unsigned long long
bar (unsigned long long x, unsigned long long y, unsigned z)
{
  return foo (x, y) >> (z % 64);
}

void
baz (unsigned long long *x, const unsigned long long *y, unsigned z)
{
  x[0] = bar (y[0], y[1], z);
}

This is a pretty serious regression.
Though for:

static inline unsigned long long
qux (unsigned int x, unsigned int y)
{
  return ((unsigned long long) x << 32) | y;
}

static inline unsigned int
corge (unsigned int x, unsigned int y, unsigned z)
{
  return qux (x, y) >> (z % 32);
}

void
garply (unsigned int *x, const unsigned int *y, unsigned z)
{
  x[0] = corge (y[0], y[1], z);
}

gcc has behaved that way with -O2 -m32 for quite some time (regressed with
r11-6188-g0b76990a9d75d97b84014e37519086b81824c307 from:
        pushl   %ebx
        movl    12(%esp), %ebx
        movl    16(%esp), %ecx
        movl    (%ebx), %edx
        movl    4(%ebx), %eax
        shrdl   %edx, %eax
        movl    8(%esp), %edx
        movl    %eax, (%edx)
        popl    %ebx
        ret
to:
        pushl   %edi
        xorl    %edi, %edi
        pushl   %esi
        pushl   %ebx
        movl    20(%esp), %ebx
        movl    24(%esp), %ecx
        movl    4(%ebx), %esi
        movl    (%ebx), %edx
        movl    %esi, %eax
        orl     %edi, %edx
        shrdl   %edx, %eax
        movl    16(%esp), %edx
        movl    %eax, (%edx)
        popl    %ebx
        popl    %esi
        popl    %edi
        ret
).  And with -O2 -m32 -msse2 this regressed with
r6-3562-g006ba5047cea15ce6f29b0847009ae901b874d50 (addition of STV).
While the r11-6188 case seems unrelated and I should probably file it as a
separate PR against RTL SSA, the addition of STV for ia32 and the r13-1379
change are very similar: both move the splitting of ior{di,ti}3 from
expansion to after reload, and there is nothing that can fix it up afterwards.
For -m64 on trunk we have:
(insn 8 5 23 2 (set (reg:DI 95 [ *y_3(D) ])
        (mem:DI (reg/v/f:DI 92 [ y ]) [1 *y_3(D)+0 S8 A64])) "pr107627.c":4:11 82 {*movdi_internal}
     (nil))
(insn 23 8 24 2 (clobber (reg:TI 96 [ *y_3(D) ])) "pr107627.c":4:33 -1
     (nil))
(insn 24 23 21 2 (set (reg:TI 96 [ *y_3(D) ])
        (const_int 0 [0])) "pr107627.c":4:33 -1
     (nil))
(insn 21 24 22 2 (set (subreg:DI (reg:TI 96 [ *y_3(D) ]) 8)
        (reg:DI 95 [ *y_3(D) ])) "pr107627.c":4:33 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 95 [ *y_3(D) ])
        (nil)))
(insn 22 21 12 2 (set (subreg:DI (reg:TI 96 [ *y_3(D) ]) 0)
        (const_int 0 [0])) "pr107627.c":4:33 82 {*movdi_internal}
     (nil))
(insn 12 22 25 2 (set (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (mem:DI (plus:DI (reg/v/f:DI 92 [ y ])
                (const_int 8 [0x8])) [1 MEM[(const long long unsigned int *)y_3(D) + 8B]+0 S8 A64])) "pr107627.c":4:40 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg/v/f:DI 92 [ y ])
        (nil)))
(insn 25 12 26 2 (clobber (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])) "pr107627.c":4:40 -1
     (nil))
(insn 26 25 13 2 (set (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (const_int 0 [0])) "pr107627.c":4:40 -1
     (nil))
(insn 13 26 14 2 (set (subreg:DI (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ]) 0)
        (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])) "pr107627.c":4:40 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (nil)))
(insn 14 13 15 2 (set (subreg:DI (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ]) 8)
        (const_int 0 [0])) "pr107627.c":4:40 82 {*movdi_internal}
     (nil))
(insn 15 14 16 2 (parallel [
            (set (reg:TI 99)
                (ior:TI (reg:TI 96 [ *y_3(D) ])
                    (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])))
            (clobber (reg:CC 17 flags))
        ]) "pr107627.c":4:40 571 {*iordi3_doubleword}
     (expr_list:REG_DEAD (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ])
        (expr_list:REG_DEAD (reg:TI 96 [ *y_3(D) ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
before combine, and the combiner punts on this altogether because parts of
the pseudos are set through subregs, so there is no hope of even recognizing
the idiom of building a two-word integer from two word-sized pieces as
special.  Then after RA we have postreload CSE, which perhaps could help, but
at that point the ior isn't split yet, and by the time we get to split2 there
is nothing that would propagate the const0_rtx values set into hard registers
into the ior instructions and figure out that they are useless.

So, shall we do something in the STV pass and recognize these concats of two
parts into one larger register, turning them into some define_insn_and_split
that is split after reload and allows better code generation, or do that
in some generic pass somewhere before reload, and/or try to forward propagate
those 0s after RA (after split2, I mean)?
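
For reference, here is a minimal sketch of what the idiom computes, using the 32-bit variant from the second testcase so that it does not depend on __int128. The names corge_ref, corge_shrd and count_mismatches are made up for this illustration and are not part of the testcase or of GCC. corge_shrd spells out by hand the double-word right shift that the older, better code performs with a single shrdl; the n == 0 branch exists only because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Reference form, as in the testcase: concatenate the two 32-bit
   words into one 64-bit value, shift, and keep the low word.  */
static uint32_t
corge_ref (uint32_t hi, uint32_t lo, unsigned z)
{
  uint64_t both = ((uint64_t) hi << 32) | lo;
  return (uint32_t) (both >> (z % 32));
}

/* Hand-written equivalent (illustrative only): the low word shifted
   right, with the vacated high bits filled from the high word.  */
static uint32_t
corge_shrd (uint32_t hi, uint32_t lo, unsigned z)
{
  unsigned n = z % 32;
  if (n == 0)
    return lo;			/* Avoid the undefined shift by 32.  */
  return (lo >> n) | (hi << (32 - n));
}

/* Return the number of sampled inputs on which the two forms disagree.  */
static int
count_mismatches (void)
{
  static const uint32_t vals[] =
    { 0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu };
  const int nvals = (int) (sizeof vals / sizeof vals[0]);
  int bad = 0;

  for (int i = 0; i < nvals; i++)
    for (int j = 0; j < nvals; j++)
      for (unsigned z = 0; z < 64; z++)
	bad += corge_ref (vals[i], vals[j], z) != corge_shrd (vals[i], vals[j], z);
  return bad;
}
```

Calling count_mismatches () from a small driver should return 0 if the two forms agree on the sampled inputs. Nothing here says how a pass should recognize the pattern; it only illustrates that the zero-extended halves feeding the ior carry no information beyond the two original words.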