https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087
Oliver Kozul <oliver.ko...@rt-rk.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |oliver.ko...@rt-rk.com

--- Comment #3 from Oliver Kozul <oliver.ko...@rt-rk.com> ---
(In reply to Jeffrey A. Law from comment #2)
> True that there's a tradeoff here. But in the spot where we're most likely
> to be able to do something with these cases the loop optimizer has already
> been run and any profitable LICM has already been realized.
>
> Unfortunately I don't see good ways to handle #2 and #3. The basic problem
> is combine is limited in how many insns it processes at a time and it's not
> enough to expose both constants in the equality comparison step.
>
> Oliver has a change for the first case though. I'm still evaluating it.

My patch for the first example has been submitted, and it passed all tests on
Patchwork CI, so I am hopeful that it is a valid improvement.

I managed to catch the pattern mentioned by the author by searching for a PLUS
expression instead of an EQ expression, as I found that all the constants are
exposed there. I use this pattern when handling all three examples. This works
because an equality comparison is canonicalized as the right-hand side
subtracted from the left-hand side and compared against zero.

In the meantime, I also sent a patch for example two, but it fails in specific
cases of torture testing. The patch can be seen at the following link:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671015.html

The problematic edge cases for my patch are, as far as I can see, unions. When
matching my proposed pattern, I do not have information about which bits the
union fields occupy, and therefore I do not shift them back to their original
positions.
An example where errors occur can be seen below:

```c
extern void abort() __attribute__ ((noreturn));

struct s
{
  unsigned long long f1 : 40;
#if (__SIZEOF_INT__ >= 4)
  unsigned int f2 : 24;
#else
  unsigned long int f2 : 24;
#endif
} sv;

int main()
{
  int f2;

  sv.f2 = (1 << 24) - 1;
  __asm__ volatile ("" : : : "memory");
  ++sv.f2;
  f2 = sv.f2;
  if (f2 != 0)
    abort ();
  return 0;
}
```

My pattern gets recognized when incrementing sv.f2 and generates the following assembly:

```
    ...
    lui     a4,%hi(sv)
    ld      a5,%lo(sv)(a4)
    li      a3,-1
    slli    a2,a3,40
    or      a5,a5,a2
    sd      a5,%lo(sv)(a4)
    ld      a5,%lo(sv)(a4)
    srli    a3,a3,24
    srli    a2,a5,40
    andi    a2,a2,-1
    addi    a2,a2,1
    and     a5,a5,a3
    or      a5,a2,a5
    sd      a5,%lo(sv)(a4)
    beq     a2,zero,.L2
    ...
```

Compared to the original assembly:

```
    ...
    lui     a3,%hi(sv)
    ld      a5,%lo(sv)(a3)
    li      a2,-1
    slli    a4,a2,40
    or      a5,a5,a4
    sd      a5,%lo(sv)(a3)
    ld      a4,%lo(sv)(a3)
    srli    a2,a2,24
    srli    a5,a4,40
    addiw   a5,a5,1
    slli    a5,a5,40
    and     a4,a4,a2
    or      a4,a5,a4
    sd      a4,%lo(sv)(a3)
    beq     a5,zero,.L2
    ...
```

As we can see, my code is missing the slli instruction that would shift sv.f2 back by 40 bits, therefore generating faulty results.