https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
> Thus we need some range info to do this optimization.

Good point. It seems as if, for c1 << (c2 * a + c3), C requires
a >= -c3/c2 (read as real, non-integer division; c2 != 0), and the
suggested optimization requires c2*a >= 0 and c3 >= 0 to fulfill the C
requirement of a nonnegative shift count. Thus, this is fulfilled for
any value of 'a' if c3 >= 0 and abs(c2) > c3.

The optimization can also be done for any value of 'a' if the hardware
supports c1 << (negative value) (acting as a right shift, filling with
zeros) and popcount(c1) == popcount(c1 << c3). The first condition is
fulfilled in this example. I don't know about the second, but I observed
that Clang/LLVM optimizes the difference mask1 - mask2 to 0 on ARM but
not on x86_64 (I have not checked why, nor whether ARM handles negative
shift counts in a well-defined way).