https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
> Thus we need some range info to do this optimization.

Good point.

It seems as if for c1 << (c2 * a + c3), C requires c2*a + c3 >= 0, i.e. a >=
-c3/c2 when c2 > 0 (read as real-valued division; c2 ≠ 0).

And the suggested optimization requires c2*a >= 0 and c3 >= 0 to satisfy C's
requirement of nonnegative shift counts.

Thus, this is fulfilled for any value of 'a' if c3 >= 0 and abs(c2) > c3.


The optimization can also be done for any value of 'a' if the hardware
supports c1 << (negative value) (as a right shift, filling with zeros) and
popcount(c1) == popcount(c1 << c3).


The first condition is fulfilled in this example.

I don't know about the second, but I observed that Clang/LLVM optimizes the
difference mask1 - mask2 to 0 on ARM but not on x86_64 (I haven't checked why,
nor whether ARM handles negative shifts in a well-defined way).
