On Wed, Sep 05, 2018 at 02:42:36PM +0200, Richard Biener wrote:
> IIRC you said we're already doing x % power-of-two == 0 optimized but the 
> new
> code isn't in that place?

For unsigned %, there is no need for anything special, we just
expand that as x % (power-of-two - 1) == 0, as any other % power-of-two.

For signed x % power-of-two == N, Kyrill posted a patch 3 years ago
and it could be handled in the same spot as this too.  x % power-of-two == 0
can be expanded as (unsigned) x % (unsigned) power-of-two == 0, for
x % power-of-two == N where N is in [1 .. power-of-two - 1] we can
expand it (if cheaper) as x & (msb | (power-of-two - 1)) == N and
x % power-of-two == N where N is in [-power-of-two + 1, -1] we could
do x & (msb | (power-of-two - 1)) == (N & (msb | (power-of-two - 1))).

> Specifically you're looking at immediate
> uses during RTL expansion which makes me a bit nervous.  The code

What's wrong about it?  We don't kill the gimple statements until expansion
is done.  Plus it is just a heuristic, not affecting emitted code
functionality, it just picks whether we emit it one way or another
(and only checks uses in the current bb); cfgexpand.c also uses
FOR_EACH_IMM_USE_FAST in multiple spots.

> looks like it performs a GIMPLE transform so I think it should
> be implemented as such (and thus also not need TER restrictions).

It isn't really a GIMPLE transform, just needs to provide the callers
with a tree.  Specifically, if it passes several initial checks, it
expands the X expression unconditionally and then just uses the result of
that.  What I could and probably should do is to just emit the cheaper
of the two sequences and make_tree the result thereof, so the caller
would emit only the actual comparison, rather than the modulo or its
replacement again too as the patch does.

> Our current GIMPLE kitchen-sink for pre-expand insn recog is
> pass_optimize_widening_mul.

We don't do anything close to this there though, use the expander to expand
quite complex tree expressions into RTL.  I'm afraid the expander isn't even
sufficiently initialized there, we'd need to rewrite the expressions into
trees using magic VAR_DECLs that would map to some virtual pseudos, etc.
The only similar case to what we'd need is in ivopts, and that has quite
some code to make that work.

> I'm not sure where regular jump RTL expansion happens but I
> think I remeber some TER pattern stuff there as well.

expand_gimple_cond has:
    We're sometimes presented with such code:
       D.123_1 = x < y;
       if (D.123_1 != 0)
... TER stuff in it, and this patch is done next to it (with do_store_flag
being another spot).  Sticking the optimization later would be much harder
(I mean e.g. into jumpif*_*.

        Jakub

Reply via email to