On Wed, Sep 05, 2018 at 02:42:36PM +0200, Richard Biener wrote: > IIRC you said we're already doing x % power-of-two == 0 optimized but the > new > code isn't in that place?
For unsigned %, there is no need for anything special, we just expand that as x % (power-of-two - 1) == 0, as any other % power-of-two. For signed x % power-of-two == N, Kyrill posted a patch 3 years ago and it could be handled in the same spot as this too. x % power-of-two == 0 can be expanded as (unsigned) x % (unsigned) power-of-two == 0, for x % power-of-two == N where N is in [1 .. power-of-two - 1] we can expand it (if cheaper) as x & (msb | (power-of-two - 1)) == N and x % power-of-two == N where N is in [-power-of-two + 1, -1] we could do x & (msb | (power-of-two - 1)) == (N & (msb | (power-of-two - 1))). > Specifically you're looking at immediate > uses during RTL expansion which makes me a bit nervous. The code What's wrong about it? We don't kill the gimple statements until expansion is done. Plus it is just a heuristic, not affecting emitted code functionality, it just picks whether we emit it one way or another (and only checks uses in the current bb); cfgexpand.c also uses FOR_EACH_IMM_USE_FAST in multiple spots. > looks like it performs a GIMPLE transform so I think it should > be implemented as such (and thus also not need TER restrictions). It isn't really a GIMPLE transform, just needs to provide the callers with a tree. Specifically, if it passes several initial checks, it expands the X expression unconditionally and then just uses the result of that. What I could and probably should do is to just emit the cheaper of the two sequences and make_tree the result thereof, so the caller would emit only the actual comparison, rather than the modulo or its replacement again too as the patch does. > Our current GIMPLE kitchen-sink for pre-expand insn recog is > pass_optimize_widening_mul. We don't do anything close to this there though, use the expander to expand quite complex tree expressions into RTL. I'm afraid the expander isn't even sufficiently initialized there, we'd need to rewrite the expressions into trees using magic VAR_DECLs that would map to some virtual pseudos, etc. The only similar case to what we'd need is in ivopts, and that has quite some code to make that work. > I'm not sure where regular jump RTL expansion happens but I > think I remeber some TER pattern stuff there as well. expand_gimple_cond has: We're sometimes presented with such code: D.123_1 = x < y; if (D.123_1 != 0) ... TER stuff in it, and this patch is done next to it (with do_store_flag being another spot). Sticking the optimization later would be much harder (I mean e.g. into jumpif*_*. Jakub