On Wed, 5 Sep 2018, Jakub Jelinek wrote: > On Wed, Sep 05, 2018 at 02:42:36PM +0200, Richard Biener wrote: > > IIRC you said we're already doing x % power-of-two == 0 optimized but the > > new > > code isn't in that place? > > For unsigned %, there is no need for anything special, we just > expand that as x % (power-of-two - 1) == 0, as any other % power-of-two. > > For signed x % power-of-two == N, Kyrill posted a patch 3 years ago > and it could be handled in the same spot as this too. x % power-of-two == 0 > can be expanded as (unsigned) x % (unsigned) power-of-two == 0, for > x % power-of-two == N where N is in [1 .. power-of-two - 1] we can > expand it (if cheaper) as x & (msb | (power-of-two - 1)) == N and > x % power-of-two == N where N is in [-power-of-two + 1, -1] we could > do x & (msb | (power-of-two - 1)) == (N & (msb | (power-of-two - 1))). > > > Specifically you're looking at immediate > > uses during RTL expansion which makes me a bit nervous. The code > > What's wrong about it? We don't kill the gimple statements until expansion > is done. Plus it is just a heuristic, not affecting emitted code > functionality, it just picks whether we emit it one way or another > (and only checks uses in the current bb); cfgexpand.c also uses > FOR_EACH_IMM_USE_FAST in multiple spots.
OK, it just looked misplaced... but I see we don't kill gimple stmts while expanding so stuff should be still valid. > > looks like it performs a GIMPLE transform so I think it should > > be implemented as such (and thus also not need TER restrictions). > > It isn't really a GIMPLE transform, just needs to provide the callers > with a tree. Specifically, if it passes several initial checks, it > expands the X expression unconditionally and then just uses the result of > that. What I could and probably should do is to just emit the cheaper > of the two sequences and make_tree the result thereof, so the caller > would emit only the actual comparison, rather than the modulo or its > replacement again too as the patch does. > > > Our current GIMPLE kitchen-sink for pre-expand insn recog is > > pass_optimize_widening_mul. > > We don't do anything close to this there though, use the expander to expand > quite complex tree expressions into RTL. I'm afraid the expander isn't even > sufficiently initialized there, we'd need to rewrite the expressions into > trees using magic VAR_DECLs that would map to some virtual pseudos, etc. > The only similar case to what we'd need is in ivopts, and that has quite > some code to make that work. :/ > > I'm not sure where regular jump RTL expansion happens but I > > think I remeber some TER pattern stuff there as well. > > expand_gimple_cond has: > We're sometimes presented with such code: > D.123_1 = x < y; > if (D.123_1 != 0) > ... TER stuff in it, and this patch is done next to it (with do_store_flag > being another spot). Sticking the optimization later would be much harder > (I mean e.g. into jumpif*_*. OK, I see. I was really hoping it would turn out like synth_mult or friends but given it's a more complex operation all we could do on GIMPLE would be recognizing the pattern to end up with _1 = __IFN_MOD_CMP (x, c1, c2); if (_1 != 0) ... where we then end up with the issue of efficiently combining the IFN expansion and the CC user of the result... Which would mean we'd have to handle the above in jumpif*, TERing in the __IFN (I think TER might not even handle that right now). Not pretty either. So I guess your patch is OK (with the small followup you posted). Thanks, Richard.