On Thu, Nov 30, 2017 at 10:26:33AM +0100, Richard Biener wrote: > On Wed, 29 Nov 2017, Jakub Jelinek wrote: > > > Hi! > > > > Even if target has an umulv4_optab pattern like i?86 mul[lq]; jo, > > if one argument is constant power of two, using two shifts, or > > for multiplication by 2 just shl with js on previous value is faster > > (though, for -Os mul[lq]; jo wins). > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > So this is because we don't go through the standard multiplication > expander, right?
We sometimes go through the standard multiplication expander, sometimes widening multiplication expander, sometimes highpart multiplication expander, also sometimes with the arguments forced into registers because we pick various subregs of them etc. expand_mul_overflow already now has a lots of cases. > But for icode == CODE_FOR_nothing we probably > should do this and then emit the compare-and-jump? The icode == CODE_FOR_nothing check is there because typically the code when backend doesn't provide anything is larger or much larger, just see how many different expansions there are. The patch handles the stuff fully, i.e. emits two shifts plus compare-and-jump, and does that for the power of 2 constants for icode == CODE_FOR_nothing always and otherwise (i.e. i?86/x86_64) only if not optimizing for size. > And for constant multiplicators in general as well? See the PR, I'm not convinced that is at least generally a win. The case of two constant arguments should be handled earlier in gimple optimizers, and case of one constant argument that is not power of two means that either we'll compute the overflow by dividing by the constant again (if that ends up really a divide, I'm pretty sure it is always a loss, if it turns into multiplication, also, and if it is multiplication by some large constant and shifts, we can end up with a really large code) and compare against the original value. > As you do it you assume that the targets optab is less efficient > than two shifts and a compare but how do you know? Shouldn't we > do regular mult expansion and cost the resulting sequence against > the one from the umulv expander? > > That said, the patch looks x86 specific but done in generic code... Well, right now x86 is the only target that has umulv<mode>4 patterns, and from recent discussions with Segher and ARM folks it doesn't look like that is going to change at least for power/arm/aarch64 any time soon. So, not sure if it is worth to doing expensive and hard to maintain expansion of both variants and pick the cheaper one right now, it is not about correctness. If some further targets add these patterns and the current strategy doesn't work well there, we can reconsider. Jakub