On Thu, 30 Nov 2017, Jakub Jelinek wrote: > On Thu, Nov 30, 2017 at 10:26:33AM +0100, Richard Biener wrote: > > On Wed, 29 Nov 2017, Jakub Jelinek wrote: > > > > > Hi! > > > > > > Even if target has an umulv4_optab pattern like i?86 mul[lq]; jo, > > > if one argument is constant power of two, using two shifts, or > > > for multiplication by 2 just shl with js on previous value is faster > > > (though, for -Os mul[lq]; jo wins). > > > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > So this is because we don't go through the standard multiplication > > expander, right? > > We sometimes go through the standard multiplication expander, sometimes > widening multiplication expander, sometimes highpart multiplication > expander, also sometimes with the arguments forced into registers because > we pick various subregs of them etc. expand_mul_overflow already now has > a lots of cases. > > > But for icode == CODE_FOR_nothing we probably > > should do this and then emit the compare-and-jump? > > The icode == CODE_FOR_nothing check is there because typically the > code when backend doesn't provide anything is larger or much larger, just > see how many different expansions there are. > > The patch handles the stuff fully, i.e. emits two shifts plus > compare-and-jump, and does that for the power of 2 constants for > icode == CODE_FOR_nothing always and otherwise (i.e. i?86/x86_64) > only if not optimizing for size. > > > And for constant multiplicators in general as well? > > See the PR, I'm not convinced that is at least generally a win. > > The case of two constant arguments should be handled earlier in gimple > optimizers, and case of one constant argument that is not power of two > means that either we'll compute the overflow by dividing by the constant > again (if that ends up really a divide, I'm pretty sure it is always a loss, > if it turns into multiplication, also, and if it is multiplication by some > large constant and shifts, we can end up with a really large code) > and compare against the original value. > > > As you do it you assume that the targets optab is less efficient > > than two shifts and a compare but how do you know? Shouldn't we > > do regular mult expansion and cost the resulting sequence against > > the one from the umulv expander? > > > > That said, the patch looks x86 specific but done in generic code... > > Well, right now x86 is the only target that has umulv<mode>4 patterns, > and from recent discussions with Segher and ARM folks it doesn't look like > that is going to change at least for power/arm/aarch64 any time soon. > > So, not sure if it is worth to doing expensive and hard to maintain > expansion of both variants and pick the cheaper one right now, it is not > about correctness. If some further targets add these patterns and the > current strategy doesn't work well there, we can reconsider.
Ok, thanks for the elaborate answer. Can you add a comment indicating that? The patch is ok with that. Richard.