Re: [PATCH] Improve __builtin_mul_overflow (uns, 1U << y, &res) (PR target/83210)

Richard Biener Thu, 30 Nov 2017 01:56:28 -0800

On Thu, 30 Nov 2017, Jakub Jelinek wrote:

> On Thu, Nov 30, 2017 at 10:26:33AM +0100, Richard Biener wrote:
> > On Wed, 29 Nov 2017, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > Even if target has an umulv4_optab pattern like i?86 mul[lq]; jo,
> > > if one argument is constant power of two, using two shifts, or
> > > for multiplication by 2 just shl with js on previous value is faster
> > > (though, for -Os mul[lq]; jo wins).
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > So this is because we don't go through the standard multiplication
> > expander, right?
> 
> We sometimes go through the standard multiplication expander, sometimes
> widening multiplication expander, sometimes highpart multiplication
> expander, also sometimes with the arguments forced into registers because
> we pick various subregs of them etc.  expand_mul_overflow already now has
> a lots of cases.
> 
> >  But for icode == CODE_FOR_nothing we probably
> > should do this and then emit the compare-and-jump?
> 
> The icode == CODE_FOR_nothing check is there because typically the
> code when backend doesn't provide anything is larger or much larger, just
> see how many different expansions there are.
> 
> The patch handles the stuff fully, i.e. emits two shifts plus
> compare-and-jump, and does that for the power of 2 constants for
> icode == CODE_FOR_nothing always and otherwise (i.e. i?86/x86_64)
> only if not optimizing for size.
> 
> > And for constant multiplicators in general as well?
> 
> See the PR, I'm not convinced that is at least generally a win.
> 
> The case of two constant arguments should be handled earlier in gimple
> optimizers, and case of one constant argument that is not power of two
> means that either we'll compute the overflow by dividing by the constant
> again (if that ends up really a divide, I'm pretty sure it is always a loss,
> if it turns into multiplication, also, and if it is multiplication by some
> large constant and shifts, we can end up with a really large code)
> and compare against the original value.
> 
> > As you do it you assume that the targets optab is less efficient
> > than two shifts and a compare but how do you know?  Shouldn't we
> > do regular mult expansion and cost the resulting sequence against
> > the one from the umulv expander?
> > 
> > That said, the patch looks x86 specific but done in generic code...
> 
> Well, right now x86 is the only target that has umulv<mode>4 patterns,
> and from recent discussions with Segher and ARM folks it doesn't look like
> that is going to change at least for power/arm/aarch64 any time soon.
> 
> So, not sure if it is worth to doing expensive and hard to maintain
> expansion of both variants and pick the cheaper one right now, it is not
> about correctness.  If some further targets add these patterns and the
> current strategy doesn't work well there, we can reconsider.


Ok, thanks for the elaborate answer.

Can you add a comment indicating that?

The patch is ok with that.

Richard.

Re: [PATCH] Improve __builtin_mul_overflow (uns, 1U << y, &res) (PR target/83210)

Reply via email to