Re: [PATCH] Improve __builtin_mul_overflow (uns, 1U << y, &res) (PR target/83210)

Jakub Jelinek Thu, 30 Nov 2017 01:49:05 -0800

On Thu, Nov 30, 2017 at 10:26:33AM +0100, Richard Biener wrote:
> On Wed, 29 Nov 2017, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > Even if target has an umulv4_optab pattern like i?86 mul[lq]; jo,
> > if one argument is constant power of two, using two shifts, or
> > for multiplication by 2 just shl with js on previous value is faster
> > (though, for -Os mul[lq]; jo wins).
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> So this is because we don't go through the standard multiplication
> expander, right?


We sometimes go through the standard multiplication expander, sometimes
widening multiplication expander, sometimes highpart multiplication
expander, also sometimes with the arguments forced into registers because
we pick various subregs of them etc.  expand_mul_overflow already now has
a lots of cases.

>  But for icode == CODE_FOR_nothing we probably
> should do this and then emit the compare-and-jump?

The icode == CODE_FOR_nothing check is there because typically the
code when backend doesn't provide anything is larger or much larger, just
see how many different expansions there are.

The patch handles the stuff fully, i.e. emits two shifts plus
compare-and-jump, and does that for the power of 2 constants for
icode == CODE_FOR_nothing always and otherwise (i.e. i?86/x86_64)
only if not optimizing for size.

> And for constant multiplicators in general as well?

See the PR, I'm not convinced that is at least generally a win.

The case of two constant arguments should be handled earlier in gimple
optimizers, and case of one constant argument that is not power of two
means that either we'll compute the overflow by dividing by the constant
again (if that ends up really a divide, I'm pretty sure it is always a loss,
if it turns into multiplication, also, and if it is multiplication by some
large constant and shifts, we can end up with a really large code)
and compare against the original value.

> As you do it you assume that the targets optab is less efficient
> than two shifts and a compare but how do you know?  Shouldn't we
> do regular mult expansion and cost the resulting sequence against
> the one from the umulv expander?
> 
> That said, the patch looks x86 specific but done in generic code...

Well, right now x86 is the only target that has umulv<mode>4 patterns,
and from recent discussions with Segher and ARM folks it doesn't look like
that is going to change at least for power/arm/aarch64 any time soon.

So, not sure if it is worth to doing expensive and hard to maintain
expansion of both variants and pick the cheaper one right now, it is not
about correctness.  If some further targets add these patterns and the
current strategy doesn't work well there, we can reconsider.

        Jakub

Re: [PATCH] Improve __builtin_mul_overflow (uns, 1U << y, &res) (PR target/83210)

Reply via email to