wmul-1.c regression on aarch64 after r257077

ktkachov at gcc dot gnu.org Mon, 29 Jan 2018 02:31:18 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067


--- Comment #6 from ktkachov at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #5)
> On Mon, 29 Jan 2018, ktkachov at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067
> > 
> > --- Comment #3 from ktkachov at gcc dot gnu.org ---
> > (In reply to Richard Biener from comment #2)
> > > So any hint on whether the code after r257077 is better or worse than 
> > > before?
> > 
> > Looks worse unfortunately:
> > For aarch64 at -O2 it generates:
> > foo:
> >         mov     w3, 44
> >         mov     w2, 40
> >         mov     w5, 1
> >         mov     w4, 2
> >         smull   x3, w1, w3
> >         smull   x2, w1, w2
> >         str     w5, [x0, x3]
> >         add     x2, x2, 400
> >         add     x1, x2, x1, sxtw 2
> >         str     w4, [x0, x1]
> >         ret
> > 
> > whereas with r257077 it generates the shorter:
> > foo:
> >         mov     w3, 40
> >         sxtw    x2, w1
> >         mov     w4, 1
> >         smaddl  x0, w1, w3, x0
> >         mov     w3, 2
> >         add     x1, x0, x2, lsl 2
> >         str     w4, [x0, x2, lsl 2]
> >         str     w3, [x1, 400]
> >         ret
> 
> So shorter is worse?  Might be because I don't understand the
> difference between the 'lsl 2' and the 'sxtw 2' or the cost
> of the [x1, 400] addressing.

Sorry, I messed up the write-up. Let me try again.
The shorter sequence (with the smaddl) is the good one, and it is produced
*without* r257077. After r257077 we generate the longer, worse sequence with
two smull instructions.
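
For reference, here is a rough C sketch of what the two sequences compute. The
shape of foo is an assumption reconstructed from the assembly quoted above
(10-int rows, stores 400 bytes apart), not copied from the testcase, and the
two second_store_offset_* helpers are hypothetical names that only mirror the
address arithmetic of the second store in each sequence. It also shows that
'lsl 2' on the already sign-extended x2 and 'sxtw 2' on w1 give the same
value, so both sequences address the same element.

```c
#include <stdint.h>

typedef int ArrT[10][10];

/* Assumed shape of the test function, inferred from the quoted assembly.
   The caller must supply at least Idx + 11 rows.  */
void
foo (ArrT Arr, int Idx)
{
  Arr[Idx][Idx] = 1;       /* byte offset from Arr: Idx * 44        */
  Arr[Idx + 10][Idx] = 2;  /* byte offset from Arr: Idx * 44 + 400  */
}

/* Offset of the second store as the smaddl sequence forms it
   (the base address in x0 is left out).  */
int64_t
second_store_offset_smaddl (int32_t idx)
{
  int64_t row = (int64_t) idx * 40;       /* smaddl x0, w1, w3, x0    */
  int64_t x1 = row + (int64_t) idx * 4;   /* add    x1, x0, x2, lsl 2 */
  return x1 + 400;                        /* str    w3, [x1, 400]     */
}

/* Offset of the second store as the two-smull sequence forms it.  */
int64_t
second_store_offset_smull (int32_t idx)
{
  int64_t x2 = (int64_t) idx * 40;        /* smull  x2, w1, w2         */
  x2 += 400;                              /* add    x2, x2, 400        */
  return x2 + (int64_t) idx * 4;          /* add    x1, x2, x1, sxtw 2 */
}
```

Both helpers return the same offset for any idx; the difference between the
sequences is only in how many instructions are needed to get there.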

[Bug middle-end/84067] [8 regression] gcc.dg/wmul-1.c regression on aarch64 after r257077
