https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665

--- Comment #16 from wilco at gcc dot gnu.org ---
(In reply to wilco from comment #14)
> (In reply to PeteVine from comment #13)
> > Still, the 5% regression must have happened very recently. The fast gcc was
> > built on 20170220 and the slow one yesterday, using the original patch. Once
> > again, switching away from Cortex-A53 codegen restores the expected
> > performance.
> 
> The issue is due to inefficient code generated for unsigned modulo:
> 
>         umull   x0, w0, w4
>         umull   x1, w1, w4
>         lsr     x0, x0, 32
>         lsr     x1, x1, 32
>         lsr     w0, w0, 6
>         lsr     w1, w1, 6
> 
> It seems the Cortex-A53 scheduler isn't modelling this correctly. When I
> manually remove the redundant shifts I get a 15% speedup. I'll have a look.

See https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01415.html
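
For readers unfamiliar with the sequence quoted above: it looks like the usual multiply-by-reciprocal expansion of an unsigned division by a constant, where the high half of a 32x32->64 multiply is shifted right again. A minimal C sketch of that arithmetic follows. The divisor 1000 and the constant 0x10624DD3 are assumptions chosen because they match a total shift of 38 (32 + 6); the report does not say which divisor the benchmark uses, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed values, not taken from the report: 0x10624DD3 is ceil(2^38 / 1000). */
#define DIVISOR 1000u
#define MAGIC   0x10624DD3u

/* Mirrors the quoted sequence: one multiply followed by two shifts. */
uint32_t div_two_shifts(uint32_t x)
{
    uint64_t prod = (uint64_t)x * MAGIC;    /* umull x0, w0, w4 */
    uint32_t hi = (uint32_t)(prod >> 32);   /* lsr   x0, x0, 32 */
    return hi >> 6;                         /* lsr   w0, w0, 6  */
}

/* Same quotient with the two shifts folded into a single 64-bit shift. */
uint32_t div_one_shift(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * MAGIC) >> 38);
}

/* Unsigned modulo built from the quotient: x - (x / d) * d. */
uint32_t mod_one_shift(uint32_t x)
{
    return x - div_one_shift(x) * DIVISOR;
}
```

Both division helpers return the same value for every 32-bit input, since shifting right by 32 and then by 6 discards the same bits as a single shift by 38. That is the sense in which the second shift can be folded away in this sketch; whether it matches the exact change in the linked patch is not something this example claims.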
