Re: RFC: ARM Cortex-A8 and floating point performance

Andrew Pinski Wed, 16 Jun 2010 08:09:39 -0700


Sent from my iPhone

On Jun 16, 2010, at 6:04 AM, Richard Guenther <richard.guent...@gmail.com> wrote:

On Wed, Jun 16, 2010 at 5:52 PM, Siarhei Siamashka
<siarhei.siamas...@gmail.com> wrote:
Hello,
Currently gcc (at least version 4.5.0) does a very poor job generating single
precision floating point code for ARM Cortex-A8.
The source of this problem is the use of VFP instructions, which are run on a
slow non-pipelined VFP Lite unit in Cortex-A8. Even turning on RunFast mode
(flush denormals to zero, disable exceptions) provides only a relatively minor
performance gain.
The right solution seems to be the use of NEON instructions for doing most of
the single precision calculations.
I wonder if it would be difficult to introduce the following changes to the
gcc generated code when optimizing for cortex-a8:
1. Allocate single precision variables only to evenly or oddly numbered
s-registers.
2. Instead of using 'fadds s0, s0, s2' or similar instructions, do
'vadd.f32 d0, d0, d1' instead.
The number of single precision floating point registers gets effectively
halved this way. Supporting '-mfloat-abi=hard' may be a bit tricky
(packing/unpacking of register pairs may be needed to ensure proper parameter
passing to functions). Also there may be other problems, like dealing with
strict IEEE-754 compliance (maybe a special variable attribute for relaxing
compliance requirements could be useful). But this looks like the only
solution to fix poor performance on ARM Cortex-A8 processor.
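
To make point 2 a bit more concrete, here is a rough sketch (not what gcc
emits today) of the same scalar addition written with the NEON intrinsics from
arm_neon.h. The helper name add_f32_neon is made up for this illustration, and
the non-NEON fallback is only there so the snippet still compiles on other
targets. Which s-register or d-register the compiler actually picks is up to
its register allocator.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: do a scalar float add in lane 0 of a 64-bit
 * D register instead of using a VFP scalar add. */
static float add_f32_neon(float a, float b)
{
    float32x2_t va  = vdup_n_f32(a);     /* broadcast a into both lanes */
    float32x2_t vb  = vdup_n_f32(b);     /* broadcast b into both lanes */
    float32x2_t sum = vadd_f32(va, vb);  /* two-lane add, normally a vadd.f32 on D registers */
    return vget_lane_f32(sum, 0);        /* only lane 0 carries the scalar result */
}
#else
/* Fallback for targets without NEON: plain C addition. */
static float add_f32_neon(float a, float b)
{
    return a + b;
}
#endif
```

On an ARM toolchain this would typically be built with something like
'-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp'; calling
add_f32_neon(1.5f, 2.25f) gives 3.75f on either path.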

Actually clang 2.7 seems to be working exactly this way. And it is
outperforming gcc 4.5.0 by up to a factor of 2 or 3 on some single precision
floating point tests that I tried on ARM Cortex-A8.
On i?86 we have -mfpmath={sse,x87}, I suppose you could add
-mfpmath=neon for arm (properly conflicting with -mfloat-abi=hard
and requiring neon support).

Except unlike sse, neon does not fully support IEEE 754. So this should only
be done with -ffast-math :). The point that it is slow is not good enough to
change it to be something that is wrong and fast.
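
One place where the difference shows up is denormal handling: NEON on
Cortex-A8 flushes denormals to zero, while IEEE 754 arithmetic keeps them. A
small sketch of a check follows; the helper name is made up for this
illustration, and the snippet only exercises ordinary scalar C arithmetic, so
it reports what the default floating point path of the build does rather than
proving anything about a particular NEON code sequence.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: returns 1 if FLT_MIN / 2 is kept as a subnormal
 * (IEEE 754 gradual underflow), 0 if it was flushed to zero or otherwise
 * not classified as subnormal. */
static int halving_flt_min_is_subnormal(void)
{
    /* volatile keeps the division from being folded away at compile time */
    volatile float smallest_normal = FLT_MIN;
    volatile float half = smallest_normal / 2.0f;
    float h = half;
    return fpclassify(h) == FP_SUBNORMAL;
}
```

With IEEE-conforming arithmetic the helper returns 1. Code that did the same
division in flush-to-zero mode would end up with 0.0f instead, which is the
kind of silent result change that makes this unsuitable as a default.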


Richard.

--
Best regards,
Siarhei Siamashka
