On Jun 16, 2010, at 6:04 AM, Richard Guenther <richard.guent...@gmail.com> wrote:
On Wed, Jun 16, 2010 at 5:52 PM, Siarhei Siamashka
<siarhei.siamas...@gmail.com> wrote:
Hello,
Currently gcc (at least version 4.5.0) does a very poor job generating
single precision floating point code for ARM Cortex-A8.
The source of this problem is the use of VFP instructions which are run
on a slow nonpipelined VFP Lite unit in Cortex-A8. Even turning on
RunFast mode (flush denormals to zero, disable exceptions) just provides
a relatively minor performance gain.
The right solution seems to be the use of NEON instructions for doing
most of the single precision calculations.
I wonder if it would be difficult to introduce the following changes to
the gcc generated code when optimizing for cortex-a8:
1. Allocate single precision variables only to evenly or oddly numbered
s-registers.
2. Instead of using 'fadds s0, s0, s2' or similar instructions, do
'vadd.f32 d0, d0, d1' instead.
The number of single precision floating point registers gets effectively
halved this way. Supporting '-mfloat-abi=hard' may be a bit tricky
(packing/unpacking of register pairs may be needed to ensure proper
parameter passing to functions). Also there may be other problems, like
dealing with strict IEEE-754 compliance (maybe a special variable
attribute for relaxing compliance requirements could be useful). But
this looks like the only way to fix the poor performance on the ARM
Cortex-A8 processor.
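To make the proposal above concrete, here is a minimal sketch of the kind of scalar single precision kernel in question. The function name dot_f32 is made up for illustration. Which instructions a compiler actually emits for it depends on the compiler version and flags; the comments only describe the two alternatives discussed above and are not verified output of any particular build.

```c
#include <stddef.h>

/*
 * Hypothetical example kernel: sum of products of two float arrays.
 *
 * With VFP code generation, each multiply and add in the loop would
 * typically be a scalar VFP instruction on s-registers (for example
 * 'fadds s0, s0, s2'), which runs on the nonpipelined VFP Lite unit.
 *
 * The proposal is to keep each float in one lane of a d-register and
 * emit NEON instructions such as 'vadd.f32 d0, d0, d1' instead,
 * ignoring the unused second lane.
 */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        acc += a[i] * b[i];   /* one single precision multiply and add */

    return acc;
}
```

One way to see what a given compiler does with this is to build it with something like 'gcc -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -S' and look for fadds/fmuls versus vadd.f32/vmul.f32 in the generated assembly.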
Actually clang 2.7 seems to be working exactly this way. And it is
outperforming gcc 4.5.0 by up to a factor of 2 or 3 on some single
precision floating point tests that I tried on ARM Cortex-A8.
On i?86 we have -mfpmath={sse,x87}, I suppose you could add
-mfpmath=neon for arm (properly conflicting with -mfloat-abi=hard
and requiring neon support).
Except that, unlike SSE, NEON does not fully support IEEE 754, so this
should only be done with -ffast-math :). Being slow is not a good enough
reason to change it to something that is fast but wrong.
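One commonly cited gap is denormal handling: NEON arithmetic flushes denormals to zero, which is the same behaviour RunFast mode enables for VFP. The sketch below only simulates that in portable C so the difference is visible on any IEEE host; flush_denormal, half_ieee and half_ftz are hypothetical helper names, and the simulation ignores the sign of the flushed zero and any other differences between the two units.

```c
#include <float.h>
#include <math.h>

/* Simulated flush-to-zero: a denormal (subnormal) value becomes 0. */
static float flush_denormal(float x)
{
    return (fpclassify(x) == FP_SUBNORMAL) ? 0.0f : x;
}

/* IEEE behaviour: halving the smallest normal float gives a denormal. */
float half_ieee(float x)
{
    return x * 0.5f;
}

/* Flush-to-zero behaviour: denormal inputs and results are replaced by 0. */
float half_ftz(float x)
{
    return flush_denormal(flush_denormal(x) * 0.5f);
}
```

On a host with IEEE semantics, half_ieee(FLT_MIN) is a small nonzero denormal while half_ftz(FLT_MIN) is exactly zero, so code that depends on gradual underflow can see different results.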
Richard.
--
Best regards,
Siarhei Siamashka