On Wed, 2010-06-16 at 15:52 +0000, Siarhei Siamashka wrote:
> Hello,
>
> Currently gcc (at least version 4.5.0) does a very poor job generating single
> precision floating point code for ARM Cortex-A8.
>
> The source of this problem is the use of VFP instructions which are run on a
> slow nonpipelined VFP Lite unit in Cortex-A8. Even turning on RunFast mode
> (flush denormals to zero, disable exceptions) just provides a relatively minor
> performance gain.
>
> The right solution seems to be the use of NEON instructions for doing most of
> the single precision calculations.

Only in situations where the user has explicitly asked for it with -ffast-math.
I would point out that single precision floating point operations on NEON are
not fully IEEE compliant.

cheers
Ramana