On Fri, Oct 24, 2008 at 6:27 AM, Eric Blossom <[EMAIL PROTECTED]> wrote:
> On Thu, Oct 23, 2008 at 09:38:26PM -0700, Philip Balister wrote:
>> I have some NEON code in the fff dotproduct routine, the qa code passes:
>>
>> [EMAIL PROTECTED]:/home/balister/oe/tmp/work/armv7a-angstrom-linux-gnueabi/gnuradio-3.1.3+svnr9809-r4.1/trunk/gnuradio-core/src/tests# ./test_filter
>> . [generic] [cortex_a8]
>> . [generic] [cortex_a8]
>> . [generic]
>> . [generic]
>> . [generic]
>> . [generic]
>> .>>> gr_fir_fff: using cortex_a8
>> ..
>> OK (9 tests)
>>
>>
>> [EMAIL PROTECTED]:/home/balister/oe/tmp/work/armv7a-angstrom-linux-gnueabi/gnuradio-3.1.3+svnr9809-r4.1/trunk/gnuradio-core/src/tests# ./benchmark_dotprod_fff
>> generic: taps: 256 input: 4e+07 cpu: 968.586 taps/sec: 1.057e+07
>> cortex_a8: taps: 256 input: 4e+07 cpu: 45.703 taps/sec: 2.241e+08
>>
>> Philip
>
> Cool!
>
> The good news / bad news is that the spread is worse than on the P4!
>
> Is there a way to get the compiler to use the NEON instruction set in
> scalar mode? E.g., something like -mfpmath=sse on x86? Maybe -mfp=vfp?
> Are you providing the -mcpu=cortex-a8 gcc option?

The Cortex-A8 numbers use assembler to unroll the inner loop 8 times. I
think this code can get better.

I'll have to double-check the flags, but I do not think gcc does a good
job of generating code for the VFP/NEON unit. (We are happy gcc can
generate anything supporting NEON and not crash...)

Remember, this is clocked at 600 MHz and consumes about 1 Watt.

Philip
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio