David Gilbert <david.gilb...@linaro.org> writes:

> Hi Kiko,
>
> On 5 May 2011 15:21, Christian Robottom Reis <k...@linaro.org> wrote:
>> Hey there,
>>
>> I was asked today in the board meeting about the use of NEON
>> routines in the kernel; I said we had looked into this but hadn't done
>> it because a) it wasn't conclusively better and b) if better, it would
>> need to be done conditionally per-platform. But I wanted to double-check
>> that's actually true (and I'm copying Vijay to keep me honest). I have
>> some references:
>
> Not quite:
>   a) Neon memcpy/memset is worse on A9 than non-neon versions (better
> on A8 typically)

That is not my experience at all. On the contrary, I've seen memcpy
throughput on A9 roughly double with use of NEON for large copies. For
small copies, plain ARM might be faster since the overhead of preparing
for a properly aligned NEON loop is avoided.

What do you base your claims on?

>   b) In general I don't believe fpu or Neon code can be used
> internally to the kernel.

That is true. There is currently no support for the context save and
restore it would require.

>>
>> http://lists.linaro.org/pipermail/linaro-toolchain/2011-January/000722.html
>>
>> http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc0993/c54dde7b9d55cf99?pli=1
>>
>> http://www.spinics.net/lists/arm-kernel/msg106503.html
>>
>> http://dev.gentoo.org/~armin76/arm/memcpy-neon_result.txt
>>
>> https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemcpy?highlight=%28memcpy%29
>>
>> https://wiki.linaro.org/WorkingGroups/ToolChain/StringRoutines?highlight=%28memcpy%29
>
> There may be the potential still for non-neon optimised memcpy/memset
> for Cortex a9; however the kernel routines are pretty good.
>
>> Incidentally, this ties into the question sent earlier this week which
>> had to do with Nico's work item in:
>>
>> https://blueprints.launchpad.net/linux-linaro/+spec/other-kernel-thumb2
>>
>> Which IIRC Nico says probably isn't worth it, right?
>
> I thought dmart had done a lot of that?

I don't see the connection between Thumb2 and memcpy performance.
Thumb2 can do anything 32-bit ARM can.

-- 
Måns Rullgård
m...@mansr.com

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev