On Thu, 5 May 2011, David Gilbert wrote:
> Yes, while I've not actually looked at coding CRC32 or the crypto things
> I agree that they feel like they have much more room for working with;
> it's outside of the scope of what I was asked to look at however.

Well, you said that the current memcpy code in the kernel is quite good,
which is nice, and not only because I wrote it :-). It also suggests
that the Neon optimization effort might give a bigger return on
investment elsewhere.

> > The memcpy case is not interesting. Not at all. Most kernel memcpy
> > calls are for small size copies. The large copy instances are just bad
> > and misdesigned in the first place if they rely on memcpy (maybe they
> > should simply have a custom copy function, maybe implemented with Neon).
>
> Even outside the kernel vast memcpy's are fairly rare as far as I can
> tell - everyone knows they're going to hurt so people try and avoid
> them; the other thing is that people have been optimising ARM memcpy
> for decades and it appears to me to be hitting cache/bus bandwidths
> somewhere (although I don't have any figures for what those bandwidths
> are) - there may be some scope for optimising the smaller memcpy cases
> (e.g. taking advantage of things like the newer cbz to cut a few
> instructions out) - from my graphs the slope up to the point at which
> the non-neon code plateaus is quite gradual, which suggests it might
> be possible to optimise it a bit.
Indeed.
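
For anyone who wants to look at that slope themselves, here is a minimal
user-space sketch, not the benchmark behind David's graphs. It times the C
library's memcpy over a range of sizes so that the ramp and the plateau show
up as numbers. The helper names (copy_throughput_mb_s, print_copy_curve), the
size range and the 64 MiB of traffic per data point are all assumptions made
for illustration, and nothing here exercises the kernel's own memcpy.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: throughput in MB/s of `iters` copies of `size` bytes. */
static double copy_throughput_mb_s(size_t size, unsigned long iters)
{
    /* Calling through a volatile pointer keeps the compiler from
     * inlining or eliminating the copies being timed. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    struct timespec t0, t1;
    unsigned char *src, *dst;
    double secs;

    assert(size > 0 && iters > 0);
    src = malloc(size);
    dst = malloc(size);
    assert(src != NULL && dst != NULL);
    memset(src, 0xA5, size);
    memset(dst, 0, size);   /* fault the pages in before timing */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++)
        copy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Sanity check: the destination really holds the source data. */
    assert(memcmp(dst, src, size) == 0);
    free(src);
    free(dst);

    secs = (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return 0.0;         /* timer too coarse for this run */
    return ((double)size * (double)iters) / secs / 1e6;
}

/* Hypothetical driver: one line per power-of-two size, 16 B to 1 MiB. */
static void print_copy_curve(void)
{
    const size_t total = (size_t)64 << 20;  /* bytes moved per data point */

    for (size_t size = 16; size <= ((size_t)1 << 20); size *= 2)
        printf("%zu\t%.1f\n", size,
               copy_throughput_mb_s(size, (unsigned long)(total / size)));
}
```

Calling print_copy_curve() from a main() and plotting the two columns gives
size against MB/s. The numbers depend heavily on the libc, the compiler flags
and the cache hierarchy, so treat them as a rough guide only.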
Nicolas
_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev