On Thu, 5 May 2011, David Gilbert wrote:

> Yes; while I've not actually looked at coding CRC32 or the crypto things,
> I agree that they feel like they have much more room for improvement.
> It's outside the scope of what I was asked to look at, however.

Well, you said that the current memcpy code in the kernel is quite good, 
which is nice, not only because I wrote it :-), but also because it 
suggests that Neon optimization efforts might have a bigger return on 
investment elsewhere.

> > The memcpy case is not interesting.  Not at all.  Most kernel memcpy
> > calls are for small size copies.  The large copy instances are just bad
> > and misdesigned in the first place if they rely on memcpy (maybe they
> > should simply have a custom copy function, maybe implemented with Neon).
> 
> Even outside the kernel, vast memcpys are fairly rare as far as I can 
> tell: everyone knows they're going to hurt, so people try to avoid 
> them. The other thing is that people have been optimising ARM memcpy 
> for decades, and it appears to me to be hitting cache/bus bandwidth 
> limits somewhere (although I don't have any figures for what those 
> bandwidths are). There may be some scope for optimising the smaller 
> memcpy cases (e.g. taking advantage of newer instructions such as cbz 
> to cut a few instructions out); from my graphs, the slope up to the 
> point at which the non-Neon code plateaus is quite gradual, which 
> suggests it might be possible to optimise it a bit.

Indeed.
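
Purely as an illustration of the small-copy fast path discussed above, 
here is a minimal sketch in portable C rather than the kernel's 
hand-written ARM assembly. The name sketch_memcpy and the 16-byte 
SMALL_COPY_MAX cut-over are hypothetical; the sketch assumes 
non-overlapping buffers (as memcpy does) and uses neither Neon nor cbz.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-over: copies of this many bytes or fewer use a plain
 * byte loop. Real implementations tune this per CPU; 16 is an assumption. */
#define SMALL_COPY_MAX 16

/* Copy n bytes from src to dst (non-overlapping), with a dedicated fast
 * path for short copies, which dominate kernel memcpy calls. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Small copies: skip all alignment set-up and just copy bytes. */
    if (n <= SMALL_COPY_MAX) {
        while (n--)
            *d++ = *s++;
        return dst;
    }

    /* Larger copies: byte-copy until dst is word aligned (at most 3 bytes,
     * so n stays well above zero because n > SMALL_COPY_MAX here). */
    while (((uintptr_t)d % sizeof(uint32_t)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Then move one 32-bit word per iteration. The fixed-size memcpy keeps
     * the access well defined even if src is misaligned; compilers normally
     * reduce it to a single load or store. */
    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Copy any remaining tail bytes. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

The split means short copies never pay for the alignment set-up; where 
the cut-over should sit is exactly what graphs like the ones mentioned 
above would have to decide.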


Nicolas
_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev
