Re: Optimized kernel memcpy/memset

2012-05-21 Thread jackiele
Hi there, I would like to try the same thing on my devkit8000 board and see the performance. But since the kernel does not support NEON by default, how can we enable SIMD there? Could someone point me in the right direction?

Re: Optimized kernel memcpy/memset

2011-05-06 Thread Konstantinos Margaritis
On 6 May 2011 19:57, David Gilbert wrote: > 2011/5/6 Christian Robottom Reis: > I don't think there are that many things that are vastly useful for the > kernel, > but here is a summary (I intend to write a full report at some point but > am still fighting SPEC for some benchmark stats and some

Re: Optimized kernel memcpy/memset

2011-05-06 Thread David Gilbert
2011/5/6 Christian Robottom Reis: > On Thu, May 05, 2011 at 04:08:01PM +0100, Måns Rullgård wrote: >> >> Incidentally, this ties into the question sent earlier this week which >> >> had to do with Nico's work item in: >> >> >> >>    https://blueprints.launchpad.net/linux-linaro/+spec/other-kernel-

Re: Optimized kernel memcpy/memset

2011-05-06 Thread Christian Robottom Reis
On Thu, May 05, 2011 at 04:08:01PM +0100, Måns Rullgård wrote: > >> Incidentally, this ties into the question sent earlier this week which > >> had to do with Nico's work item in: > >> > >>    https://blueprints.launchpad.net/linux-linaro/+spec/other-kernel-thumb2 > >> > >> Which IIRC Nico says pro

Re: Optimized kernel memcpy/memset

2011-05-06 Thread Dave Martin
Hi, On Thu, May 05, 2011 at 03:47:08PM +0100, David Gilbert wrote: > Hi Kiko, > > On 5 May 2011 15:21, Christian Robottom Reis wrote: > > Hey there, > > > >    I was asked today in the board meeting about the use of NEON > > routines in the kernel; I said we had looked into this but hadn't done

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Måns Rullgård
David Gilbert writes: >> The memcpy case is not interesting.  Not at all.  Most kernel memcpy >> calls are for small size copies.  The large copy instances are just bad >> and misdesigned in the first place if they rely on memcpy (maybe they >> should simply have a custom copy function, maybe imp

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Nicolas Pitre
On Thu, 5 May 2011, Christian Robottom Reis wrote: > Hey there, > > I was asked today in the board meeting about the use of NEON > routines in the kernel; I said we had looked into this but hadn't done > it because a) it wasn't conclusively better and b) if better, it would > need to be done

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Nicolas Pitre
On Thu, 5 May 2011, David Gilbert wrote: > Yes, while I've not actually looked at coding CRC32 or the crypto things > I agree that they feel like they have much more room for working with; > it's outside of the scope of what I was asked to look at however. Well, you said that the current memcpy c

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Nicolas Pitre
On Thu, 5 May 2011, Måns Rullgård wrote: > David Gilbert writes: > > >> The memcpy case is not interesting.  Not at all.  Most kernel memcpy > >> calls are for small size copies.  The large copy instances are just bad > >> and misdesigned in the first place if they rely on memcpy (maybe they > >

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Måns Rullgård
David Gilbert writes: > On 5 May 2011 18:44, Måns Rullgård wrote: > >> The relative performance of NEON vs non-NEON seems to depend a lot on >> the size (relative to cache), alignment, and whether or not any >> prefetching (explicit PLD, automatic, or preload engine) is used. > > Yes, agreed - N

Re: Optimized kernel memcpy/memset

2011-05-05 Thread David Gilbert
On 5 May 2011 18:59, Nicolas Pitre wrote: > On Thu, 5 May 2011, David Gilbert wrote: > >> If people believe it's worth breaking the context-switching taboo and >> putting a neon version into the kernel then yes I agree it's something >> you'd want to do as a build and/or runtime selection - but th

Re: Optimized kernel memcpy/memset

2011-05-05 Thread David Gilbert
On 5 May 2011 18:44, Måns Rullgård wrote: > The relative performance of NEON vs non-NEON seems to depend a lot on > the size (relative to cache), alignment, and whether or not any > prefetching (explicit PLD, automatic, or preload engine) is used. Yes, agreed - Neon does very well in non-aligned
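
Since the result is said to depend on size and alignment, one way to compare two copy routines is to time them over a grid of both. The following is a minimal userspace sketch, not code from this thread and not kernel code. The names `copy_throughput_mbps`, `copy_fn_t`, and `CACHE_LINE` are hypothetical, the 64-byte line size is an assumption, and `clock()` measures CPU time with coarse resolution. Both source and destination are shifted by the same `misalign` offset from a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define CACHE_LINE 64u /* assumed line size; adjust for the target core */

/* Any routine with memcpy's signature can be plugged in (hypothetical type). */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/* Round a pointer up to the next CACHE_LINE boundary. */
static unsigned char *align_up(unsigned char *p)
{
    uintptr_t addr = (uintptr_t)p;
    return p + (CACHE_LINE - addr % CACHE_LINE) % CACHE_LINE;
}

/*
 * Copy `size` bytes `iters` times, with both buffers starting `misalign`
 * bytes past a cache-line boundary. Returns MB/s (1 MB = 1e6 bytes),
 * 0.0 if the run was too short for clock() to measure, or -1.0 on
 * allocation failure or if the copied data does not match the source.
 */
double copy_throughput_mbps(copy_fn_t copy, size_t size,
                            size_t misalign, unsigned iters)
{
    assert(copy != NULL && size > 0 && misalign < CACHE_LINE && iters > 0);

    /* Slack covers the worst case of alignment padding plus misalign. */
    unsigned char *src_raw = malloc(size + 2 * CACHE_LINE);
    unsigned char *dst_raw = malloc(size + 2 * CACHE_LINE);
    if (src_raw == NULL || dst_raw == NULL) {
        free(src_raw);
        free(dst_raw);
        return -1.0;
    }

    unsigned char *src = align_up(src_raw) + misalign;
    unsigned char *dst = align_up(dst_raw) + misalign;

    /* Fill the source with a simple pattern so the copy can be verified. */
    for (size_t i = 0; i < size; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(dst, 0, size);

    /* Calling through a volatile pointer keeps the compiler from
     * inlining or eliding the repeated copies. */
    copy_fn_t volatile call = copy;

    clock_t start = clock();
    for (unsigned i = 0; i < iters; i++)
        call(dst, src, size);
    clock_t end = clock();

    int ok = memcmp(dst, src, size) == 0;
    free(src_raw);
    free(dst_raw);
    if (!ok)
        return -1.0;

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return (double)size * iters / seconds / 1e6;
}
```

Calling `copy_throughput_mbps(memcpy, size, misalign, iters)` for sizes below and above the cache sizes, and for several `misalign` values, gives a baseline; a NEON routine with the same signature could be passed in place of `memcpy` for comparison. Prefetch behaviour is not controlled here and would need separate variants of the copy routine.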

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Nicolas Pitre
On Thu, 5 May 2011, Måns Rullgård wrote: > David Gilbert writes: > > > On 5 May 2011 17:45, Deepak Saxena wrote: > >> On May 05 2011, at 16:46, David Gilbert was caught saying: > >>> On 5 May 2011 16:08, Måns Rullgård wrote: > >>> > David Gilbert writes: > >>> >> Not quite: > >>> >>   a) Neon

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Nicolas Pitre
On Thu, 5 May 2011, David Gilbert wrote: > If people believe it's worth breaking the context-switching taboo and > putting a neon version into the kernel then yes I agree it's something > you'd want to do as a build and/or runtime selection - but that's > quite a big taboo to break. There is n

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Deepak Saxena
On May 05 2011, at 16:46, David Gilbert was caught saying: > On 5 May 2011 16:08, Måns Rullgård wrote: > > David Gilbert writes: > >> Not quite: > >>   a) Neon memcpy/memset is worse on A9 than non-neon versions (better > >> on A8 typically) > > > > That is not my experience at all.  On the contr

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Måns Rullgård
David Gilbert writes: > On 5 May 2011 18:17, Måns Rullgård wrote: >> David Gilbert writes: >> >>> On 5 May 2011 16:08, Måns Rullgård wrote: David Gilbert writes: > Not quite: >   a) Neon memcpy/memset is worse on A9 than non-neon versions (better > on A8 typically)

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Måns Rullgård
David Gilbert writes: > On 5 May 2011 16:08, Måns Rullgård wrote: >> David Gilbert writes: >>> Not quite: >>>   a) Neon memcpy/memset is worse on A9 than non-neon versions (better >>> on A8 typically) >> >> That is not my experience at all.  On the contrary, I've seen memcpy >> throughput on A9

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Måns Rullgård
David Gilbert writes: > On 5 May 2011 17:45, Deepak Saxena wrote: >> On May 05 2011, at 16:46, David Gilbert was caught saying: >>> On 5 May 2011 16:08, Måns Rullgård wrote: >>> > David Gilbert writes: >>> >> Not quite: >>> >>   a) Neon memcpy/memset is worse on A9 than non-neon versions (bett

Re: Optimized kernel memcpy/memset

2011-05-05 Thread David Gilbert
On 5 May 2011 18:17, Måns Rullgård wrote: > David Gilbert writes: > >> On 5 May 2011 16:08, Måns Rullgård wrote: >>> David Gilbert writes: Not quite:   a) Neon memcpy/memset is worse on A9 than non-neon versions (better on A8 typically) >>> >>> That is not my experience at all.  

Re: Optimized kernel memcpy/memset

2011-05-05 Thread David Gilbert
On 5 May 2011 17:45, Deepak Saxena wrote: > On May 05 2011, at 16:46, David Gilbert was caught saying: >> On 5 May 2011 16:08, Måns Rullgård wrote: >> > David Gilbert writes: >> >> Not quite: >> >>   a) Neon memcpy/memset is worse on A9 than non-neon versions (better >> >> on A8 typically) >> >

Re: Optimized kernel memcpy/memset

2011-05-05 Thread David Gilbert
On 5 May 2011 16:08, Måns Rullgård wrote: > David Gilbert writes: >> Not quite: >>   a) Neon memcpy/memset is worse on A9 than non-neon versions (better >> on A8 typically) > > That is not my experience at all.  On the contrary, I've seen memcpy > throughput on A9 roughly double with use of NEON

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Måns Rullgård
David Gilbert writes: > Hi Kiko, > > On 5 May 2011 15:21, Christian Robottom Reis wrote: >> Hey there, >> >>    I was asked today in the board meeting about the use of NEON >> routines in the kernel; I said we had looked into this but hadn't done >> it because a) it wasn't conclusively better an

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Konstantinos Margaritis
On 5 May 2011 17:57, Steve McIntyre wrote: > Technically it *can*, but you'll then have to be responsible for > dealing with all the extra register save/restores for context > switches. Normal wisdom is that it's just not worth that cost unless > you're doing an extended amount of such code (e.g.

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Konstantinos Margaritis
On 5 May 2011 17:21, Christian Robottom Reis wrote: > Hey there, > >    I was asked today in the board meeting about the use of NEON > routines in the kernel; I said we had looked into this but hadn't done > it because a) it wasn't conclusively better and b) if better, it would > need to be done c

Re: Optimized kernel memcpy/memset

2011-05-05 Thread Steve McIntyre
On Thu, May 05, 2011 at 03:47:08PM +0100, David Gilbert wrote: >Hi Kiko, > >On 5 May 2011 15:21, Christian Robottom Reis wrote: >> Hey there, >> >>    I was asked today in the board meeting about the use of NEON >> routines in the kernel; I said we had looked into this but hadn't done >> it becaus

Re: Optimized kernel memcpy/memset

2011-05-05 Thread David Gilbert
Hi Kiko, On 5 May 2011 15:21, Christian Robottom Reis wrote: > Hey there, > >    I was asked today in the board meeting about the use of NEON > routines in the kernel; I said we had looked into this but hadn't done > it because a) it wasn't conclusively better and b) if better, it would > need to

Optimized kernel memcpy/memset

2011-05-05 Thread Christian Robottom Reis
Hey there, I was asked today in the board meeting about the use of NEON routines in the kernel; I said we had looked into this but hadn't done it because a) it wasn't conclusively better and b) if better, it would need to be done conditionally per-platform. But I wanted to double-check that's