Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread prodyut hazarika
> I would be careful about adding overhead to memcpy. I found that in > the kernel, almost all calls to memcpy are for less than 128 bytes (1 > cache line on most 64-bit machines). So, adding a lot of code to > detect cacheability and do prefetching is just going to slow down the > common case, w
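The concern above is that set-up work such as cacheability checks or prefetching runs on every call, while most calls copy less than one 128-byte cache line. Below is a minimal sketch of the usual mitigation. It assumes a hypothetical `copy_dispatch` function and uses the 128-byte figure from the quote as its cut-off. Small copies take a short loop with no set-up, and only larger ones reach a bulk path where extra work could pay for itself. The bulk path here simply defers to the C library and stands in for a tuned routine.

```c
#include <stddef.h>
#include <string.h>

/* Cut-off taken from the quoted observation (one cache line on most
 * 64-bit machines); the best value for G2/G3 parts is an assumption. */
#define SMALL_COPY_LIMIT 128

/* Hypothetical placeholder for a tuned bulk path (prefetching,
 * unrolling). It defers to the C library so the sketch stays portable. */
static void *bulk_copy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Hypothetical dispatcher: small copies pay for one comparison only. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n < SMALL_COPY_LIMIT) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)            /* plain byte loop, no set-up cost */
            *d++ = *s++;
        return dst;
    }
    return bulk_copy(dst, src, n);
}
```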

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread David Jander
On Thursday 04 September 2008 17:01:21 Gunnar Von Boehn wrote: >[...] > Regarding the 5121. > David, you did create a very special memcopy for the 5121e CPU. > Your test showed us that the normal glibc memcopy is about 10 times > slower than expected on the 5121. > > I really wonder why this is the

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread David Jander
Hi Steven, On Thursday 04 September 2008 16:31:13 Steven Munroe wrote: >[...] > > Yes, I admit my testcase is focussing on optimizing memcpy() of uncached > > data, and that interest stems from the fact that I was testing X11 > > performance (using xorg kdrive and xorg-server), and wondering why

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread Gunnar Von Boehn
Hi Steve, > I have personally optimized memcpy for power4/5/6 and they are all > different. There are dozens of different PPC implementations from > different manufacturers and design, every one is different! With painful > negotiation I was able to get the --with-cpu= framework added to glibc > b

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread Gunnar Von Boehn
Hi David, Regarding your testcase. I think we all agree with you that improving the performance for PPC is a noble quest and we should all try to improve the performance where possible. Regarding the 5200B and 5221 CPUs. As we all know the 5200B is a G2 PowerPC from Freescale. The factor for

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread Gunnar Von Boehn
Steve, I think we should be grateful for people being interested in improving performance for PPC, and we should not bash them. The proposal to optimize the memcopy for the 5200 is good. Steve, you said that you've heard about the 5200.. Maybe I can refresh your memory: I did send you an optimi

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread Steven Munroe
On Thu, 2008-09-04 at 14:59 +0200, David Jander wrote: > On Thursday 04 September 2008 14:19:26 Josh Boyer wrote: > >[...] > > >(I have edited the output of this tool to fit into an e-mail without > > > wrapping lines for readability). > > >Please tell me how on earth there can be such a big differ

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread David Jander
On Thursday 04 September 2008 14:19:26 Josh Boyer wrote: >[...] > >$ ./memcpyspeed > >Fully aligned: > >10 chunks of 5 bytes : 3.48 Mbyte/s ( throughput: 6.96 Mbytes/s) > >5 chunks of 16 bytes : 14.3 Mbyte/s ( throughput: 28.6 Mbytes/s) > >1 chunks of 100 bytes : 14.4 Mbyte/s
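In the quoted output, the throughput column is exactly twice the copy rate (3.48 vs 6.96 and 14.3 vs 28.6 Mbyte/s). That is consistent with counting both the bytes read and the bytes written. Below is a hedged sketch of a harness in that spirit. The source of memcpyspeed is not shown here, so the names `copy_rate_mbs` and `report`, the use of 10^6 bytes per Mbyte and the 2× throughput convention are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time via C11 timespec_get; TIME_UTC is not monotonic,
 * which is tolerable for a rough measurement like this one. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical harness: copies `chunks` blocks of `size` bytes, `iters`
 * times, and returns the copy rate in Mbyte/s (10^6 bytes/s assumed),
 * or a negative value if allocation fails. */
double copy_rate_mbs(size_t chunks, size_t size, long iters)
{
    size_t total = chunks * size;
    unsigned char *src = malloc(total + 1);
    unsigned char *dst = malloc(total + 1);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, total + 1);
    memset(dst, 0, total + 1);

    double t0 = now_seconds();
    for (long it = 0; it < iters; it++)
        for (size_t c = 0; c < chunks; c++)
            memcpy(dst + c * size, src + c * size, size);
    double secs = now_seconds() - t0;

    /* Read the result back so the copies are not optimized away. */
    volatile unsigned char sink = dst[total / 2];
    (void)sink;
    free(src);
    free(dst);

    double bytes = (double)total * (double)iters;
    return secs > 0.0 ? bytes / secs / 1e6 : 0.0;
}

/* Prints a line in the style of the quoted output; "throughput" is
 * taken as read + written bytes, i.e. twice the copy rate. */
void report(size_t chunks, size_t size, double rate)
{
    printf("%zu chunks of %zu bytes : %.3g Mbyte/s ( throughput: %.3g Mbytes/s)\n",
           chunks, size, rate, 2.0 * rate);
}
```

For example, `report(10, 5, copy_rate_mbs(10, 5, 100000));` prints one line in the same layout as the quoted output. Timing tiny chunks this way mostly measures call overhead, which is the effect the small-chunk rows expose.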

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread Josh Boyer
On Thu, Sep 04, 2008 at 02:05:16PM +0200, David Jander wrote: >> I would be careful about adding overhead to memcpy. I found that in >> the kernel, almost all calls to memcpy are for less than 128 bytes (1 >> cache line on most 64-bit machines). So, adding a lot of code to >> detect cacheability

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-04 Thread David Jander
On Thursday 04 September 2008 04:04:58 Paul Mackerras wrote: > prodyut hazarika writes: > > glibc memxxx for powerpc are horribly inefficient. For optimal > > performance, we should use the dcbt instruction to establish the source > > address in cache, and dcbz to establish the destination address i

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-03 Thread Paul Mackerras
prodyut hazarika writes: > glibc memxxx for powerpc are horribly inefficient. For optimal performance, > we should use the dcbt instruction to establish the source address in cache, > and > dcbz to establish the destination address in cache. We should do > dcbt and dcbz such that the touches happe

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-03 Thread prodyut hazarika
Hi all, > These could probably go to glibc > as new general purpose memxxx() routines. You will probably see > a big increase once dcbz is added to the copy/memset functions. glibc memxxx for powerpc are horribly inefficient. For optimal performance, we should use the dcbt instruction to establis
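The dcbt/dcbz proposal can be sketched in C as a copy blocked by cache line. It touches the source a few lines ahead of the current position and establishes each destination line before filling it. This is a sketch under stated assumptions, not the code proposed for glibc. The function name `copy_prefetch`, the 32-byte line size (typical of 603e/e300/750-class cores; check your part) and the prefetch distance are all assumptions. On non-PowerPC builds it falls back to GCC's `__builtin_prefetch`, which cannot replicate dcbz. On PowerPC, dcbz zeroes a whole line without reading memory. It is therefore applied only to whole, line-aligned destination lines that are then completely overwritten. On classic cores it raises an alignment exception on cache-inhibited memory, so a routine like this must not be used for the uncached-buffer case discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE  32          /* assumed cache-line size for G2/G3-class cores */
#define AHEAD (4 * LINE)  /* assumed prefetch distance; tune per part */

/* Hint that the line containing p will be read soon (dcbt on PowerPC). */
static inline void touch_src(const void *p)
{
#if defined(__powerpc__)
    __asm__ volatile("dcbt 0,%0" : : "r"(p));
#else
    __builtin_prefetch(p, 0);
#endif
}

/* Establish a LINE-aligned destination line that will be fully
 * overwritten. dcbz zeroes it in cache without a memory read; other
 * architectures only get a write-prefetch hint. */
static inline void claim_dst_line(void *p)
{
#if defined(__powerpc__)
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
#else
    __builtin_prefetch(p, 1);
#endif
}

/* Hypothetical prefetching copy for non-overlapping, cacheable buffers. */
void *copy_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Byte-copy until the destination is line-aligned. */
    while (n > 0 && ((uintptr_t)d & (LINE - 1)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Whole destination lines: touch the source ahead, claim, fill. */
    while (n >= LINE) {
        if (n > AHEAD)           /* stay within the source buffer */
            touch_src(s + AHEAD);
        claim_dst_line(d);
        memcpy(d, s, LINE);
        d += LINE;
        s += LINE;
        n -= LINE;
    }

    /* Remaining tail bytes. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```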

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-02 Thread Joakim Tjernlund
On Tue, 2008-09-02 at 15:12 +0200, David Jander wrote: > I have made some astonishing discoveries, and I'd like to post the > used source-code somewhere in the meantime, any suggestions? To this list? Yes, mail it. I got a mpc8323/8321 board I want to try. > For the MPC5121e, 16-register strides
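One reading of the truncated remark about 16-register strides is moving data through many registers per iteration. A batch of words is loaded first and then stored, so loads and stores are grouped. Below is a portable C sketch of that idea. It assumes word-aligned, non-overlapping buffers and a length given in 32-bit words, and the function name `copy_words16` is hypothetical. PowerPC has 32 general-purpose registers, so 16 live values can fit. Whether the compiler actually keeps them in registers depends on the compiler and its flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of `nwords` 32-bit words in strides of 16 words
 * (64 bytes): all 16 loads of a stride come before its 16 stores. */
void copy_words16(uint32_t *restrict dst, const uint32_t *restrict src,
                  size_t nwords)
{
    while (nwords >= 16) {
        uint32_t r[16];
        for (int i = 0; i < 16; i++)   /* load phase */
            r[i] = src[i];
        for (int i = 0; i < 16; i++)   /* store phase */
            dst[i] = r[i];
        src += 16;
        dst += 16;
        nwords -= 16;
    }
    while (nwords--)                   /* leftover words */
        *dst++ = *src++;
}
```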

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-02 Thread David Jander
On Monday 01 September 2008 11:36:15 Joakim Tjernlund wrote: >[...] > > Then I started my test program with LD_PRELOAD=... > > > > My test program only copies big chunks of aligned memory, so it will only > > test for maximum throughput (such as copying video frames). I will make a > > better one,

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-01 Thread Joakim Tjernlund
On Mon, 2008-09-01 at 09:23 +0200, David Jander wrote: > On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote: > >[...] > > > The problem is: I have very little experience with powerpc assembly and > > > only very limited time to dedicate to this and I am looking for others > > > who have > > >

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-01 Thread David Jander
On Friday 29 August 2008 22:34:21 Steven Munroe wrote: > > I am not complaining. I was only wondering if it is just me or there > > really is very little that has been done (for either uClibc, glibc, or > > whatever for powerpc) to improve performance of (linux-) applications on > > "lower"-power p

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-09-01 Thread David Jander
On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote: >[...] > > The problem is: I have very little experience with powerpc assembly and > > only very limited time to dedicate to this and I am looking for others > > who have > > I improved the PowerPC memcpy and friends in uClibc a while ago. I

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-31 Thread David Jander
On Sunday 31 August 2008 10:28:43 Benjamin Herrenschmidt wrote: > > > It would be useful if somebody interested in getting things > > > > > into glibc did the necessary FSF copyright assignment stuff and > > > > worked toward integrating them. > > > > > > Ben makes a very good point! > > >

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-31 Thread Benjamin Herrenschmidt
> > It would be useful if somebody interested in getting things > > > into glibc did the necessary FSF copyright assignment stuff and worked > > > toward integrating them. > > > > Ben makes a very good point! > > Sounds reasonable... but I am still wondering about what you mean > with "th

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-29 Thread Steven Munroe
On Fri, 2008-08-29 at 13:48 +0200, David Jander wrote: > On Wednesday 27 August 2008 23:04:39 Steven Munroe wrote: > > On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote: > > > On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote: > > > > Hi Matt, > > > > > > > > On Monday 25 August 2

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-29 Thread Joakim Tjernlund
On Fri, 2008-08-29 at 13:48 +0200, David Jander wrote: > On Wednesday 27 August 2008 23:04:39 Steven Munroe wrote: > > On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote: > > > On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote: > > > > Hi Matt, [SNIP] > I am not complaining. I wa

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-29 Thread David Jander
On Wednesday 27 August 2008 23:04:39 Steven Munroe wrote: > On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote: > > On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote: > > > Hi Matt, > > > > > > On Monday 25 August 2008 13:00:10 Matt Sealey wrote: > > > > The focus has definitely be

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-27 Thread Steven Munroe
On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote: > > Hi Matt, > > > > On Monday 25 August 2008 13:00:10 Matt Sealey wrote: > > > The focus has definitely been on VMX but that's not to say lower power > > > processors were for

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-25 Thread Benjamin Herrenschmidt
On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote: > Hi Matt, > > On Monday 25 August 2008 13:00:10 Matt Sealey wrote: > > The focus has definitely been on VMX but that's not to say lower power > > processors were forgotten :) > > lower-power (pun intended) is coming strong these days, as ene

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-25 Thread David Jander
Hi Matt, On Monday 25 August 2008 13:00:10 Matt Sealey wrote: > The focus has definitely been on VMX but that's not to say lower power > processors were forgotten :) lower-power (pun intended) is coming strong these days, as energy-efficiency is getting more important every day. And the MPC512

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-25 Thread Matt Sealey
Hi David, The focus has definitely been on VMX but that's not to say lower power processors were forgotten :) Gunnar von Boehn did some benchmarking with an assembly-optimized routine, for Cell, 603e and so on (basically the whole gamut from embedded up to server-class IBM chips) and got some pre