> I would be careful about adding overhead to memcpy. I found that in
> the kernel, almost all calls to memcpy are for less than 128 bytes (1
> cache line on most 64-bit machines). So, adding a lot of code to
> detect cacheability and do prefetching is just going to slow down the
> common case, w
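The concern quoted above can be sketched as a single size check in front of the expensive path. This is only a minimal sketch, not the kernel's or glibc's memcpy: `copy_dispatch`, `bulk_copy`, and `SMALL_COPY_MAX` are hypothetical names, and the 128-byte threshold is simply the cache-line figure quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold: one 128-byte cache line, as quoted above. */
#define SMALL_COPY_MAX 128

/* Hypothetical stand-in for a tuned bulk routine (prefetching, cache hints). */
static void *bulk_copy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Copy n bytes. Small sizes pay for one compare and a plain byte loop;
 * only larger sizes reach the bulk routine and its setup cost. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    /* Debug-only sanity check; compiled out with -DNDEBUG. */
    assert(n == 0 || (dst != NULL && src != NULL));

    if (n < SMALL_COPY_MAX) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }
    return bulk_copy(dst, src, n);
}
```

Whether the threshold belongs at 128 bytes, or anywhere near it, would have to be measured on the target CPU.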
On Thursday 04 September 2008 17:01:21 Gunnar Von Boehn wrote:
>[...]
> Regarding the 5121.
> David, you did create a very special memcopy for the 5121e CPU.
> Your test showed us that the normal glibc memcopy is about 10 times
> slower than expected on the 5121.
>
> I really wonder why this is the
Hi Steven,
On Thursday 04 September 2008 16:31:13 Steven Munroe wrote:
>[...]
> > Yes, I admit my testcase is focussing on optimizing memcpy() of uncached
> > data, and that interest stems from the fact that I was testing X11
> > performance (using xorg kdrive and xorg-server), and wondering why
Hi Steve,
> I have personally optimized memcpy for power4/5/6 and they are all
> different. There are dozens of different PPC implementations from
> different manufacturers and design, every one is different! With painful
> negotiation I was able to get the --with-cpu= framework added to glibc
> b
Hi David,
Regarding your testcase.
I think we all agree with you that improving the performance for PPC
is a noble quest,
and we should all try to improve the performance where possible.
Regarding the 5200B and 5221 CPUs.
As we all know the 5200B is a G2 PowerPC from Freescale.
The factor for
Steve,
I think we should be grateful for people being interested in improving
performance for PPC,
and we should not bash them.
The proposal to optimize the memcopy for the 5200 is good.
Steve, you said that you've heard about the 5200..
Maybe I can refresh your memory:
I did send you an optimi
On Thu, 2008-09-04 at 14:59 +0200, David Jander wrote:
> On Thursday 04 September 2008 14:19:26 Josh Boyer wrote:
> >[...]
> > >(I have edited the output of this tool to fit into an e-mail without
> > > wrapping lines for readability).
> > >Please tell me how on earth there can be such a big differ
On Thursday 04 September 2008 14:19:26 Josh Boyer wrote:
>[...]
> >$ ./memcpyspeed
> >Fully aligned:
> >10 chunks of 5 bytes : 3.48 Mbyte/s ( throughput: 6.96 Mbytes/s)
> >5 chunks of 16 bytes : 14.3 Mbyte/s ( throughput: 28.6 Mbytes/s)
> >1 chunks of 100 bytes : 14.4 Mbyte/s
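A measurement of this kind can be sketched roughly as follows. This is not the memcpyspeed source, which is not shown here: `copy_rate_mbyte_s` is a hypothetical helper, and it uses `clock()` for simplicity. In the lines quoted above the "throughput" figure is twice the copy rate, which presumably counts bytes read plus bytes written; that reading is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Copy `chunks` blocks of `chunk` bytes, `reps` times over, and return the
 * copy rate in Mbyte/s (10^6 bytes per second). Returns -1.0 if allocation
 * fails and 0.0 if the run was too short for clock() to measure.
 */
double copy_rate_mbyte_s(size_t chunk, size_t chunks, unsigned long reps)
{
    assert(chunk > 0 && chunks > 0);

    size_t total = chunk * chunks;
    unsigned char *src = malloc(total);
    unsigned char *dst = malloc(total);
    volatile unsigned char sink;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, total);

    clock_t t0 = clock();
    for (unsigned long r = 0; r < reps; r++)
        for (size_t i = 0; i < chunks; i++)
            memcpy(dst + i * chunk, src + i * chunk, chunk);
    sink = dst[total - 1];      /* read one result byte back */
    clock_t t1 = clock();
    (void)sink;

    free(src);
    free(dst);

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)total * (double)reps) / secs / 1e6;
}
```

An optimizing compiler may merge or drop repeated copies, so real measurements need care with optimization flags, and `clock()` resolution limits how short a run can be.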
On Thu, Sep 04, 2008 at 02:05:16PM +0200, David Jander wrote:
>> I would be careful about adding overhead to memcpy. I found that in
>> the kernel, almost all calls to memcpy are for less than 128 bytes (1
>> cache line on most 64-bit machines). So, adding a lot of code to
>> detect cacheability
On Thursday 04 September 2008 04:04:58 Paul Mackerras wrote:
> prodyut hazarika writes:
> > glibc memxxx for powerpc are horribly inefficient. For optimal
> > performance, we should use the dcbt instruction to establish the source
> > address in cache, and dcbz to establish the destination address i
prodyut hazarika writes:
> glibc memxxx for powerpc are horribly inefficient. For optimal performance,
> we should use the dcbt instruction to establish the source address in cache,
> and
> dcbz to establish the destination address in cache. We should do
> dcbt and dcbz such that the touches happe
Hi all,
> These could probably go to glibc
> as new general purpose memxxx() routines. You will probably see
> a big increase once dcbz is added to the copy/memset functions.
glibc memxxx for powerpc are horribly inefficient. For optimal performance,
we should use the dcbt instruction to establis
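The source-side half of this suggestion can be sketched in portable C with the GCC/Clang builtin `__builtin_prefetch`, which those compilers are expected to lower to a cache-touch instruction such as dcbt on PowerPC. This is a rough sketch under assumptions, not the proposed glibc routine: `copy_with_prefetch` and `LINE_BYTES` are hypothetical, and the 32-byte line size must be checked against the actual CPU. The dcbz half is left out because it has no portable builtin and is only safe on cacheable destinations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_BYTES 32   /* assumed cache-line size; hypothetical, tune per CPU */

/* Copy n bytes one cache line at a time, hinting the next source line
 * into cache before copying the current one. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Debug-only sanity check; compiled out with -DNDEBUG. */
    assert(n == 0 || (dst != NULL && src != NULL));

    while (n >= LINE_BYTES) {
        /* Read hint for the following line; a prefetch does not fault,
         * so hinting at the end of the buffer is harmless. */
        __builtin_prefetch(s + LINE_BYTES, 0, 0);
        memcpy(d, s, LINE_BYTES);
        d += LINE_BYTES;
        s += LINE_BYTES;
        n -= LINE_BYTES;
    }
    memcpy(d, s, n);    /* remaining tail, possibly zero bytes */
    return dst;
}
```

How far ahead to prefetch, and whether it helps at all, depends on the core and on whether the memory is cached.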
On Tue, 2008-09-02 at 15:12 +0200, David Jander wrote:
> I have made some astonishing discoveries, and I'd like to post the
> used source-code somewhere in the meantime, any suggestions? To this list?
Yes, mail it.
I got a mpc8323/8321 board I want to try.
> For the MPC5121e, 16-register strides
On Monday 01 September 2008 11:36:15 Joakim Tjernlund wrote:
>[...]
> > Then I started my test program with LD_PRELOAD=...
> >
> > My test program only copies big chunks of aligned memory, so it will only
> > test for maximum throughput (such as copying video frames). I will make a
> > better one,
On Mon, 2008-09-01 at 09:23 +0200, David Jander wrote:
> On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote:
> >[...]
> > > The problem is: I have very little experience with powerpc assembly and
> > > only very limited time to dedicate to this and I am looking for others
> > > who have
> >
>
On Friday 29 August 2008 22:34:21 Steven Munroe wrote:
> > I am not complaining. I was only wondering if it is just me or there
> > really is very little that has been done (for either uClibc, glibc, or
> > whatever for powerpc) to improve performance of (linux-) applications on
> > "lower"-power p
On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote:
>[...]
> > The problem is: I have very little experience with powerpc assembly and
> > only very limited time to dedicate to this and I am looking for others
> > who have
>
> I improved the PowerPC memcpy and friends in uClibc a while ago. I
On Sunday 31 August 2008 10:28:43 Benjamin Herrenschmidt wrote:
> > > > It would be useful if somebody interested in getting things
> > > > into glibc did the necessary FSF copyright assignment stuff and
> > > > worked toward integrating them.
> > >
> > > Ben makes a very good point!
> >
>
> > > It would be useful if somebody interested in getting things
> > > into glibc did the necessary FSF copyright assignment stuff and worked
> > > toward integrating them.
> >
> > Ben makes a very good point!
>
> Sounds reasonable... but I am still wondering about what you mean
> with "th
On Fri, 2008-08-29 at 13:48 +0200, David Jander wrote:
> On Wednesday 27 August 2008 23:04:39 Steven Munroe wrote:
> > On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote:
> > > > Hi Matt,
> > > >
> > > > On Monday 25 August 2
On Fri, 2008-08-29 at 13:48 +0200, David Jander wrote:
> On Wednesday 27 August 2008 23:04:39 Steven Munroe wrote:
> > On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote:
> > > > Hi Matt,
[SNIP]
> I am not complaining. I wa
On Wednesday 27 August 2008 23:04:39 Steven Munroe wrote:
> On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote:
> > > Hi Matt,
> > >
> > > On Monday 25 August 2008 13:00:10 Matt Sealey wrote:
> > > > The focus has definitely be
On Tue, 2008-08-26 at 08:28 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote:
> > Hi Matt,
> >
> > On Monday 25 August 2008 13:00:10 Matt Sealey wrote:
> > > The focus has definitely been on VMX but that's not to say lower power
> > > processors were for
On Mon, 2008-08-25 at 15:06 +0200, David Jander wrote:
> Hi Matt,
>
> On Monday 25 August 2008 13:00:10 Matt Sealey wrote:
> > The focus has definitely been on VMX but that's not to say lower power
> > processors were forgotten :)
>
> lower-power (pun intended) is coming strong these days, as ene
Hi Matt,
On Monday 25 August 2008 13:00:10 Matt Sealey wrote:
> The focus has definitely been on VMX but that's not to say lower power
> processors were forgotten :)
lower-power (pun intended) is coming strong these days, as energy-efficiency
is getting more important every day. And the MPC512
Hi David,
The focus has definitely been on VMX but that's not to say lower power
processors were forgotten :)
Gunnar von Boehn did some benchmarking with an assembly optimized routine,
for Cell, 603e and so on (basically the whole gamut from embedded up to
server class IBM chips) and got some pre