On 17/08/2015 13:00, leroy christophe wrote:
On 17/08/2015 12:56, leroy christophe wrote:
On 07/08/2015 01:25, Segher Boessenkool wrote:
On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
If this makes performance non-negligibly worse on other 32-bit chips, and is
an im
On 17/08/2015 12:56, leroy christophe wrote:
On 07/08/2015 01:25, Segher Boessenkool wrote:
On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
If this makes performance non-negligibly worse on other 32-bit chips, and is
an important improvement on 8xx, then we can use an ifde
On 07/08/2015 01:25, Segher Boessenkool wrote:
On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
If this makes performance non-negligibly worse on other 32-bit chips, and is
an important improvement on 8xx, then we can use an ifdef since 8xx already
requires its own kernel build.
On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
> > The original loop was already optimal, as the comment said.
>
> The comment says that bdnz has zero overhead. That doesn't mean the adde
> won't stall waiting for the load result.
adde is execution serialising on those cores; it *a
On Wed, 2015-08-05 at 23:39 -0500, Segher Boessenkool wrote:
> On Wed, Aug 05, 2015 at 09:31:41PM -0500, Scott Wood wrote:
> > On Wed, 2015-08-05 at 19:30 -0500, Segher Boessenkool wrote:
> > > On Wed, Aug 05, 2015 at 03:29:35PM +0200, Christophe Leroy wrote:
> > > > On the 8xx, load latency is 2 c
On Wed, Aug 05, 2015 at 09:31:41PM -0500, Scott Wood wrote:
> On Wed, 2015-08-05 at 19:30 -0500, Segher Boessenkool wrote:
> > On Wed, Aug 05, 2015 at 03:29:35PM +0200, Christophe Leroy wrote:
> > > On the 8xx, load latency is 2 cycles and taking branches also takes
> > > 2 cycles. So let's unroll
On Wed, 2015-08-05 at 19:30 -0500, Segher Boessenkool wrote:
> On Wed, Aug 05, 2015 at 03:29:35PM +0200, Christophe Leroy wrote:
> > On the 8xx, load latency is 2 cycles and taking branches also takes
> > 2 cycles. So let's unroll the loop.
>
> This is not true for most other 32-bit PowerPC; this
On Wed, Aug 05, 2015 at 03:29:35PM +0200, Christophe Leroy wrote:
> On the 8xx, load latency is 2 cycles and taking branches also takes
> 2 cycles. So let's unroll the loop.
This is not true for most other 32-bit PowerPC; this patch makes
performance worse on e.g. 6xx/7xx/7xxx. Let's not!
Seghe
On the 8xx, load latency is 2 cycles and taking branches also takes
2 cycles. So let's unroll the loop.
Signed-off-by: Christophe Leroy
---
v2: Only use lwzu for the last load as lwzu has undocumented
additional latency
arch/powerpc/lib/checksum_32.S | 16 +++-
1 file chang
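The following is a minimal C sketch of the idea behind the patch. It is not the actual checksum_32.S change, which is PowerPC assembly and is truncated above. It models the 32-bit ones'-complement accumulation (the addc/adde carry chain discussed in the thread) by summing into a 64-bit accumulator and folding the carries back in at the end.

The unrolled variant issues four independent loads per iteration. This lets load latency overlap and means only one loop branch is taken per four words. The names fold64, csum_words_rolled, csum_words_unrolled and csum_words are hypothetical. CONFIG_PPC_8xx is used only to illustrate the build-time selection Scott Wood suggests. Whether a C compiler on a given core actually benefits is an assumption, so this shows structure, not performance.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit accumulator into a 32-bit ones'-complement sum by
 * repeatedly adding the carries back in (end-around carry), which is
 * what an addc/adde chain over 32-bit words yields. */
uint32_t fold64(uint64_t sum)
{
    while (sum >> 32)
        sum = (sum & 0xffffffffu) + (sum >> 32);
    return (uint32_t)sum;
}

/* Rolled loop: one load and one taken loop branch per 32-bit word. */
uint32_t csum_words_rolled(const uint32_t *p, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return fold64(sum);
}

/* Unrolled by 4: the four loads are independent, so their latency can
 * overlap, and only one branch is taken per four words. A tail loop
 * handles the remaining n % 4 words. */
uint32_t csum_words_unrolled(const uint32_t *p, size_t n)
{
    uint64_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t a = p[i], b = p[i + 1], c = p[i + 2], d = p[i + 3];
        sum += a;
        sum += b;
        sum += c;
        sum += d;
    }
    for (; i < n; i++)
        sum += p[i];
    return fold64(sum);
}

/* Hypothetical build-time switch: only an 8xx-specific build (here
 * assumed to pass -DCONFIG_PPC_8xx) gets the unrolled loop. */
#ifdef CONFIG_PPC_8xx
#define csum_words csum_words_unrolled
#else
#define csum_words csum_words_rolled
#endif
```

Both variants must return the same sum for any buffer. For example, both return 0x0000000c for {0xffffffff, 1, 2, 3, 0x80000000, 0x80000000, 5}, so the choice between them is purely a per-core scheduling decision. That is the point under dispute in the thread for 6xx/7xx/7xxx.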