On Wed, Mar 07, 2007 at 00:24:35 +0200, Sami Farin wrote:
> On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> ...
> > And I found a bug in gcc-4.1.2, it gave 0 for ncubic results
> > when doing 1000 loops test... gcc-4.0.3 works.
>
> Found it.
>
> --- cbrt-test.c~ 2007-03-07 00:20:54
On Wed, Mar 07, 2007 at 11:11:49 -0500, Chuck Ebbert wrote:
> Sami Farin wrote:
> > On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> > ...
> >> And I found a bug in gcc-4.1.2, it gave 0 for ncubic results
> >> when doing 1000 loops test... gcc-4.0.3 works.
> >
> > Found it.
> >
> > --- c
Sami Farin wrote:
> On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> ...
>> And I found a bug in gcc-4.1.2, it gave 0 for ncubic results
>> when doing 1000 loops test... gcc-4.0.3 works.
>
> Found it.
>
> --- cbrt-test.c~ 2007-03-07 00:20:54.735248105 +0200
> +++ cbrt-test.c 2
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 16:00:55 -0800
> On Wed, 7 Mar 2007 00:24:35 +0200
> Sami Farin <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> > ...
> > > And I found a bug in gcc-4.1.2, it gave 0 for ncubic results
> >
On Tue, Mar 06, 2007 at 16:00:55 -0800, Stephen Hemminger wrote:
...
> > Now Linux 2.6 does not have "memory" in fls, maybe it causes
> > some gcc funnies some people are seeing.
> >
>
> That code was copy-paste from:
> include/asm-x86_64/bitops.h
>
> So shouldn't both fls() and ffs() be f
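For comparison, a minimal sketch of the two helpers written with GCC builtins instead of inline `bsr`/`bsf` assembly, so there is no clobber list to get right at all. This is not the code in `include/asm-x86_64/bitops.h`; the names `fls_sketch`/`ffs_sketch` are hypothetical, a 32-bit `unsigned int` is assumed, and the 1-based convention (0 for a zero argument) is assumed to match the kernel's `fls()`/`ffs()`.

```c
/* Hypothetical portable fls()/ffs() sketches using GCC/Clang builtins
 * rather than inline assembly.  Both return 1-based bit positions and
 * 0 when no bit is set.  Assumes a 32-bit unsigned int. */

/* Find last (most significant) set bit: fls(1) == 1, fls(0x80000000) == 32. */
static inline int fls_sketch(unsigned int x)
{
	/* __builtin_clz() is undefined for 0, so handle that case explicitly. */
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Find first (least significant) set bit: ffs(1) == 1, ffs(8) == 4. */
static inline int ffs_sketch(unsigned int x)
{
	/* __builtin_ffs() already returns 0 for a zero argument. */
	return __builtin_ffs((int)x);
}
```

Because the builtins are pure functions as far as the compiler knows, it is free to schedule or eliminate them without any memory clobber.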
On Wed, 7 Mar 2007 00:24:35 +0200
Sami Farin <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
> ...
> > And I found a bug in gcc-4.1.2, it gave 0 for ncubic results
> > when doing 1000 loops test... gcc-4.0.3 works.
>
> Found it.
>
> --- cbrt-test.c~ 2007
On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
...
> And I found a bug in gcc-4.1.2, it gave 0 for ncubic results
> when doing 1000 loops test... gcc-4.0.3 works.
Found it.
--- cbrt-test.c~ 2007-03-07 00:20:54.735248105 +0200
+++ cbrt-test.c 2007-03-07 00:21:03.964864343 +0200
@@
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 10:29:41 -0800
> /* calculate the cubic root of x using Newton-Raphson */
> static uint32_t ncubic(uint64_t a)
> {
> uint64_t x;
>
> /* Initial estimate is based on:
>  * cbrt(x) = exp(log(x) / 3)
>  */
>
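A minimal sketch of an integer Newton-Raphson cube root in the spirit of the quoted `ncubic()`, using the update step `x = (2 * x + a / (x*x)) / 3` that appears in the patch hunks later in this thread. It is not the original function: the initial estimate here is an assumed power-of-two overestimate, 2^ceil(bits/3), rather than the exp/log-based estimate the comment describes, and the loop stops at the first step that no longer decreases instead of running a fixed number of iterations. `ncubic_sketch` and `bit_length64` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: number of significant bits in a (0 for a == 0). */
static unsigned int bit_length64(uint64_t a)
{
	unsigned int n = 0;

	while (a) {
		n++;
		a >>= 1;
	}
	return n;
}

/* Integer cube root, floor(cbrt(a)), by Newton-Raphson:
 *   x' = (2 * x + a / (x * x)) / 3
 * Starting from an overestimate, the integer iterates decrease strictly
 * until they reach floor(cbrt(a)); the first non-decreasing step ends it. */
static uint32_t ncubic_sketch(uint64_t a)
{
	uint64_t x, y;

	if (a == 0)
		return 0;

	/* Assumed initial estimate: 2^ceil(bits/3) > cbrt(a).
	 * For 64-bit input x <= 2^22, so x * x <= 2^44 cannot overflow. */
	x = 1ULL << ((bit_length64(a) + 2) / 3);

	for (;;) {
		y = (2 * x + a / (x * x)) / 3;
		if (y >= x)
			break;
		x = y;
	}
	return (uint32_t)x;
}
```

How many steps this takes depends on the input and on how close the initial estimate is, which is why a better estimate lets the loop be replaced by a short fixed sequence of steps.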
On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.
>
> Slightly gross test program (with original cu
On Tue, 6 Mar 2007 20:48:41 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote:
> > Don't count the existing Newton-Raphson out. It turns out that to get enough
> > precision for 32 bits, only 4 iterations are needed. By unrolling those,
On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.
But did you fix the >2^43 bug too?
SGI has alr
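Whether a cube-root routine is right across the whole 64-bit range, and not only for small inputs, can be checked without the check itself overflowing, by using the fact that floor(cbrt(2^64 - 1)) = 2642245. A minimal sketch of such a checker; `is_floor_cbrt` and `CBRT_U64_MAX` are hypothetical names, and this is not the test program posted in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* floor(cbrt(2^64 - 1)): 2642245^3 fits in 64 bits, 2642246^3 does not. */
#define CBRT_U64_MAX 2642245ULL

/* Return true iff y == floor(cbrt(a)), i.e. y^3 <= a < (y + 1)^3,
 * without ever computing a cube that could wrap around. */
static bool is_floor_cbrt(uint64_t a, uint64_t y)
{
	uint64_t z = y + 1;

	if (y > CBRT_U64_MAX)
		return false;		/* y^3 would exceed any 64-bit a */
	if (y * y * y > a)
		return false;
	if (z > CBRT_U64_MAX)
		return true;		/* (y + 1)^3 > 2^64 - 1 >= a */
	return z * z * z > a;
}
```

A test loop can then assert `is_floor_cbrt(a, ncubic(a))` over boundary values (perfect cubes, cubes minus one, values near powers of two) as well as random 64-bit inputs.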
Andi Kleen wrote:
Let me see... You throw code like that and expect someone to actually
understand it in one year, and be able to correct a bug ?
To be honest I don't expect any bugs in this function.
Please add something, an URL or even better a nice explanation, per favor...
It's stra
Andi Kleen wrote:
The problem with these algorithms that trade off one or more
multiplies in order to avoid a divide is that they don't
give anything and often lose when both multiplies and
divides are emulated in software.
Actually on rereading this: is there really any Linux port
that emulates
Don't count the existing Newton-Raphson out. It turns out that to get enough
precision for 32 bits, only 4 iterations are needed. By unrolling those, it
gets much better timing.
Slightly gross test program (with original cubic wraparound bug fixed).
---
/* Test and measure perf of cube root algor
From: [EMAIL PROTECTED] (Dagfinn Ilmari Mannsåker)
Date: Tue, 06 Mar 2007 18:43:14 +0100
> Andi Kleen <[EMAIL PROTECTED]> writes:
>
> > Actually on rereading this: is there really any Linux port
> > that emulates multiplies in software? I thought that was only
> > done on really small microcontro
Andi Kleen <[EMAIL PROTECTED]> writes:
> Actually on rereading this: is there really any Linux port
> that emulates multiplies in software? I thought that was only
> done on really small microcontrollers or smart cards; but anything
> 32bit+ that runs Linux should have hardware multiply, shouldn't
Hi Andi!
On 6 Mar 2007, at 15:45, Andi Kleen wrote:
Let me see... You throw code like that and expect someone to actually
understand it in one year, and be able to correct a bug ?
To be honest I don't expect any bugs in this function.
Please add something, an URL or even better a nice ex
>
> Let me see... You throw code like that and expect someone to actually
> understand it in one year, and be able to correct a bug ?
To be honest I don't expect any bugs in this function.
>
>
> Please add something, an URL or even better a nice explanation, per favor...
It's straight out of
On Tuesday 06 March 2007 14:34, Andi Kleen wrote:
> - return x;
> + int s;
> + u32 y;
> + u64 b;
> + u64 bs;
> +
> + y = 0;
> + for (s = 63; s >= 0; s -= 3) {
> + y = 2 * y;
> + b = 3 * y * (y+1) + 1;
> + bs = b << s;
> +
> The problem with these algorithms that trade off one or more
> multiplies in order to avoid a divide is that they don't
> give anything and often lose when both multiplies and
> divides are emulated in software.
Actually on rereading this: is there really any Linux port
that emulates multiplies in
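The loop in the quoted patch is the digit-by-digit (bitwise) integer cube root from Hacker's Delight: it produces one result bit per three input bits using shifts, adds, compares and small multiplies, with no divide. The quoted fragment stops before the loop's conditional step, so the sketch below completes it in the standard form. Two choices are assumptions, not taken from the patch: `y` is widened to 64 bits so that `3 * y * (y + 1)` cannot overflow 32-bit arithmetic for large inputs, and the name `cubic_root_bitwise` is hypothetical.

```c
#include <stdint.h>

/* Bitwise integer cube root: returns floor(cbrt(x)).
 * Each iteration decides one bit of the result, consuming 3 bits of x. */
static uint32_t cubic_root_bitwise(uint64_t x)
{
	int s;
	uint64_t y = 0;		/* widened to 64 bits (assumption) */
	uint64_t b, bs;

	for (s = 63; s >= 0; s -= 3) {
		y = 2 * y;
		/* (y + 1)^3 - y^3 = 3y(y + 1) + 1, scaled to this bit position */
		b = 3 * y * (y + 1) + 1;
		bs = b << s;
		/* (bs >> s) != b means the shift overflowed, so b * 2^s > x */
		if (x >= bs && b == (bs >> s)) {
			x -= bs;
			y++;
		}
	}
	return (uint32_t)y;
}
```

The iteration count is fixed (22 passes for a 64-bit input), so the cost does not depend on the input value, unlike the Newton-Raphson loop.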
On Mon, Mar 05, 2007 at 04:25:51PM -0800, David Miller wrote:
> Another thing is that the non-Hacker's Delight version iterates
> differently for different input values, so the input value space is
> very important to consider when comparing these two pieces of code.
I did some stochastic testing
On Mon, Mar 05, 2007 at 03:57:14PM -0800, Stephen Hemminger wrote:
> On 03 Mar 2007 03:31:52 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > Stephen Hemminger <[EMAIL PROTECTED]> writes:
> >
> > > Here is another way to handle the 64 bit divide case.
> > > It allows full 64 bit divide by addi
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Mon, 5 Mar 2007 15:57:14 -0800
> I tried the code from Hacker's Delight.
> It is cool, but performance is CPU (and data) dependent:
>
> Average # of usecs per operation:
Interesting results.
The problem with these algorithms that trade off one or
On 03 Mar 2007 03:31:52 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> Stephen Hemminger <[EMAIL PROTECTED]> writes:
>
> > Here is another way to handle the 64 bit divide case.
> > It allows full 64 bit divide by adding the support routine
> > GCC needs.
>
> Not supplying that was intentional by
Stephen Hemminger <[EMAIL PROTECTED]> writes:
> Here is another way to handle the 64 bit divide case.
> It allows full 64 bit divide by adding the support routine
> GCC needs.
Not supplying that was intentional by Linus so that people
think twice (or more often) before using such expensive
o
On 2/26/07, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
Here is another way to handle the 64 bit divide case.
It allows full 64 bit divide by adding the support routine
GCC needs.
I know ARM already went through the process of removing __udivdi3 support:
http://www.arm.linux.org.uk/developer
I thought the motivation for div64() was that a 64:32->32 divide could
be done a lot faster on a number of platforms (including the important
x86) than a generic 64:64->64 divide, but gcc doesn't handle the
devolution automatically -- there is no such libgcc function.
That there's no such func
Stephen Hemminger wrote:
Hmm. Those are the GCC internal versions that are picked up by
doing divide in place. Do we want to allow general 64 bit in kernel to
be easily used? It could cause sloppy slow code, but it would look
cleaner.
... and it would handle datatypes which may be architect
On Feb 26 2007 16:07, Stephen Hemminger wrote:
>> On Feb 26 2007 15:44, Stephen Hemminger wrote:
>> >> >-x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
>> >> >+x = (2 * x + (u32) (a / x*x)) / 3;
>> >>
>> >> Previously there was div64_64(a, x*x) which is equivalent
On Tue, 27 Feb 2007 01:05:26 +0100 (MET)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:
>
> On Feb 26 2007 15:44, Stephen Hemminger wrote:
> >> >- x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
> >> >+ x = (2 * x + (u32) (a / x*x)) / 3;
> >>
> >> Previously there was div64_64(a, x*x)
On Feb 26 2007 15:44, Stephen Hemminger wrote:
>> >- x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
>> >+ x = (2 * x + (u32) (a / x*x)) / 3;
>>
>> Previously there was div64_64(a, x*x) which is equivalent to
>> (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is
>> equ
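The point about `a / x*x` is C operator precedence: `/` and `*` have equal precedence and associate left to right, so `a / x*x` means `(a / x) * x`, not `a / (x * x)`. A small illustration; the function names are made up for this example.

```c
#include <stdint.h>

/* What the replacement line actually computes: (a / x) * x */
static uint64_t as_written(uint64_t a, uint64_t x)
{
	return a / x*x;
}

/* What div64_64(a, x*x) computed: a / (x^2) */
static uint64_t as_intended(uint64_t a, uint64_t x)
{
	return a / (x * x);
}
```

With a = 1000 and x = 10, `as_written` returns 1000 (it just rounds a down to a multiple of x) while `as_intended` returns 10.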
On Tue, 27 Feb 2007 00:02:50 +0100 (MET)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:
>
> On Feb 26 2007 13:28, Stephen Hemminger wrote:
> >>
> >> ./arch/arm26/lib/udivdi3.c
> >> ./arch/sh/lib/udivdi3.c
> >> ./arch/sparc/lib/udivdi3.S
> >>
> >> should not this be consolidated too?
> >
> >Hmm. Thos
On Feb 26 2007 13:28, Stephen Hemminger wrote:
>>
>> ./arch/arm26/lib/udivdi3.c
>> ./arch/sh/lib/udivdi3.c
>> ./arch/sparc/lib/udivdi3.S
>>
>> should not this be consolidated too?
>
>Hmm. Those are the GCC internal versions that are picked up by
>doing divide in place. Do we want to allow gene
Here is another way to handle the 64 bit divide case.
It allows full 64 bit divide by adding the support routine
GCC needs.
---
arch/alpha/Kconfig | 4
arch/arm/Kconfig | 4
arch/arm26/Kconfig | 4
arch/avr32/Kconfig | 4
a
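On 32-bit targets GCC lowers a plain `/` on 64-bit operands to a call to a libgcc helper (`__udivdi3` for the unsigned case), which is why the kernel needs a support routine before such code links. As a rough illustration of the work such a helper does when there is no 64-bit divide instruction, here is a minimal restoring shift-and-subtract division. It is not libgcc's implementation, and `udiv64_sketch` is a hypothetical name; it produces one quotient bit per iteration, so it costs up to 64 iterations.

```c
#include <stdint.h>

/* Restoring binary long division: returns n / d and stores n % d in *rem.
 * The caller must ensure d != 0. */
static uint64_t udiv64_sketch(uint64_t n, uint64_t d, uint64_t *rem)
{
	uint64_t q = 0, r = 0;
	int i;

	for (i = 63; i >= 0; i--) {
		/* Bring down the next dividend bit.  Before the shift r is at
		 * most n >> (i + 1) < 2^63, so r << 1 cannot overflow. */
		r = (r << 1) | ((n >> i) & 1);
		if (r >= d) {
			r -= d;
			q |= 1ULL << i;
		}
	}
	*rem = r;
	return q;
}
```

Real library routines use faster techniques (normalisation, hardware 32-bit divides, early exits), but the cost relative to a single divide instruction is the reason the thread treats full 64/64 division as something to use sparingly.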
On Mon, 26 Feb 2007 21:09:26 +0100 (MET)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:
>
> On Feb 23 2007 17:05, Stephen Hemminger wrote:
> >
> >Since there are already two users of full 64 bit division in the kernel,
> >and other places may be hiding out as well, add a full 64/64 bit divide.
> >
> >Yes t
On Feb 23 2007 17:05, Stephen Hemminger wrote:
>
>Since there are already two users of full 64 bit division in the kernel,
>and other places may be hiding out as well, add a full 64/64 bit divide.
>
>Yes, this is expensive, but there are places where it is necessary.
>It is not clear if doing the scaling b
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Mon, 26 Feb 2007 11:28:18 -0800
> asm-i386/div64.h and div64.o need to move in the Makefile to get this to work
> on i386.
This looks great to me:
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
asm-i386/div64.h and div64.o need to move in the Makefile to get this to work
on i386.
---
include/asm-arm/div64.h | 2 ++
include/asm-generic/div64.h | 8
include/asm-i386/div64.h | 5 +
include/asm-m68k/div64.h | 2 ++
include/asm-mips/div64.h | 8 ++
On Fri, Feb 23, 2007 at 17:05:27 -0800, Stephen Hemminger wrote:
> Since there are already two users of full 64 bit division in the kernel,
> and other places may be hiding out as well, add a full 64/64 bit divide.
>
> Yes, this is expensive, but there are places where it is necessary.
> It is not clear if
Since there are already two users of full 64 bit division in the kernel,
and other places may be hiding out as well, add a full 64/64 bit divide.
Yes, this is expensive, but there are places where it is necessary.
It is not clear if doing the scaling buys any advantage on 64 bit platforms,
so for them a fu
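The "scaling" mentioned here can be sketched as follows: when the divisor does not fit in 32 bits, shift divisor and dividend right by the same amount until it does, then do a cheaper 64-by-32 divide. The result is only approximate when the shift discards low bits. This is a sketch of the idea, not the posted patch; `div64_64_scaled` and `fls32` are hypothetical names, and a plain `/` stands in for `do_div()`.

```c
#include <stdint.h>

/* Hypothetical fls(): 1-based index of the highest set bit, 0 if none. */
static unsigned int fls32(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Approximate 64/64 divide by scaling both operands so the divisor fits
 * in 32 bits.  Exact when divisor < 2^32; otherwise the low bits
 * discarded by the shift can make the quotient slightly off. */
static uint64_t div64_64_scaled(uint64_t dividend, uint64_t divisor)
{
	uint32_t high = (uint32_t)(divisor >> 32);
	uint32_t d;

	if (high) {
		/* divisor < 2^(32 + shift), so divisor >> shift fits in 32 bits */
		unsigned int shift = fls32(high);

		d = (uint32_t)(divisor >> shift);
		dividend >>= shift;
	} else {
		d = (uint32_t)divisor;
	}

	/* 64-by-32 divide; a kernel version would use do_div() here. */
	return dividend / d;
}
```

On a 64-bit platform the hardware divide is already a single instruction, so the extra shifting there buys nothing and costs precision, which is the doubt raised above.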