On Saturday, 18 June 2016 17:54:45 UTC+7, ⚛ wrote:
>
> On Saturday, June 18, 2016 at 12:21:21 AM UTC+2, gordo...@gmail.com wrote:
>>
>> On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote:
>
> Have you tried compiling eratspeed with a new version of GCC to see how it
On Friday, 17 June 2016 09:23:18 UTC+7, Keith Randall wrote:
>
> Looks like something is wrong with immediate loading for the 1 << ...
> operation. Could you open a bug with repro instructions? I can look at it
> when 1.8 opens.
@Keith, now that I have found a combination that makes the
On Monday, 20 June 2016 16:35:09 UTC+7, Michael Jones wrote:
>
> Your code (with slightly changed print formatting) on my MacBook Pro:
>
> $ go install
> $ time fasterEratspeed
> Init: 2.885352ms
> Search: 284.849689ms, found 8621475 primes up to 1018429650
> Total: 287.735041ms
>
> real 0m
On Monday, June 20, 2016 at 6:33:29 AM UTC-7, gordo...@gmail.com wrote:
Further to the subject of compiler efficiency, the following is the assembler
code output, with array bounds checking turned off (-B), for the inner tight
composite culling loop of FasterEratspeed above (generated with go tool compile
-B -S FasterEratspeed.go > FasterEratspeed.asm):
0x005
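For readers who don't have the full source in front of them, the inner loop referred to above is of roughly this shape; this is only a sketch with placeholder names (cmpsts, start, step), not the actual FasterEratspeed code. The word index j>>5 and the mask 1 << (j&31) are the operations the listing is concerned with:

package main

import "fmt"

// cullComposites is a sketch of a bit-packed culling loop: cmpsts is the
// segment bit array (32 bits per word), start the first bit index to mark,
// and step the culling stride.
func cullComposites(cmpsts []uint32, start, step int) {
    lim := len(cmpsts) << 5 // total number of bits in the segment
    for j := start; j < lim; j += step {
        cmpsts[j>>5] |= 1 << uint(j&31) // word index j>>5, mask 1<<(j&31)
    }
}

func main() {
    seg := make([]uint32, 8192)  // an L1-cache-sized segment (32 KB)
    cullComposites(seg, 9, 21)   // illustrative start and stride values
    fmt.Printf("%032b\n", seg[0])
}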
Edit: Corrected, forgot to change the new count routine to 48 from 8...
We have proven with the straight translation of Daniel Bernstein's C code to
golang that the Sieve of Eratosthenes (SoE) eratspeed runs at about the same speed
as the Sieve of Atkin (SoA) primespeed for the same range with the same
optimizations and with John Barham's multi-threading for primespeed
On Saturday, June 18, 2016 at 12:21:21 AM UTC+2, gordo...@gmail.com wrote:
On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote:
> > I am not on a high end Intel CPU now, but when I was I found that with a
> > buffer size adjusted to the L1 cache size (8192 32-bit words or 32
>
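(Aside on the buffer size quoted above, assuming the figure that is cut off is 32 Kilobytes: 8192 32-bit words is exactly 32 KB, i.e. one typical L1 data cache. A trivial check of that arithmetic:)

package main

import "fmt"

func main() {
    // 8192 32-bit words x 4 bytes/word = 32768 bytes = 32 KB.
    const segmentWords = 8192
    fmt.Println("segment bytes:", segmentWords*4)  // 32768
    fmt.Println("segment bits :", segmentWords*32) // 262144 composites per pass
}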
Here is a golang version of Daniel Bernstein's "eratspeed", which is a straight
translation of his C code to golang without any changes other than as necessary
to make it work: https://play.golang.org/p/Sd6qlMQcHF.
It won't run on the golang playground because it takes too long unless one
chan
On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote:
> > I don't see the point to the exercise as far as optimizing golang is
> > concerned.
> It is a general rule that using more registers results in faster code.
Yes, except when they're wasted as in using the extra register
I don't see the point to the exercise as far as optimizing golang is concerned.
Your experiment just shows that your compiler (GCC?) missed an optimization as
far as reducing backend latency goes.
You may also find that swapping the order of some of the instructions such as
the second and the
I don't see the point of the exercise other than it proves that not putting the
result of an operation back into the same register reduces the latency slightly
for your processor (whatever it is?); I suspect that if you used any other
register such as the unused AX register rather than the R11 r
> Looks like something is wrong with immediate loading for the 1 << ...
> operation. Could you open a bug with repro instructions? I can look at it
> when 1.8 opens.
I have opened a golang issue #16092 as follows:
https://github.com/golang/go/issues/16092
I may have over-complicated it, but
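In case anyone wants a quick local look before the issue is triaged: a minimal stand-in for the pattern in question (not the repro attached to the issue) is just a non-constant shift of the constant 1, as used to build a bit mask. The generated code can be inspected with go tool compile -S:

package main

import "fmt"

// orBit sets bit (i & 31) of w. The mask 1 << uint(i&31) is the kind of
// non-constant shift of a constant operand being discussed.
func orBit(w uint32, i int) uint32 {
    return w | 1<<uint(i&31)
}

func main() {
    fmt.Printf("%#010x\n", orBit(0, 37)) // 37&31 = 5, so bit 5: 0x00000020
}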
> Modern x86 CPUs don't work like that.
> In general, optimally scheduled assembly code which uses more registers has
> higher performance than optimally scheduled assembly code which uses a smaller
> number of registers. Assuming both assembly codes correspond to the same
> source code.
> Regis
> So for this tight loop, golang is still slower than optimized C/C++
> code, but not by very much if array bounds checks are disabled.
> It appears you're well versed in the x86 processor instruction set and
> code generation. Could you maybe offer help to the folks working on
> improving t
No real surprises with the no bounds checks option (-B), it just eliminated the
array bounds checks with the rest of the code the same (version 1.7beta1):
0x00dd 00221 (main.go:37) MOVQ DI, CX
0x00e0 00224 (main.go:37) SHRQ $5, DI
0x00e4 00228 (main.go:37
On Thursday, 16 June 2016 10:11:12 UTC+7, Nigel Tao wrote:
>
> On Thu, Jun 16, 2016 at 11:22 AM, >
> wrote:
> > it no longer does the array bounds check twice
>
> It's not safe, but you might want to try disabling bounds checking by
> "go tool compile -B" or "go build -gcflags=-B"
>
In fact,
golang version 1.7beta1 does indeed help, and the time is now not much worse
than C#/Java, but still not as good as C/C++ due to the single array bounds
check:
Using the same procedure to obtain an assembly listing (go tool compile -S
PrimeTest.go > PrimeTest.s):
line 36: for j := (
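As an aside, one way to shave bounds checks without resorting to -B is to give the compiler a provable bound up front. The canonical form of that idiom is shown below on a simple byte-decoding example; whether the prove pass can do the same for the j>>5 indexing in this particular loop is a separate question, so treat this as a general sketch rather than a fix for PrimeTest:

package main

import "fmt"

// load32 decodes a little-endian uint32. The `_ = b[3]` line is the usual
// bounds-check-elimination hint: one check up front lets the compiler drop
// the checks on the four indexed loads that follow.
func load32(b []byte) uint32 {
    _ = b[3]
    return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
    fmt.Printf("%#x\n", load32([]byte{0x78, 0x56, 0x34, 0x12})) // 0x12345678
}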
While I am sure you are correct that some numeric code with golang can run as
fast as C/C++, it seems that bit-packed tight loops such as used for culling
composites in the Sieve of Eratosthenes (or the Sieve of Atkin for that matter)
are not as optimized. Of course one of the problems that gol
That's interesting, but your time of 0.7 seconds on your specified machine isn't
very informative: you should at least specify how long the original version of
primegen (which is not multithreaded) takes, which we assume is about 0.7
seconds, and also how long your golang version takes without mu