Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-24 Thread gordonbgood
On Saturday, 18 June 2016 17:54:45 UTC+7, ⚛ wrote: > > On Saturday, June 18, 2016 at 12:21:21 AM UTC+2, gordo...@gmail.com wrote: >> >> On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote: >> Have you tried compiling > > > eratspeed with a new version of GCC to see how it

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-23 Thread gordonbgood
On Friday, 17 June 2016 09:23:18 UTC+7, Keith Randall wrote: > > Looks like something is wrong with immediate loading for the 1 << ... > operation. Could you open a bug with repro instructions? I can look at it > when 1.8 opens. >> >> @Keith, now that I have found a combination that makes the

Re: [go-nuts] [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-23 Thread gordonbgood
On Monday, 20 June 2016 16:35:09 UTC+7, Michael Jones wrote: > > Your code (with slightly changed print formatting) on my MacBook Pro: > > $ go install > $ time fasterEratspeed > Init: 2.885352ms > Search: 284.849689ms, found 8621475 primes up to 1018429650 > Total: 287.735041ms > > real 0m

Re: [go-nuts] [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-20 Thread gordonbgood
On Monday, June 20, 2016 at 6:33:29 AM UTC-7, gordo...@gmail.com wrote: Further to the subject of compiler efficiency, the following is the assembler code output with array bounds checking turned off (-B) the the inner tight composite culling loop of FasterEratspeed above (generated with go tool

Re: [go-nuts] [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-20 Thread gordonbgood
On Monday, June 20, 2016 at 6:33:29 AM UTC-7, gordo...@gmail.com wrote: Further to the subject of compiler efficiency, the following is the assembler code output with array bounds checking turned off (-B) the the inner tight composite culling loop of FasterEratspeed above (generated with go tool

Re: [go-nuts] [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-20 Thread gordonbgood
Further to the subject of compiler efficiency, the following is the assembler code output with array bounds checking turned off (-B) the the inner tight composite culling loop of FasterEratspeed above (generated with go tool compile -B -S FasterEratspeed.go > FasterEratspeed.asm): 0x005

Re: [go-nuts] [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-20 Thread gordonbgood
Further to the subject of compiler efficiency, the following is the assembler code output with array bounds checking turned off (-B) the the inner tight composite culling loop of FasterEratspeed above (generated with go tool compile -B -S FasterEratspeed.go > FasterEratspeed.asm): 0x005

Re: [go-nuts] [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-20 Thread gordonbgood
Edit: Corrected, forgot to change the new count routine to 48 from 8... We have proven with the straight translation of Daniel Bernstein's C code to golang that the Sieve of Eratosthenes (SoE) eratspeed runs about the same speed as the Sieve of Atkin (SoA) primespeed for the same range with the

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-20 Thread gordonbgood
We have proven with the straight translation of Daniel Bernstein's C code to golang that the Sieve of Eratosthenes (SoE) eratspeed rans about the same speed as the Sieve of Atkin (SoA) primespeed for the same range with the same optimizations and with John Barham's multi-treading for primespeed

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-18 Thread gordonbgood
On Saturday, June 18, 2016 at 12:21:21 AM UTC+2, gordo...@gmail.com wrote: On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote: > > I am not on a high end Intel CPU now, but when I was I found that with a > > buffer size adjusted to the L1 cache size (8192 32-bit words or 32 >

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-17 Thread gordonbgood
Here is a golang version of Daniel Bernstein's "eratspeed", which is a straight translation of his C code to golang without any changes other than as necessary to make it work: https://play.golang.org/p/Sd6qlMQcHF. It won't run on the golang playground because it takes too long unless one chan

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-17 Thread gordonbgood
On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote: > > I don't see the point to the exercise as far as optimizing golang is > > concerned. > It is a general rule that using more registers results in faster code. Yes, except when they're wasted as in using the extra register

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-17 Thread gordonbgood
I don't see the point to the exercise as far as optimizing golang is concerned. Your experiment just shows that Your compiler (GCC?) missed an optimization as far as reducing backend latency goes. You may also find that swapping the order of some of the instructions such as the second and the

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-17 Thread gordonbgood
I don't see the point to the exercise as far as optimizing golang is concerned. Your experiment just shows that Your compiler (GCC?) missed an optimization as far as reducing backend latency goes. You may also find that swapping the order of some of the instructions such as the second and the

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-17 Thread gordonbgood
I don't see the point of the exercise other than it proves that not putting the result of an operation back into the same register reduces the latency slightly for your processor (whatever it is?); I suspect that if you used any other register such as the unused AX register rather then the R11 r

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-17 Thread gordonbgood
> Looks like something is wrong with immediate loading for the 1 << ... > operation. Could you open a bug with repro instructions? I can look at it > when 1.8 opens. I have opened a golang issue #16092 as follows: https://github.com/golang/go/issues/16092 I may have over-complicated it, but

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-16 Thread gordonbgood
> Modern x86 CPUs don't work like that. > In general, optimally scheduled assembly code which uses more registers has > higher performance than optimally scheduled assembly code which uses smaller > number of registers. Assuming both assembly codes correspond to the same > source code. > Regis

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-16 Thread gordonbgood
>. So for this tight loop, golang is still slower than optimized C/C++ >. code, but not by very much if array bounds checks are disabled. > It appears you're well versed in the x86 process instruction set and > code generation. Could you may be offer help to the folks working on > improving t

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-16 Thread gordonbgood
No real surprises with the no bounds checks option (-B), it just eliminated the array bounds checks with the rest of the code the same (version 1.7beta1): 0x00dd 00221 (main.go:37) MOVQDI, CX 0x00e0 00224 (main.go:37) SHRQ$5, DI 0x00e4 00228 (main.go:37

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-16 Thread gordonbgood
No real surprises with the no bounds checks option (-B), it just eliminated the array bounds checks with the rest of the code the same (version 1.7beta1): 0x00dd 00221 (main.go:37) MOVQDI, CX 0x00e0 00224 (main.go:37) SHRQ$5, DI 0x00e4 00228 (main.go:37

Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-15 Thread gordonbgood
On Thursday, 16 June 2016 10:11:12 UTC+7, Nigel Tao wrote: > > On Thu, Jun 16, 2016 at 11:22 AM, > > wrote: > > it no longer does the array bounds check twice > > It's not safe, but you might want to try disabling bounds checking by > "go tool compile -B" or "go build -gcflags=-B" > In fact,

[go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-15 Thread gordonbgood
golang version 1.7beta1 does indeed help, and the time is now not much worse than C#/Java, but still not as good as C/C++ due to the single array bounds check: Using the same procedure to obtain an assembly listing (go tool compiler -S PrimeTest.go > PrimeTest.s): line 36for j := (

[go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-15 Thread gordonbgood
While I am sure you are correct that some numeric code with golang can run as fast as C/C++, it seems that bit-packed tight loops such as used for culling composites in the Sieve of Eratosthenes (or the Sieve of Atkin for that matter) are not as optimized. Of course one of the problems that gol

[go-nuts] [ANN] primegen.go: Sieve of Atkin prime number generator

2016-06-14 Thread gordonbgood
That's interesting but your time or 0.7 seconds on your specified machine isn't very informative: you should at least specify how long the original version of primegen (which is not multithreaded) takes, which we assume as about 0.7 seconds, and also how long your golang version takes without mu