Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

gordonbgood Fri, 24 Jun 2016 05:18:24 -0700


On Saturday, 18 June 2016 17:54:45 UTC+7, ⚛ wrote:
>
> On Saturday, June 18, 2016 at 12:21:21 AM UTC+2, gordo...@gmail.com wrote:
>>
>> On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote: 
>> Have you tried compiling eratspeed with a new version of GCC to see how 
>> it compares to Clang? 
>>
>
> GCC-6.1 is slower. clang-3.9 is able to translate "bits = buf[pos]; bits 
> |= data; buf[pos] = bits;" into a single x86 instruction equivalent to 
> "buf[pos] |= data".
>

You are right: Clang/LLVM is able to turn separate read, modify, and write 
statements into a single read/modify/write instruction; however, we don't 
have to write the code as Daniel Bernstein did and can write it as follows, 
in which case GCC also keeps the loop body as a single read/modify/write 
instruction:

                 for ; k <= lngthb; k += q {
                        buf[k>>5] |= 1 << (k & 31)
                 }

In fact, for 32-bit operations on 'k', 'lngthb', and 'q' and of course the 
buffer contents, we can skip the "& 31" (as the CPU limits the shift to a 
maximum of 31 by itself for 32-bit register ops) for an extra slight 
saving in time.
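
For anyone who wants to try this, here is a minimal self-contained Go 
sketch of the loop above. The name cullMasked and the buffer size are only 
illustrative assumptions. One caveat for Go source: the Go spec defines a 
shift by a count at or beyond the operand width as producing 0, rather than 
following the x86 hardware behaviour of using only the low five bits of the 
count, so in Go the "& 31" has to stay for the result to be correct (in C 
such an over-wide shift is undefined behaviour, so dropping the mask there 
relies on what the compiler and CPU happen to do).

```go
package main

import "fmt"

// cullMasked sets the bit for every index k = start, start+q, ... <= limit
// in a buffer that packs 32 bits per word. The name is hypothetical.
// q must be non-zero, and limit must lie inside the buffer.
func cullMasked(buf []uint32, start, q, limit uint32) {
	for k := start; k <= limit; k += q {
		buf[k>>5] |= 1 << (k & 31)
	}
}

func main() {
	buf := make([]uint32, 4) // 128 bits, an arbitrary size for the demo
	cullMasked(buf, 3, 7, 127)
	fmt.Printf("%032b\n", buf[0])

	// Go does not wrap the shift count: shifting a uint32 by 35 gives 0,
	// while masking the count first gives 1<<3.
	var one, k uint32 = 1, 35
	fmt.Println(one<<k, one<<(k&31)) // prints: 0 8
}
```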

Golang version 1.7beta2 actually does pretty well with the above format 
other than not retaining the read/modify/write for this particular 
instance, and the loop speed is the same as the C generated code with 
Bernstein's hand optimizations, which did not use the single instruction.

Thus, this loop in golang1.7beta2 is as fast as the C Bernstein loop or 
maybe a bit faster, but that is just for this one case.  When I write 
extreme loop unrolling code to optimize this loop, which for C reduces the 
time per culling loop to about 1.375 cycles on a non-stalling high end 
Intel CPU running in amd64 mode, I can't get it down to less than about 4 
cycles per loop for golang under those same conditions (x86 code in this 
case is slower because there aren't enough available registers, requiring 
register spills and reloads).  Note that as the number of cycles per loop 
is reduced, the number of instructions per cycle is also reduced, because 
the loop then consists of the more complex instructions that each do more, 
with hardly any of the simple register to register instructions, which are 
very fast to execute.
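
To illustrate what unrolling by eight looks like for this kind of culling 
loop, here is a rough sketch in Go. It is one common form of the technique 
and not necessarily the code measured above: it assumes a byte-addressed 
buffer, and cullSimple and cullUnrolled are made-up names. Since eight 
steps of q bits advance exactly q bytes, the eight bit masks and byte 
offsets can be computed once and reused with a moving base, so the inner 
loop contains only read/modify/write operations with fixed masks.

```go
package main

import "fmt"

// cullSimple is the plain reference loop: one bit set per iteration.
func cullSimple(buf []byte, k0, q int) {
	for k := k0; k < len(buf)*8; k += q {
		buf[k>>3] |= byte(1) << uint(k&7)
	}
}

// cullUnrolled sets the same bits as cullSimple, eight per iteration.
// It requires k0 >= 0 and q > 0. Hypothetical sketch, not tuned code.
func cullUnrolled(buf []byte, k0, q int) {
	var off [8]int   // byte offset of each of the first eight culls
	var mask [8]byte // bit mask of each of the first eight culls
	for i := range off {
		k := k0 + i*q
		off[i] = k >> 3
		mask[i] = byte(1) << uint(k&7)
	}
	base := 0
	// Eight culls cover 8*q bits, which is q bytes, so the same offsets
	// and masks apply again after moving base forward by q.
	for ; base+off[7] < len(buf); base += q {
		buf[base+off[0]] |= mask[0]
		buf[base+off[1]] |= mask[1]
		buf[base+off[2]] |= mask[2]
		buf[base+off[3]] |= mask[3]
		buf[base+off[4]] |= mask[4]
		buf[base+off[5]] |= mask[5]
		buf[base+off[6]] |= mask[6]
		buf[base+off[7]] |= mask[7]
	}
	// Finish the remaining culls near the end of the buffer one by one.
	cullSimple(buf, k0+base*8, q)
}

func main() {
	a, b := make([]byte, 1024), make([]byte, 1024)
	cullSimple(a, 7*7, 7)
	cullUnrolled(b, 7*7, 7)
	fmt.Println(string(a) == string(b)) // both loops set the same bits
}
```

Whether the Go compiler keeps the offsets and masks in registers for a loop 
like this is exactly the kind of thing that has to be checked in the 
generated assembly.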

So golang is inconsistent in its optimizations, sometimes getting about the 
same speed as C, but in some other similar cases falling well short, being 
over three times as slow.  It seems that the C/C++ compilers are much more 
consistent in their speed, but then they have had much longer to mature.
