Re: [go-nuts] Re: [ANN] primegen.go: Sieve of Atkin prime number generator

Thu, 16 Jun 2016 08:46:54 -0700

On Thursday, June 16, 2016 at 11:13:12 AM UTC+2, gordo...@gmail.com wrote:
>
> No real surprises with the no-bounds-checks option (-B): it just
> eliminated the array bounds checks and left the rest of the code the same
> (version 1.7beta1):
>
>         0x00dd 00221 (main.go:37)        MOVQ        DI, CX 
>         0x00e0 00224 (main.go:37)        SHRQ        $5, DI 
>         0x00e4 00228 (main.go:37)        MOVL        (AX)(DI*4), R9 
>         0x00e8 00232 (main.go:37)        MOVQ        CX, R10 
>         0x00eb 00235 (main.go:37)        ANDQ        $31, CX 
>         0x00ef 00239 (main.go:37)        MOVL        R8, R11 
>         0x00f2 00242 (main.go:37)        SHLL        CX, R8 
>         0x00f5 00245 (main.go:37)        ORL        R8, R9 
>         0x00f8 00248 (main.go:37)        MOVL        R9, (AX)(DI*4) 
>         0x00fc 00252 (main.go:36)        LEAQ        3(R10)(SI*2), DI 
>         0x0101 00257 (main.go:37)        MOVL        R11, R8 
>         0x0104 00260 (main.go:36)        CMPQ        DI, DX 
>         0x0107 00263 (main.go:36)        JLS        $0, 221 
>
> It is now almost as fast as C/C++ code, and it isn't quite as fast for the
> same reasons as explained before: it uses too many registers to hold
> values and doesn't use the read/modify/write instruction (which would also
> save a register).
>
> The current beta works reasonably well with amd64 code but still doesn't
> use registers efficiently enough for 32-bit x86 code, as it uses too many
> registers. Optimized C/C++ code uses only six or at most seven registers,
> which the x86 architecture has, but not the nine registers that the above
> loop requires.
>
> So for this tight loop, golang is still slower than optimized C/C++ code, 
> but not by very much if array bounds checks are disabled.
>


Modern x86 CPUs don't work like that.

In general, optimally scheduled assembly code that uses more registers
performs better than optimally scheduled assembly code that uses fewer
registers, assuming both were generated from the same source code.

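Register renaming has been used since the Intel Pentium Pro and the AMD K5:
the CPU maps the few architectural registers named in the code onto a much
larger physical register file, so the architectural register count is not
the real limit.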

Suggested reading: http://www.agner.org/optimize/microarchitecture.pdf

An excerpt from that document (Section 10, on the Haswell and Broadwell
pipeline): "... the register file has 168 integer registers and
168 vector registers ..."
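The poster's main.go isn't shown, so the following is only a hypothetical
reconstruction: a bit-packed, odds-only Sieve of Eratosthenes (not the Sieve
of Atkin from primegen.go) whose inner loop has the shape the quoted listing
suggests, namely a word index of c>>5, a bit mask of 1<<(c&31), an OR into
memory, and a step of 2*i+3. The names cullOdds and countPrimes are made up
for illustration. Building it with `go build -gcflags=-S` prints an assembly
listing in the same format as the one quoted above, and adding `-B`
(`-gcflags='-S -B'`) drops the bounds checks, so the register usage of the
inner loop can be inspected directly.

```go
package main

import "fmt"

// cullOdds is a hypothetical sketch, not the original poster's code.
// Bit i of the returned slice stands for the odd number 2*i+3;
// a set bit means that number is composite.
func cullOdds(topIndex uint) []uint32 {
	cmpsts := make([]uint32, topIndex>>5+1)
	for i := uint(0); ; i++ {
		// Index of p*p where p = 2*i+3: (p*p-3)/2 = 2*i*(i+3)+3.
		s := (i+i)*(i+3) + 3
		if s > topIndex {
			break
		}
		if cmpsts[i>>5]&(1<<(i&31)) != 0 {
			continue // 2*i+3 is composite; its multiples are already culled
		}
		p := i + i + 3
		// Tight culling loop: read word, OR in one bit, write word back.
		for c := s; c <= topIndex; c += p {
			cmpsts[c>>5] |= 1 << (c & 31)
		}
	}
	return cmpsts
}

// countPrimes returns the number of primes <= n.
func countPrimes(n uint) int {
	if n < 2 {
		return 0
	}
	if n < 3 {
		return 1
	}
	top := (n - 3) / 2 // index of the largest odd number <= n
	cmpsts := cullOdds(top)
	count := 1 // the only even prime, 2
	for i := uint(0); i <= top; i++ {
		if cmpsts[i>>5]&(1<<(i&31)) == 0 {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countPrimes(100))     // 25
	fmt.Println(countPrimes(1000000)) // 78498
}
```

The exact register allocation varies with the Go version, so the listing
this produces may not match the quoted one instruction for instruction.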

