On Monday, June 20, 2016 at 6:33:29 AM UTC-7, gordo...@gmail.com wrote:

>> Further to the subject of compiler efficiency, the following is the
>> assembler output, with array bounds checking turned off (-B), for the
>> inner tight composite-culling loop of FasterEratspeed above (generated
>> with go tool compile -B -S FasterEratspeed.go > FasterEratspeed.asm):
>>
>> 0x0051 00081 (main.go:426) MOVL R11, CX
>> 0x0054 00084 (main.go:426) SHRL $5, R11
>> 0x0058 00088 (main.go:428) MOVL (R9)(R11*4), R13
>> 0x005c 00092 (main.go:430) MOVL $1, R14
>> 0x0062 00098 (main.go:430) SHLL CX, R14
>> 0x0065 00101 (main.go:430) ORL R13, R14
>> 0x0068 00104 (main.go:431) MOVL R14, (R9)(R11*4)
>> 0x006c 00108 (main.go:429) LEAL (R12)(CX*1), R11
>> 0x0070 00112 (main.go:425) CMPL R11, R8
>> 0x0073 00115 (main.go:425) JCS $0, 81
>>
>> At 10 instructions, this is about as tight as it gets, short of using the
>> more complex read/modify/write form of the ORL instruction, and even that
>> doesn't seem to save much if any time given instruction latencies.
>>
>> Note that this code has eliminated the "k & 31" for the shift, seeming to
>> recognize that it isn't necessary since a 32-bit shift can't be greater
>> than 31
>
> Getting rid of the &31 is easy and I'll do that in 1.8.
>
>> anyway, that, unlike the simple PrimeSpeed program, it properly uses the
>> immediate load of '1',
>
> I don't know what the issue is yet, but it shouldn't be hard to fix in 1.8.
>
>> and that it cleverly uses the LEAL instruction to add the prime value 'q'
>> in R12 to the unmodified 'k' value in CX, producing the sum in the original
>> location of 'j' in R11 and saving another instruction to move the result
>> from CX to R11.
>
> The current SSA backend should do this also.

No, Keith, you seem to have misunderstood: I wasn't complaining about the
above assembler code as produced by the 1.7beta1 compiler; I was wondering
why it isn't always this good. This is about as good as it gets for this
loop: it already properly gets rid of the &31, does a proper immediate load
of 1, and makes clever use of the LEA instruction, without the misuse of the
LEA instruction to continuously recalculate 'p'.

The assembler code above is produced by either of the following loop
variations:

1) as it is in FasterEratspeed:

	for k < lngthb {
		pos := k >> 5
		data := k & 31
		bits := buf[pos]
		k += q
		bits |= 1 << data // two[data]
		buf[pos] = bits
	}

2) I get the same assembler code if I change this to the simpler:

	for ; k < lngthb; k += q {
		buf[k>>5] |= 1 << (k & 31)
	}

where all variables and buffers are uint32.

My question was: why does the compiler produce this very good code for both
variations, yet produce something much worse for the same variation-two loop
in the simple PrimeSpeed code, where the main difference is that PrimeSpeed
uses 64-bit uint for the loop variables and loop limit? Does that give you a
clue where the problem might be? Converting PrimeSpeed to use uint32s as here
fixed the continuous recalculation of 'p', but not the other problems.

It seems that the compiler sometimes erroneously tries to reduce register use
without weighing the cost in execution speed. It is inconsistent, sometimes
producing great code as here, and sometimes not so great as in PrimeSpeed.

I was looking for some general advice on how to write loops so that they
produce code as good as this. Do you plan to include SSA for the x86 version
as well?
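In case it helps to reproduce the comparison, here is a minimal self-contained
sketch of the two cases (the file name cull.go and the function names
cullBits32 and cullBits64 are only illustrative, not taken from FasterEratspeed
or PrimeSpeed); compiling it with go tool compile -B -S cull.go should show the
two inner loops side by side:

	package cull

	// cullBits32 mirrors loop variation 2) above: the bit index 'k', the
	// step 'q', and the limit 'lngthb' are all uint32, and each pass sets
	// bit (k & 31) of the 32-bit word buf[k>>5].
	func cullBits32(buf []uint32, k, q, lngthb uint32) {
		for ; k < lngthb; k += q {
			buf[k>>5] |= 1 << (k & 31)
		}
	}

	// cullBits64 is the same loop but with uint64 loop variables and limit,
	// resembling (though not copied from) the PrimeSpeed version that was
	// reported to compile to worse code.
	func cullBits64(buf []uint32, k, q, lngthb uint64) {
		for ; k < lngthb; k += q {
			buf[k>>5] |= 1 << (k & 31)
		}
	}

If the behaviour described above holds, the 32-bit function should compile to
a ten-instruction loop like the one at the top of this message, and the 64-bit
function can be diffed against it to see where the extra instructions come
from.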