Go version 1.7beta1 does indeed help: the run time is now not much worse than 
C#/Java, though still not as good as C/C++ because of the one remaining array 
bounds check.

Using the same procedure to obtain an assembly listing (go tool compile -S 
PrimeTest.go > PrimeTest.s):

line 36            for j := (p*p - 3) >> 1; j <= lmtndx; j += p { 
line 37                cmpsts[j>>5] |= 1 << (j & 31) 
line 38            } 

        0x00f1 00241 (Main.go:37)       MOVQ    R8, CX  ;; copy 'j' from r8 to cx
        0x00f4 00244 (Main.go:37)       SHRQ    $5, R8  ;; shift r8 right by 5 to get the word index
        0x00f8 00248 (Main.go:37)       CMPQ    R8, DX  ;; array bounds check against the array length stored in dx
        0x00fb 00251 (Main.go:37)       JCC     $0, 454 ;; panic if the bounds check fails
        0x0101 00257 (Main.go:37)       MOVL    (AX)(R8*4), R10 ;; get element to r10 in one step
        0x0105 00261 (Main.go:37)       MOVQ    CX, R11 ;; save 'j' for later in r11
        0x0108 00264 (Main.go:37)       ANDQ    $31, CX ;; leave 'j' & 31 in cx
        0x010c 00268 (Main.go:37)       MOVL    R9, R12 ;; save r9 to r12 to preserve the 1 it contains - WHY NOT JUST MAKE R12 CONTAIN 1 AT ALL TIMES IF USING IT IS QUICKER THAN AN IMMEDIATE LOAD
        0x010f 00271 (Main.go:37)       SHLL    CX, R9  ;; R9 SHOULD JUST BE LOADED WITH 1 ABOVE - now r9 contains 1 << ('j' & 31)
        0x0112 00274 (Main.go:37)       ORL     R10, R9 ;; r9 contains cmpsts[j >> 5] | (1 << ('j' & 31)) - the bit or is done here
        0x0115 00277 (Main.go:37)       MOVL    R9, (AX)(R8*4) ;; element now contains the modified value
        0x0119 00281 (Main.go:36)       LEAQ    3(R11)(DI*2), R8 ;; tricky way to calculate 'j' + 2 * di + 3, where 2 * di + 3 is 'p'; answer to r8, saves a register
        0x011e 00286 (Main.go:37)       MOVL    R12, R9 ;; RESTORE R9 FROM R12 - SHOULD NOT BE NECESSARY, but doesn't really cost in time as CPU is waiting for results of LEAQ operation
        0x0121 00289 (Main.go:36)       CMPQ    R8, BX  ;; check if 'j' in r8 is up to limit stored in bx
        0x0124 00292 (Main.go:36)       JLS     $0, 241 ;; loop if not complete
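
As a rough Go-level reading of the listing above, the loop body can be split 
into the same steps the compiler emits. This is only a sketch: setBit and its 
local names are hypothetical and not from the original program, and 'j' is 
assumed to be unsigned.

```go
package main

import "fmt"

// setBit is a hypothetical helper that spells out, one step per line,
// what the single statement cmpsts[j>>5] |= 1 << (j & 31) compiles to.
func setBit(cmpsts []uint32, j uint) {
	w := j >> 5                // SHRQ $5: word index
	v := cmpsts[w]             // CMPQ/JCC bounds check, then MOVL load
	m := uint32(1) << (j & 31) // ANDQ $31, then SHLL: the bit mask
	cmpsts[w] = v | m          // ORL, then MOVL store back to the array
}

func main() {
	words := make([]uint32, 2)
	setBit(words, 37) // bit 37 is bit 5 of word 1
	fmt.Println(words) // prints [0 32]
}
```

Only the first indexing needs a bounds check here; the store reuses the same 
index, which matches the single check seen in the 1.7beta1 listing.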

This is much better than the 1.6.2 code in that it no longer does the array 
bounds check twice.  There is still a minor waste: r12 is used to save the 1 
that r9 holds, instead of simply loading an immediate 1 into r9 each time 
around the loop.  That register could have been used to hold 'p' instead, 
saving a slight amount of time over the tricky code that recalculates 'p' 
(quickly) on every loop; the tricky bit is still about a half cycle slower 
than just using a pre-calculated 'p' value.  The C/C++ code will still be 
quicker, mainly because it has no array bounds check, which saves a couple of 
CPU clock cycles, but also because it uses the single read/modify/write form 
of the ORL instruction instead of a MOVL from the array element to a register, 
an ORL with the bit mask, and then a MOVL from the register back to the array 
element.  The compiler now seems almost to be trying too hard to save 
registers at the cost of time in the tricky 'p' calculation, while spending a 
register for no gain, or an actual loss, on saving the 1.
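
For anyone wanting to reproduce the listing, here is a minimal sketch of an 
odds-only, bit-packed sieve built around the same inner loop.  It is not the 
original PrimeTest.go: countPrimes and the surrounding code are assumptions, 
and only the culling loop and the names p, j, lmtndx and cmpsts come from the 
snippet above.

```go
package main

import "fmt"

// countPrimes is a hypothetical wrapper around the culling loop.
// Bit i of cmpsts stands for the odd number 2*i + 3.
func countPrimes(limit int) int {
	if limit < 2 {
		return 0
	}
	if limit < 3 {
		return 1 // only the prime 2
	}
	lmtndx := (limit - 3) >> 1
	cmpsts := make([]uint32, (lmtndx>>5)+1)
	for i := 0; ; i++ {
		p := i + i + 3
		start := (p*p - 3) >> 1 // index of p*p
		if start > lmtndx {
			break
		}
		if cmpsts[i>>5]&(1<<uint(i&31)) != 0 {
			continue // p is already marked composite
		}
		// The inner loop from the listing: mark every odd multiple of p.
		for j := start; j <= lmtndx; j += p {
			cmpsts[j>>5] |= 1 << uint(j&31)
		}
	}
	count := 1 // account for the prime 2
	for i := 0; i <= lmtndx; i++ {
		if cmpsts[i>>5]&(1<<uint(i&31)) == 0 {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countPrimes(100)) // prints 25
}
```

One way to estimate what the remaining bounds check costs is to time a build 
made with go build -gcflags=-B, which disables bounds checking, against a 
normal build; the difference will depend on the machine and compiler version.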

So it is good to see that golang compiler optimization is taking some steps 
forward, but it isn't quite there yet.
