I don't see the point of the exercise as far as optimizing Go is concerned. Your experiment just shows that your compiler (GCC?) missed an optimization that would have reduced backend latency.
You may also find that swapping the order of some of the instructions in the loop, such as the second and the third, reduces backend latency further. I am not on a high-end Intel CPU now, but when I was, I found that with the buffer size adjusted to the L1 cache size (8192 32-bit words, or 32 kilobytes), eratspeed ran on an Intel 2700K @ 3.5 GHz at about 3.5 clock cycles per loop (about 405,000,000 loops for this range). My current AMD Bulldozer CPU has a severe cache bottleneck and is slower than this by a factor of about two.