You're not really testing what you think you are testing.

When you write "_ = <some load>", the compiler just throws away the load.
You have to use the result somehow to keep the load in the final assembly.
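
A minimal sketch of one way to keep the load alive, assuming a 256-element
global array like the one in the benchmark under discussion. The names
table, sink, and sumLoads are made up for illustration and are not from the
original code. The idea is to fold each loaded value into a total and then
store that total in a package-level variable, so the compiler has to treat
the loads as used:

```go
package main

import "testing"

// table stands in for the data being loaded (hypothetical name and size).
var table [256]int

// sink is a package-level variable. Storing the result here makes it
// observable, so the loads that feed it cannot be treated as dead code.
var sink int

// sumLoads reads n elements, wrapping at 256, and returns their sum,
// so every load contributes to a value the caller receives.
func sumLoads(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += table[uint(i)%256]
	}
	return total
}

// BenchmarkDiscarded mirrors the pattern from the thread: the loaded value
// goes to the blank identifier, so the compiler is free to drop the load.
func BenchmarkDiscarded(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = table[uint(i)%256]
	}
}

// BenchmarkKept accumulates the loaded values and publishes the total.
func BenchmarkKept(b *testing.B) {
	total := 0
	for i := 0; i < b.N; i++ {
		total += table[uint(i)%256]
	}
	sink = total
}
```

Saved in a file ending in _test.go, this runs with "go test -bench=.". It is
still worth checking the generated assembly (for example with
"go test -gcflags=-S") to confirm the load is really there.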

What you are actually timing is the speed of the modulo operator (%).

For the pointer case, you're doing unsigned i % 256, which the compiler 
reduces to i&255.
For the array case, you're doing signed i % 256, which the compiler reduces 
to a combination of a multiply and a few shifts.
For the slice case, you're doing signed i % j, because the compiler can't 
assume the length of the slice is always 256 (it is a mutable global). That 
requires an actual hardware divide instruction plus some fixup code.
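
To time the three modulo forms in isolation, something like the following
sketch might help. The names modUnsigned, modSigned, modVariable, size, and
modSink are hypothetical, and size is assumed to play the role of the
slice length that the compiler cannot treat as a constant:

```go
package main

import "testing"

// size is a mutable package-level variable, so the compiler cannot
// assume it is always 256 (hypothetical stand-in for the slice length).
var size = 256

// modSink keeps the benchmark results observable.
var modSink int

// modUnsigned: unsigned operand, constant power-of-two divisor.
// The result is the same as i & 255.
func modUnsigned(i uint) uint { return i % 256 }

// modSigned: signed operand, constant divisor. In Go the remainder takes
// the sign of the dividend, so a plain mask is not enough.
func modSigned(i int) int { return i % 256 }

// modVariable: signed operand, divisor known only at run time.
func modVariable(i int) int { return i % size }

func BenchmarkModUnsigned(b *testing.B) {
	total := 0
	for i := 0; i < b.N; i++ {
		total += int(modUnsigned(uint(i)))
	}
	modSink = total
}

func BenchmarkModSigned(b *testing.B) {
	total := 0
	for i := 0; i < b.N; i++ {
		total += modSigned(i)
	}
	modSink = total
}

func BenchmarkModVariable(b *testing.B) {
	total := 0
	for i := 0; i < b.N; i++ {
		total += modVariable(i)
	}
	modSink = total
}
```

The signed and unsigned forms agree for non-negative inputs but differ for
negative ones (modSigned(-1) is -1, while -1 & 255 is 255), which is why the
signed case cannot be reduced to a single AND.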

Bottom line - microbenchmarking is hard.

On Thursday, January 5, 2017 at 4:02:09 AM UTC-8, Uli Kunitz wrote:
>
> A few comments:
>
> For such microbenchmarks you need to check the generated assembly for 
> optimizations. The C compiler probably removed the complete loop.
>
> The Go version and machine architecture are relevant. There were 
> significant changes recently, particularly with the introduction of SSA 
> (static single assignment) for amd64.
>
> Usually for _, x := range <slice/array> is much faster than direct access.
>
> The difference between array and slice is probably that the slice access 
> must read the pointer to the backing array before the actual value can be 
> accessed. This step is not required for arrays.
>
