and at the end of the day, the difference per access is 1 ns, or 3-4 
instructions on a modern Intel processor.

On Friday, 6 January 2017 11:21:32 UTC+11, Keith Randall wrote:
>
> You're not really testing what you think you are testing.
>
> When you do "_ = load something", the compiler just throws away the load. 
> You have to use the result somehow to keep the load in the final assembly.
>
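
A minimal sketch of one common way to keep the load alive: accumulate the
loaded values and store the total in a package-level variable. The names
table, sink and sumLoads are made up for illustration; they are not from the
original benchmark, and this is only one way to do it.

```go
package main

import (
	"fmt"
	"testing"
)

// table stands in for the data being indexed (hypothetical name).
var table [256]byte

// sink is a package-level variable. Storing the result here makes it
// observable, so the compiler has to keep the loads that produced it.
var sink byte

// sumLoads performs n loads from table and returns their (wrapping) sum.
func sumLoads(n int) byte {
	var acc byte
	for i := 0; i < n; i++ {
		acc += table[uint(i)%256] // the result is used, not assigned to _
	}
	return acc
}

func main() {
	for i := range table {
		table[i] = byte(i)
	}
	// testing.Benchmark can be called from an ordinary program.
	res := testing.Benchmark(func(b *testing.B) {
		sink = sumLoads(b.N)
	})
	fmt.Println(res)
}
```

It is still worth checking the generated assembly to confirm the load is
really there.
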
> What you are actually timing is the speed of the modulo operator (%).
>
> For the pointer case, you're doing unsigned i % 256, which the compiler 
> reduces to i&255.
> For the array case, you're doing signed i % 256, which the compiler 
> reduces to a multiply/few shift combo.
> For the slice case, you're doing signed i % j, as the compiler can't 
> assume the length of the slice is always 256 (as it is a mutable global). 
> That requires an actual hardware divide instruction plus some fixup code.
>
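
To make the three cases concrete, here is a small sketch. The function names
are hypothetical. It only shows why the unsigned form can be a plain mask
while the signed form cannot: in Go the remainder takes the sign of the
dividend, so a negative i gives a different answer than i&255.

```go
package main

import "fmt"

// modUnsigned: unsigned i % 256 is equivalent to i & 255.
func modUnsigned(i uint) uint { return i % 256 }

// modSigned: signed i % 256 must return a negative result for negative i,
// so a bare mask is not enough.
func modSigned(i int) int { return i % 256 }

// modVar: the divisor is only known at run time.
func modVar(i, n int) int { return i % n }

func main() {
	u := uint(300)
	fmt.Println(modUnsigned(u), u&255) // 44 44

	s := -1
	fmt.Println(modSigned(s), s&255) // -1 255

	fmt.Println(modVar(300, 7)) // 6
}
```
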
> Bottom line - microbenchmarking is hard.
>
> On Thursday, January 5, 2017 at 4:02:09 AM UTC-8, Uli Kunitz wrote:
>>
>> A few comments:
>>
>> For such microbenchmarks you need to check the generated assembly for 
>> optimizations. The C compiler probably removed the complete loop.
>>
>> The Go version and machine architecture are relevant. There were 
>> significant changes recently, particularly with the introduction of SSA 
>> (static single assignment) for amd64.
>>
>> Usually, for _, x := range <slice/array> is much faster than direct 
>> indexed access. 
>>
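
For reference, the two loop shapes being compared look roughly like this.
sumIndex and sumRange are illustrative names, and the sketch makes no claim
about which one is faster on a given Go version; that needs measuring.

```go
package main

import "fmt"

// sumIndex reads each element through an explicit index expression.
func sumIndex(s []int) int {
	total := 0
	for i := 0; i < len(s); i++ {
		total += s[i]
	}
	return total
}

// sumRange lets the range clause hand over each element directly.
func sumRange(s []int) int {
	total := 0
	for _, x := range s {
		total += x
	}
	return total
}

func main() {
	data := []int{1, 2, 3, 4}
	fmt.Println(sumIndex(data), sumRange(data)) // 10 10
}
```
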
>> The difference between array and slice is probably that the slice access 
>> must read the pointer to the backing array before the actual value can be 
>> accessed. This step is not required for arrays.
>>
>
