and at the end of the day, the difference per access is 1 ns, or 3-4 instructions on a modern Intel processor.
On Friday, 6 January 2017 11:21:32 UTC+11, Keith Randall wrote:
>
> You're not really testing what you think you are testing.
>
> When you do "_ = load something", the compiler just throws away the load.
> You have to use the result somehow to keep the load in the final assembly.
>
> What you are actually timing is the speed of the modulo operator (%).
>
> For the pointer case, you're doing unsigned i % 256, which the compiler
> reduces to i&255.
> For the array case, you're doing signed i % 256, which the compiler
> reduces to a multiply/few shift combo.
> For the slice case, you're doing signed i % j, as the compiler can't
> assume the length of the slice is always 256 (as it is a mutable global).
> That requires an actual hardware divide instruction plus some fixup code.
>
> Bottom line - microbenchmarking is hard.
>
> On Thursday, January 5, 2017 at 4:02:09 AM UTC-8, Uli Kunitz wrote:
>>
>> A few comments:
>>
>> For such microbenchmarks you need to check the assembler for
>> optimizations. The C code probably removed the complete loop.
>>
>> The Go version and machine architecture are relevant. There were
>> significant changes recently, particularly with the introduction of SSA
>> (static single assignment) for amd64.
>>
>> Usually for _, x := range <slice/array> is much faster than direct
>> access.
>>
>> The difference between array and slice is probably that the slice access
>> must read the pointer to the backing array before the actual value can be
>> accessed. This step is not required for arrays.
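For anyone who wants to retry the comparison with the two problems Keith describes taken out, here is a minimal sketch. It is not the original benchmark from this thread: the names `size`, `array`, `slice`, `sink`, `sumArray` and `sumSlice` are all made up for illustration. The loaded values are summed and stored in a package-level `sink` so the compiler cannot discard the loads, and the index is masked with `i&(size-1)` in both cases so neither loop pays for a `%`. It assumes `size` is a power of two and a Go version that has `testing.Init` (1.13 or later).

```go
package main

import (
	"fmt"
	"testing"
)

// size must be a power of two so that i&(size-1) is a valid index mask.
const size = 256

var (
	array [size]int
	slice = make([]int, size)
	sink  int // results are stored here so the loads are not optimized away
)

func init() {
	// Fill both containers with ones so the sum is easy to check.
	for i := range array {
		array[i] = 1
		slice[i] = 1
	}
}

// sumArray reads n elements from the global array, using a mask for the index.
func sumArray(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += array[i&(size-1)]
	}
	return sum
}

// sumSlice does the same for a slice; the same mask is used, so no modulo
// or divide instruction is involved in either loop.
func sumSlice(s []int, n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += s[i&(size-1)]
	}
	return sum
}

func main() {
	testing.Init() // registers the testing flags when not running under "go test"
	ra := testing.Benchmark(func(b *testing.B) { sink = sumArray(b.N) })
	rs := testing.Benchmark(func(b *testing.B) { sink = sumSlice(slice, b.N) })
	fmt.Println("array:", ra)
	fmt.Println("slice:", rs)
}
```

Even with this version it is still worth checking the generated assembly (for example with `go tool objdump`) before drawing conclusions, since bounds checks and other optimizations can differ between the two loops.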