On Sunday, April 30, 2017 at 7:33:38 AM UTC+8, Ian Lance Taylor wrote:
>
> On Sat, Apr 29, 2017 at 1:43 AM, T L <tapi...@gmail.com> wrote:
> >
> > package main
> >
> > import (
> >     "testing"
> > )
> >
> > const N = 4096
> > type T int64
> > var a [N]T
> >
> > var globalSum T
> >
> > func sumByLoopArray_a(p *[N]T) T {
> >     var sum T
> >     for i := 0; i < len(p); i++ {
> >         sum += T(p[i])
> >     }
> >     return sum
> > }
> >
> > func sumByLoopArray_b(p *[N]T) T {
> >     var sum T
> >     for i := 0; i < len(*p); i++ {
> >         sum += T((*p)[i])
> >     }
> >     return sum
> > }
> >
> > //============================================
> >
> > func Benchmark_LoopArray_0a(b *testing.B) {
> >     for i := 0; i < b.N; i++ {
> >         var sum T
> >         for i := 0; i < len(a); i++ {
> >             sum += T(a[i])
> >         }
> >         globalSum = sum
> >     }
> > }
> >
> > func Benchmark_LoopArray_0b(b *testing.B) {
> >     for i := 0; i < b.N; i++ {
> >         var sum T
> >         p := &a
> >         for i := 0; i < len(p); i++ {
> >             sum += T(p[i])
> >         }
> >         globalSum = sum
> >     }
> > }
> >
> > func Benchmark_LoopArray_1a(b *testing.B) {
> >     for i := 0; i < b.N; i++ {
> >         globalSum = sumByLoopArray_a(&a)
> >     }
> > }
> >
> > func Benchmark_LoopArray_1b(b *testing.B) {
> >     for i := 0; i < b.N; i++ {
> >         globalSum = sumByLoopArray_b(&a)
> >     }
> > }
> >
> > /* output:
> >
> > $ go test . -bench=.
> > Benchmark_LoopArray_0a-4     300000     5248 ns/op
> > Benchmark_LoopArray_0b-4     300000     5240 ns/op
> > Benchmark_LoopArray_1a-4     500000     3942 ns/op
> > Benchmark_LoopArray_1b-4     300000     3936 ns/op
> > */
> >
> > why?
>
> Benchmarking is hard.
>
> If you really want to know why, you will have to look at the generated
> assembly code. There are likely to be differences there. For
> example, your 1a and 1b loops use a pointer to a global variable, but
> your 0a and 0b loops use a global variable directly. In general
> references through a pointer are more efficient than references to a
> named global variable. I don't know if that is the difference here,
> but it could be.
>
> It could also be something difficult to control for, like loop alignment.
>
> Ian
>
Thanks, Ian.

Yes, for 0a it is reasonable that the direct reference to a global array matters. But in 0b, the pointer p is a local variable, so it shouldn't differ much from 1a and 1b.

If I use a local array in 0a and 0b instead, their results become much better, but they are still a little slower than 1a and 1b.

I wrote a small program and used "-gcflags -S" to check the assembly output:

    package main

    const N = 4096
    type T int64

    func fffffff(p *[N]T) T {
        var sum T
        for i := 0; i < len(p); i++ {
            sum += T(p[i])
        }
        return sum
    }

    func ggggggg() T {
        var a [N]T
        var sum T
        for i := 0; i < len(a); i++ {
            sum += T(a[i])
        }
        return sum
    }

    func main() {
    }

The assembly code for fffffff and ggggggg is quite different. I am not familiar with Go assembly yet, so I can't conclude why ggggggg is slower.