Re: [go-nuts] strange benchmark result: inline is slower than function calls

T L Sat, 29 Apr 2017 23:00:47 -0700


On Sunday, April 30, 2017 at 7:33:38 AM UTC+8, Ian Lance Taylor wrote:
>
> On Sat, Apr 29, 2017 at 1:43 AM, T L <tapi...@gmail.com> wrote:
> > 
> > package main 
> > 
> > import ( 
> >     "testing" 
> > ) 
> > 
> > const N = 4096 
> > type T int64 
> > var a [N]T 
> > 
> > var globalSum T 
> > 
> > func sumByLoopArray_a(p *[N]T) T { 
> >     var sum T 
> >     for i := 0; i < len(p); i++ { 
> >         sum += T(p[i]) 
> >     } 
> >     return sum 
> > } 
> > 
> > func sumByLoopArray_b(p *[N]T) T { 
> >     var sum T 
> >     for i := 0; i < len(*p); i++ { 
> >         sum += T((*p)[i]) 
> >     } 
> >     return sum 
> > } 
> > 
> > //============================================ 
> > 
> > func Benchmark_LoopArray_0a(b *testing.B) { 
> >     for i := 0; i < b.N; i++ { 
> >         var sum T 
> >         for i := 0; i < len(a); i++ { 
> >             sum += T(a[i]) 
> >         } 
> >         globalSum = sum 
> >     } 
> > } 
> > 
> > func Benchmark_LoopArray_0b(b *testing.B) { 
> >     for i := 0; i < b.N; i++ { 
> >         var sum T 
> >         p := &a 
> >         for i := 0; i < len(p); i++ { 
> >             sum += T(p[i]) 
> >         } 
> >         globalSum = sum 
> >     } 
> > } 
> > 
> > func Benchmark_LoopArray_1a(b *testing.B) { 
> >     for i := 0; i < b.N; i++ { 
> >         globalSum = sumByLoopArray_a(&a) 
> >     } 
> > } 
> > 
> > func Benchmark_LoopArray_1b(b *testing.B) { 
> >     for i := 0; i < b.N; i++ { 
> >         globalSum = sumByLoopArray_b(&a) 
> >     } 
> > } 
> > 
> > /* output: 
> > 
> > $ go test . -bench=. 
> > Benchmark_LoopArray_0a-4         300000          5248 ns/op 
> > Benchmark_LoopArray_0b-4         300000          5240 ns/op 
> > Benchmark_LoopArray_1a-4         500000          3942 ns/op 
> > Benchmark_LoopArray_1b-4         300000          3936 ns/op 
> > */ 
> > 
> > why? 
>
> Benchmarking is hard. 
>
> If you really want to know why, you will have to look at the generated 
> assembly code.  There are likely to be differences there.  For 
> example, your 1a and 1b loops use a pointer to a global variable, but 
> your 0a and 0b loops use a global variable directly.  In general 
> references through a pointer are more efficient than references to a 
> named global variable.  I don't know if that is the difference here, 
> but it could be. 
>
> It could also be something difficult to control for, like loop alignment. 
>
> Ian 
>


Thanks, Ian.

Yes, for 0a it is plausible that referencing the global array directly is 
what matters.
But for 0b, the pointer p is a local variable, so it shouldn't differ much 
from 1a and 1b.

If I use a local array in 0a and 0b, their results become much better, but 
they are still a little slower than 1a and 1b.
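
A minimal sketch of the local-array variant described above, assuming the array is declared once per benchmark function, outside the b.N loop (the original post does not show how the array was made local). The names sumByPointer, benchLocalInline and benchLocalCall are hypothetical, and testing.Benchmark is used only so that the comparison can be started with "go run" instead of "go test". Timings depend on the compiler version and the CPU, so none are shown here.

```go
package main

import (
	"fmt"
	"testing"
)

const N = 4096

type T int64

// globalSum is a sink so the compiler cannot discard the summation.
var globalSum T

// sumByPointer mirrors sumByLoopArray_a from the quoted message.
func sumByPointer(p *[N]T) T {
	var sum T
	for i := 0; i < len(p); i++ {
		sum += p[i]
	}
	return sum
}

// benchLocalInline sums a local array with the loop written inline.
// Assumption: the array lives outside the b.N loop.
func benchLocalInline(b *testing.B) {
	var a [N]T
	for i := range a {
		a[i] = T(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var sum T
		for j := 0; j < len(a); j++ {
			sum += a[j]
		}
		globalSum = sum
	}
}

// benchLocalCall sums the same kind of local array through a function call.
func benchLocalCall(b *testing.B) {
	var a [N]T
	for i := range a {
		a[i] = T(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		globalSum = sumByPointer(&a)
	}
}

func main() {
	// testing.Benchmark runs each function with its default settings.
	fmt.Println("inline:", testing.Benchmark(benchLocalInline))
	fmt.Println("call:  ", testing.Benchmark(benchLocalCall))
}
```

Declaring the array inside the b.N loop instead would also time the zero-initialization of the 32 KB array on every iteration, which could blur the comparison.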

I wrote a small program and used "-gcflags -S" to check the assembly output:

package main

const N = 4096

type T int64

// fffffff sums the array through a pointer parameter.
func fffffff(p *[N]T) T {
    var sum T
    for i := 0; i < len(p); i++ {
        sum += T(p[i])
    }
    return sum
}

// ggggggg sums a zero-valued array declared locally in the function.
func ggggggg() T {
    var a [N]T
    var sum T
    for i := 0; i < len(a); i++ {
        sum += T(a[i])
    }
    return sum
}

func main() {
}

The assembly code for fffffff and ggggggg is quite different.
I am not yet familiar with Go assembly, so I can't conclude why 
ggggggg is slower.

