T L,

ffff.go : https://play.golang.org/p/GPGO7YnLxh

Peter

On Monday, May 1, 2017 at 1:15:38 PM UTC-4, peterGo wrote:
>
> T L,
>
> First, keep things simple. Delete the main function; it's an irrelevant
> complication. Use shorter function names. Start simply and build one
> feature at a time. Function gggg is exactly like ffff except that
> package-level variables are substituted for the function parameter.
> Function hhhh is exactly like gggg except that those variables are local
> variables, not package-level variables. Function iiii is exactly like hhhh
> except that the variables are used directly instead of indirectly through
> pointers.
>
> For example: https://play.golang.org/p/TsPPKv14Va
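For readers who cannot open the playground link, here is a rough sketch of the four variants as described above. Only ffff is taken from this thread; the bodies of gggg, hhhh, and iiii, the names a, gp, la, and lp, and the small main function are assumptions reconstructed from the description, not a copy of the linked code.

```go
package main

import "fmt"

const N = 4096

type T int64

// Package-level variables used by gggg (names are assumptions).
var (
	a  [N]T
	gp = &a
)

// ffff: the array pointer arrives as a function parameter.
func ffff(p *[N]T) T {
	var sum T
	for i := 0; i < len(p); i++ {
		sum += p[i]
	}
	return sum
}

// gggg: like ffff, but the pointer is a package-level variable.
func gggg() T {
	var sum T
	for i := 0; i < len(gp); i++ {
		sum += gp[i]
	}
	return sum
}

// hhhh: like gggg, but the array and pointer are local variables.
func hhhh() T {
	var la [N]T
	lp := &la
	var sum T
	for i := 0; i < len(lp); i++ {
		sum += lp[i]
	}
	return sum
}

// iiii: like hhhh, but the local array is indexed directly.
func iiii() T {
	var la [N]T
	var sum T
	for i := 0; i < len(la); i++ {
		sum += la[i]
	}
	return sum
}

func main() {
	for i := range a {
		a[i] = T(i)
	}
	// hhhh and iiii sum a zeroed local array, so they return 0.
	fmt.Println(ffff(&a), gggg(), hhhh(), iiii())
}
```

Compiling each function with go tool compile -S, as done for ffff below, lets you compare the loop bodies one variant at a time.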
>
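> It shouldn't be too hard to see what is happening and why function ffff is
> fast: its loop keeps the array pointer, the index, and the sum in registers
> (AX, CX, and DX in the listing below) instead of reloading them from memory
> on each iteration: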
>
> package tl
>
> const N = 4096
>
> type T int64
>
> func ffff(p *[N]T) T {
>     var sum T
>     for i := 0; i < len(p); i++ {
>         sum += T(p[i])
>     }
>     return sum
> }
>
> $ go tool compile -S ffff.go
>
> "".ffff t=1 size=49 args=0x10 locals=0x0
>     0x0000 00000 (ffff.go:7)    TEXT    "".ffff(SB), $0-16
>     0x0000 00000 (ffff.go:7)    FUNCDATA    $0, gclocals·aef1f7ba6e2630c93a51843d99f5a28a(SB)
>     0x0000 00000 (ffff.go:7)    FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
>     0x0000 00000 (ffff.go:9)    MOVQ    "".p+8(FP), AX
>     0x0005 00005 (ffff.go:9)    MOVQ    $0, CX
>     0x0007 00007 (ffff.go:7)    MOVQ    $0, DX
>     0x0009 00009 (ffff.go:9)    CMPQ    CX, $4096
>     0x0010 00016 (ffff.go:9)    JGE    $0, 43
>     0x0012 00018 (ffff.go:10)    TESTB    AL, (AX)
>     0x0014 00020 (ffff.go:9)    LEAQ    1(CX), BX
>     0x0018 00024 (ffff.go:10)    MOVQ    (AX)(CX*8), SI
>     0x001c 00028 (ffff.go:10)    ADDQ    SI, DX
>     0x001f 00031 (ffff.go:9)    MOVQ    BX, CX
>     0x0022 00034 (ffff.go:9)    CMPQ    CX, $4096
>     0x0029 00041 (ffff.go:9)    JLT    $0, 18
>     0x002b 00043 (ffff.go:12)    MOVQ    DX, "".~r1+16(FP)
>     0x0030 00048 (ffff.go:12)    RET
>     0x0000 48 8b 44 24 08 31 c9 31 d2 48 81 f9 00 10 00 00  H.D$.1.1.H......
>     0x0010 7d 19 84 00 48 8d 59 01 48 8b 34 c8 48 01 f2 48  }...H.Y.H.4.H..H
>     0x0020 89 d9 48 81 f9 00 10 00 00 7c e7 48 89 54 24 10  ..H......|.H.T$.
>     0x0030 c3                                               .
> gclocals·33cdeccccebe80329f1fdbee7f5874cb t=8 dupok size=8
>     0x0000 01 00 00 00 00 00 00 00                          ........
> gclocals·aef1f7ba6e2630c93a51843d99f5a28a t=8 dupok size=9
>     0x0000 01 00 00 00 02 00 00 00 01                       .........
>
> $ go tool objdump ffff.o
>
> TEXT %22%22.ffff(SB) ffff.go
>     ffff.go:9    0x325    488b442408    MOVQ 0x8(SP), AX    
>     ffff.go:9    0x32a    31c9        XORL CX, CX        
>     ffff.go:7    0x32c    31d2        XORL DX, DX        
>     ffff.go:9    0x32e    4881f900100000    CMPQ $0x1000, CX    
>     ffff.go:9    0x335    7d19        JGE 0x350        
>     ffff.go:10    0x337    8400        TESTB AL, 0(AX)        
>     ffff.go:9    0x339    488d5901    LEAQ 0x1(CX), BX    
>     ffff.go:10    0x33d    488b34c8    MOVQ 0(AX)(CX*8), SI    
>     ffff.go:10    0x341    4801f2        ADDQ SI, DX        
>     ffff.go:9    0x344    4889d9        MOVQ BX, CX        
>     ffff.go:9    0x347    4881f900100000    CMPQ $0x1000, CX    
>     ffff.go:9    0x34e    7ce7        JL 0x337        
>     ffff.go:12    0x350    4889542410    MOVQ DX, 0x10(SP)    
>     ffff.go:12    0x355    c3        RET            
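To measure the effect rather than read it off the listings, here is a minimal sketch that times two loops with testing.Benchmark from the standard testing package, run as an ordinary program. sumParam mirrors ffff; the names sumGlobal and sink, and the use of //go:noinline to keep both as real calls, are assumptions for illustration and do not come from this thread. Timings depend on the machine and the Go version, so no expected numbers are shown.

```go
package main

import (
	"fmt"
	"testing"
)

const N = 4096

type T int64

var (
	a    [N]T // package-level array, as in the original benchmark
	sink T    // results are stored here so the calls are not optimized away
)

// sumParam mirrors ffff: the array is reached through a pointer parameter.
//
//go:noinline
func sumParam(p *[N]T) T {
	var sum T
	for i := 0; i < len(p); i++ {
		sum += p[i]
	}
	return sum
}

// sumGlobal (hypothetical name) indexes the package-level array directly.
//
//go:noinline
func sumGlobal() T {
	var sum T
	for i := 0; i < len(a); i++ {
		sum += a[i]
	}
	return sum
}

func main() {
	for i := range a {
		a[i] = T(i)
	}
	byParam := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumParam(&a)
		}
	})
	byGlobal := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumGlobal()
		}
	})
	fmt.Printf("sumParam:  %v\n", byParam)
	fmt.Printf("sumGlobal: %v\n", byGlobal)
}
```

If the two timings differ, compare the assembly of the two functions before drawing conclusions; a small gap can also come from effects such as loop alignment.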
>
> References:
>
> Names
> https://research.swtch.com/names
>
> A Quick Guide to Go's Assembler
> https://golang.org/doc/asm
>
> GopherCon 2016: Rob Pike - The Design of the Go Assembler
> https://www.youtube.com/watch?v=KINIAgRpkDA
>
> Command compile
> https://golang.org/cmd/compile/
>
> Command nm
> https://golang.org/cmd/nm/
>
> Command objdump
> https://golang.org/cmd/objdump/
>
> Command asm
> https://golang.org/cmd/asm/
>
> A Foray Into Go Assembly Programming
> https://blog.sgmansfield.com/2017/04/a-foray-into-go-assembly-programming/
>
> Intel® 64 and IA-32 Architectures Software Developer Manuals
> https://software.intel.com/en-us/articles/intel-sdm
>
> Peter
>
> On Sunday, April 30, 2017 at 2:00:03 AM UTC-4, T L wrote:
>>
>>
>>
>> On Sunday, April 30, 2017 at 7:33:38 AM UTC+8, Ian Lance Taylor wrote:
>>>
>>> On Sat, Apr 29, 2017 at 1:43 AM, T L <tapi...@gmail.com> wrote: 
>>> > 
>>> > package main 
>>> > 
>>> > import ( 
>>> >     "testing" 
>>> > ) 
>>> > 
>>> > const N = 4096 
>>> > type T int64 
>>> > var a [N]T 
>>> > 
>>> > var globalSum T 
>>> > 
>>> > func sumByLoopArray_a(p *[N]T) T { 
>>> >     var sum T 
>>> >     for i := 0; i < len(p); i++ { 
>>> >         sum += T(p[i]) 
>>> >     } 
>>> >     return sum 
>>> > } 
>>> > 
>>> > func sumByLoopArray_b(p *[N]T) T { 
>>> >     var sum T 
>>> >     for i := 0; i < len(*p); i++ { 
>>> >         sum += T((*p)[i]) 
>>> >     } 
>>> >     return sum 
>>> > } 
>>> > 
>>> > //============================================ 
>>> > 
>>> > func Benchmark_LoopArray_0a(b *testing.B) { 
>>> >     for i := 0; i < b.N; i++ { 
>>> >         var sum T 
>>> >         for i := 0; i < len(a); i++ { 
>>> >             sum += T(a[i]) 
>>> >         } 
>>> >         globalSum = sum 
>>> >     } 
>>> > } 
>>> > 
>>> > func Benchmark_LoopArray_0b(b *testing.B) { 
>>> >     for i := 0; i < b.N; i++ { 
>>> >         var sum T 
>>> >         p := &a 
>>> >         for i := 0; i < len(p); i++ { 
>>> >             sum += T(p[i]) 
>>> >         } 
>>> >         globalSum = sum 
>>> >     } 
>>> > } 
>>> > 
>>> > func Benchmark_LoopArray_1a(b *testing.B) { 
>>> >     for i := 0; i < b.N; i++ { 
>>> >         globalSum = sumByLoopArray_a(&a) 
>>> >     } 
>>> > } 
>>> > 
>>> > func Benchmark_LoopArray_1b(b *testing.B) { 
>>> >     for i := 0; i < b.N; i++ { 
>>> >         globalSum = sumByLoopArray_b(&a) 
>>> >     } 
>>> > } 
>>> > 
>>> > /* output: 
>>> > 
>>> > $ go test . -bench=. 
>>> > Benchmark_LoopArray_0a-4         300000          5248 ns/op 
>>> > Benchmark_LoopArray_0b-4         300000          5240 ns/op 
>>> > Benchmark_LoopArray_1a-4         500000          3942 ns/op 
>>> > Benchmark_LoopArray_1b-4         300000          3936 ns/op 
>>> > */ 
>>> > 
>>> > Why are 0a and 0b slower than 1a and 1b?
>>>
>>> Benchmarking is hard. 
>>>
>>> If you really want to know why, you will have to look at the generated 
>>> assembly code.  There are likely to be differences there.  For 
>>> example, your 1a and 1b loops use a pointer to a global variable, but 
>>> your 0a and 0b loops use a global variable directly.  In general 
>>> references through a pointer are more efficient than references to a 
>>> named global variable.  I don't know if that is the difference here, 
>>> but it could be. 
>>>
>>> It could also be something difficult to control for, like loop 
>>> alignment. 
>>>
>>> Ian 
>>>
>>
>> Thanks, Ian.
>>
>> Yes, for 0a, it is plausible that referencing a global array directly
>> matters.
>> But for 0b, the pointer p is a local variable, so it shouldn't be much
>> different from 1a and 1b.
>>
>> If I use a local array for 0a and 0b, their results become much better,
>> but they are still a little slower than 1a and 1b.
>>
>> I wrote a small program and used "-gcflags -S" to check the assembly
>> output:
>>
>> package main
>>
>> const N = 4096
>> type T int64
>>
>> func fffffff(p *[N]T) T {
>>     var sum T
>>     for i := 0; i < len(p); i++ {
>>         sum += T(p[i])
>>     }
>>     return sum
>> }
>>
>> func ggggggg() T {
>>     var a [N]T
>>     var sum T
>>     for i := 0; i < len(a); i++ {
>>         sum += T(a[i])
>>     }
>>     return sum
>> }
>>
>> func main() {
>> }
>>
>> The assembly code for fffffff and ggggggg is very different.
>> I am not familiar with Go assembly yet, so I can't tell why
>> ggggggg is slower.
>>
>>

