T L,

First, keep things simple. Delete the main function; it's an irrelevant complication. Use shorter function names. Start simply and build one feature at a time.

Function gggg is exactly like ffff except that variables are substituted for the function parameter. Function hhhh is exactly like gggg except that the substituted variables are local variables, not file-level variables. Function iiii is exactly like hhhh except that the variables are used directly instead of indirectly through pointers.
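One possible reading of that progression, as a sketch: only ffff appears in this message, so the bodies of gggg, hhhh and iiii below are assumptions based on the description, and the playground link remains the authoritative version. The main function is included only so the sketch runs on its own; drop it before looking at the assembly.

```go
package main

import "fmt"

const N = 4096

type T int64

// a is a file-level array, as in the original benchmark.
var a [N]T

// ffff reaches the array through a pointer parameter.
func ffff(p *[N]T) T {
	var sum T
	for i := 0; i < len(p); i++ {
		sum += p[i]
	}
	return sum
}

// gggg (assumed body): like ffff, but the pointer is a variable
// set from the file-level array instead of a parameter.
func gggg() T {
	p := &a
	var sum T
	for i := 0; i < len(p); i++ {
		sum += p[i]
	}
	return sum
}

// hhhh (assumed body): like gggg, but the array is a local variable.
func hhhh() T {
	var a [N]T
	p := &a
	var sum T
	for i := 0; i < len(p); i++ {
		sum += p[i]
	}
	return sum
}

// iiii (assumed body): like hhhh, but the array is indexed directly
// instead of through a pointer.
func iiii() T {
	var a [N]T
	var sum T
	for i := 0; i < len(a); i++ {
		sum += a[i]
	}
	return sum
}

func main() {
	// Fill the file-level array so ffff and gggg have something to sum.
	for i := range a {
		a[i] = T(i)
	}
	fmt.Println(ffff(&a), gggg(), hhhh(), iiii())
}
```

Each function differs from the previous one in exactly one respect, so comparing the "go tool compile -S" output of neighbouring functions should isolate what each change costs.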
For example:

https://play.golang.org/p/TsPPKv14Va

It shouldn't be too hard to see what is happening and why function ffff is fast (registers versus memory in loop):

package tl

const N = 4096

type T int64

func ffff(p *[N]T) T {
    var sum T
    for i := 0; i < len(p); i++ {
        sum += T(p[i])
    }
    return sum
}

$ go tool compile -S ffff.go
"".ffff t=1 size=49 args=0x10 locals=0x0
    0x0000 00000 (ffff.go:7)    TEXT    "".ffff(SB), $0-16
    0x0000 00000 (ffff.go:7)    FUNCDATA    $0, gclocals·aef1f7ba6e2630c93a51843d99f5a28a(SB)
    0x0000 00000 (ffff.go:7)    FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (ffff.go:9)    MOVQ    "".p+8(FP), AX
    0x0005 00005 (ffff.go:9)    MOVQ    $0, CX
    0x0007 00007 (ffff.go:7)    MOVQ    $0, DX
    0x0009 00009 (ffff.go:9)    CMPQ    CX, $4096
    0x0010 00016 (ffff.go:9)    JGE     $0, 43
    0x0012 00018 (ffff.go:10)   TESTB   AL, (AX)
    0x0014 00020 (ffff.go:9)    LEAQ    1(CX), BX
    0x0018 00024 (ffff.go:10)   MOVQ    (AX)(CX*8), SI
    0x001c 00028 (ffff.go:10)   ADDQ    SI, DX
    0x001f 00031 (ffff.go:9)    MOVQ    BX, CX
    0x0022 00034 (ffff.go:9)    CMPQ    CX, $4096
    0x0029 00041 (ffff.go:9)    JLT     $0, 18
    0x002b 00043 (ffff.go:12)   MOVQ    DX, "".~r1+16(FP)
    0x0030 00048 (ffff.go:12)   RET
    0x0000 48 8b 44 24 08 31 c9 31 d2 48 81 f9 00 10 00 00  H.D$.1.1.H......
    0x0010 7d 19 84 00 48 8d 59 01 48 8b 34 c8 48 01 f2 48  }...H.Y.H.4.H..H
    0x0020 89 d9 48 81 f9 00 10 00 00 7c e7 48 89 54 24 10  ..H......|.H.T$.
    0x0030 c3                                               .
gclocals·33cdeccccebe80329f1fdbee7f5874cb t=8 dupok size=8
    0x0000 01 00 00 00 00 00 00 00                          ........
gclocals·aef1f7ba6e2630c93a51843d99f5a28a t=8 dupok size=9
    0x0000 01 00 00 00 02 00 00 00 01                       .........
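One way to read the compile listing is to map each register back to the source. The comments below are an interpretation of the amd64 output shown in this message, not compiler documentation, and register assignment can differ between Go versions. The main function is an addition so the sketch runs on its own.

```go
package main

import "fmt"

const N = 4096

type T int64

// ffff is the function from the listing. The comments tie each source
// line to the instructions in the "go tool compile -S" output.
func ffff(p *[N]T) T {
	// MOVQ "".p+8(FP), AX : p is loaded into AX once, before the loop.
	var sum T // MOVQ $0, DX : sum is kept in DX.
	// MOVQ $0, CX : i is kept in CX.
	// CMPQ CX, $4096 : len(p) is the constant 4096, so no length is loaded.
	for i := 0; i < len(p); i++ {
		// TESTB AL, (AX)      : nil check on p.
		// MOVQ (AX)(CX*8), SI : load p[i], the one array element read per iteration.
		// ADDQ SI, DX         : sum += p[i], register to register.
		// LEAQ 1(CX), BX and MOVQ BX, CX : i++ without touching memory.
		sum += T(p[i])
	}
	// MOVQ DX, "".~r1+16(FP) : the result is stored once, after the loop.
	return sum
}

func main() {
	var a [N]T
	for i := range a {
		a[i] = T(i)
	}
	fmt.Println(ffff(&a))
}
```

In this reading the pointer, the index and the running sum all stay in registers for the whole loop, which is the "registers versus memory" point; a variant that reloads or stores any of them inside the loop would show extra MOVQ instructions to or from memory at the same place in its listing.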
$ go tool objdump ffff.o
TEXT %22%22.ffff(SB) ffff.go
    ffff.go:9     0x325    488b442408        MOVQ 0x8(SP), AX
    ffff.go:9     0x32a    31c9              XORL CX, CX
    ffff.go:7     0x32c    31d2              XORL DX, DX
    ffff.go:9     0x32e    4881f900100000    CMPQ $0x1000, CX
    ffff.go:9     0x335    7d19              JGE 0x350
    ffff.go:10    0x337    8400              TESTB AL, 0(AX)
    ffff.go:9     0x339    488d5901          LEAQ 0x1(CX), BX
    ffff.go:10    0x33d    488b34c8          MOVQ 0(AX)(CX*8), SI
    ffff.go:10    0x341    4801f2            ADDQ SI, DX
    ffff.go:9     0x344    4889d9            MOVQ BX, CX
    ffff.go:9     0x347    4881f900100000    CMPQ $0x1000, CX
    ffff.go:9     0x34e    7ce7              JL 0x337
    ffff.go:12    0x350    4889542410        MOVQ DX, 0x10(SP)
    ffff.go:12    0x355    c3                RET

References:

Names
https://research.swtch.com/names

A Quick Guide to Go's Assembler
https://golang.org/doc/asm

GopherCon 2016: Rob Pike - The Design of the Go Assembler
https://www.youtube.com/watch?v=KINIAgRpkDA

Command compile
https://golang.org/cmd/compile/

Command nm
https://golang.org/cmd/nm/

Command objdump
https://golang.org/cmd/objdump/

Command asm
https://golang.org/cmd/asm/

A Foray Into Go Assembly Programming
https://blog.sgmansfield.com/2017/04/a-foray-into-go-assembly-programming/

Intel® 64 and IA-32 Architectures Software Developer Manuals
https://software.intel.com/en-us/articles/intel-sdm

Peter

On Sunday, April 30, 2017 at 2:00:03 AM UTC-4, T L wrote:
>
> On Sunday, April 30, 2017 at 7:33:38 AM UTC+8, Ian Lance Taylor wrote:
>>
>> On Sat, Apr 29, 2017 at 1:43 AM, T L <tapi...@gmail.com> wrote:
>> >
>> > package main
>> >
>> > import (
>> >     "testing"
>> > )
>> >
>> > const N = 4096
>> > type T int64
>> > var a [N]T
>> >
>> > var globalSum T
>> >
>> > func sumByLoopArray_a(p *[N]T) T {
>> >     var sum T
>> >     for i := 0; i < len(p); i++ {
>> >         sum += T(p[i])
>> >     }
>> >     return sum
>> > }
>> >
>> > func sumByLoopArray_b(p *[N]T) T {
>> >     var sum T
>> >     for i := 0; i < len(*p); i++ {
>> >         sum += T((*p)[i])
>> >     }
>> >     return sum
>> > }
>> >
>> > //============================================
>> >
>> > func Benchmark_LoopArray_0a(b *testing.B) {
>> >     for i := 0; i < b.N; i++ {
>> >         var sum T
>> >         for i := 0; i < len(a); i++ {
>> >             sum += T(a[i])
>> >         }
>> >         globalSum = sum
>> >     }
>> > }
>> >
>> > func Benchmark_LoopArray_0b(b *testing.B) {
>> >     for i := 0; i < b.N; i++ {
>> >         var sum T
>> >         p := &a
>> >         for i := 0; i < len(p); i++ {
>> >             sum += T(p[i])
>> >         }
>> >         globalSum = sum
>> >     }
>> > }
>> >
>> > func Benchmark_LoopArray_1a(b *testing.B) {
>> >     for i := 0; i < b.N; i++ {
>> >         globalSum = sumByLoopArray_a(&a)
>> >     }
>> > }
>> >
>> > func Benchmark_LoopArray_1b(b *testing.B) {
>> >     for i := 0; i < b.N; i++ {
>> >         globalSum = sumByLoopArray_b(&a)
>> >     }
>> > }
>> >
>> > /* output:
>> >
>> > $ go test . -bench=.
>> > Benchmark_LoopArray_0a-4     300000     5248 ns/op
>> > Benchmark_LoopArray_0b-4     300000     5240 ns/op
>> > Benchmark_LoopArray_1a-4     500000     3942 ns/op
>> > Benchmark_LoopArray_1b-4     300000     3936 ns/op
>> > */
>> >
>> > why?
>>
>> Benchmarking is hard.
>>
>> If you really want to know why, you will have to look at the generated
>> assembly code. There are likely to be differences there. For
>> example, your 1a and 1b loops use a pointer to a global variable, but
>> your 0a and 0b loops use a global variable directly. In general
>> references through a pointer are more efficient than references to a
>> named global variable. I don't know if that is the difference here,
>> but it could be.
>>
>> It could also be something difficult to control for, like loop alignment.
>>
>> Ian
>>
>
> Thanks, Ian.
>
> Yes, for 0a, it is reasonable that the reference to a global array really
> matters.
> But for 0b, the p pointer is a local variable, so it shouldn't be much
> different with 1a and 1b.
>
> If I create a local array for 0a and 0b, then their results will become
> much better, but still a little slower than 1a and 1b.
>
> I write a small program and use "-gcflags -S" to check the assembly output:
>
> package main
>
> const N = 4096
> type T int64
>
> func fffffff (p *[N]T) T {
>     var sum T
>     for i := 0; i < len(p); i++ {
>         sum += T(p[i])
>     }
>     return sum
> }
>
> func ggggggg () T {
>     var a [N]T
>     var sum T
>     for i := 0; i < len(a); i++ {
>         sum += T(a[i])
>     }
>     return sum
> }
>
> func main() {
> }
>
> The assembly code for fffffff and ggggggg are much different.
> I am not familiar with go assembly now, so I can't make a conclusion why
> ggggggg is slower.
>