[go-nuts] Table-driven benchmarks defeat inlining

Paul S. R. Chisholm Mon, 31 May 2021 15:29:13 -0700

This is not a serious problem, but it surprised me. (By the way, how can I
post a message with code formatting?)


I'd like to create table-driven benchmarks, as described in:
    https://blog.golang.org/subtests

To that end, I started with code from *The Go Programming Language*:

// PopCount is based on an example from chapter 2 of THE GO PROGRAMMING LANGUAGE.
// Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan.
// License: https://creativecommons.org/licenses/by-nc-sa/4.0/

package popcount

// pc[i] is the population count of i.
var pc [256]byte

func init() {
	for i := range pc {
		pc[i] = pc[i/2] + byte(i&1)
	}
}

// PopCount returns the population count (number of set bits) of x.
func PopCount(x uint64) int {
	return int(pc[byte(x>>(0*8))] +
		pc[byte(x>>(1*8))] +
		pc[byte(x>>(2*8))] +
		pc[byte(x>>(3*8))] +
		pc[byte(x>>(4*8))] +
		pc[byte(x>>(5*8))] +
		pc[byte(x>>(6*8))] +
		pc[byte(x>>(7*8))])
}

I then wrote a simple benchmark:

// popcount_simple_test.go
package popcount

import "testing"

func BenchmarkPopCountSimple(b *testing.B) {
	sum := 0 // Avoid dead code elimination.
	for i := 0; i < b.N; i++ {
		sum += PopCount(0x1234567890abcdef)
	}
}

Because I was paranoid about dead code elimination, I wrote an otherwise
identical benchmark that calls the function 1,000,000 times on each of the
b.N iterations. If the calls were being eliminated as dead code, the extra
inner loop should make little difference to the reported time:

// popcount_slow_simple_test.go
package popcount

import "testing"

func BenchmarkPopCountSlowSimple(b *testing.B) {
	sum := 0 // Avoid dead code elimination.
	for i := 0; i < b.N; i++ {
		// Exactly as before, but call the function many times.
		for j := 0; j < 1_000_000; j++ {
			sum += PopCount(0x1234567890abcdef)
		}
	}
}

Then I wrote an "almost simple" benchmark that uses testing.B.Run() but
still calls the benchmarked function by name:

// popcount_almost_simple_test.go
package popcount

import "testing"

func BenchmarkPopCountAlmostSimple(b *testing.B) {
	b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) {
		sum := 0 // Avoid dead code elimination.
		for i := 0; i < b.N; i++ {
			sum += PopCount(0x1234567890abcdef)
		}
	})
}

Finally, I wrote a "nearly simple" benchmark that passes Run() a name and a
function value that could have come from a table:

// popcount_nearly_simple_test.go
package popcount

import "testing"

func BenchmarkPopCountNearlySimple(b *testing.B) {
	f := PopCount
	name := "PopCountNearlySimple"
	b.Run(name, func(b *testing.B) {
		sum := 0 // Avoid dead code elimination.
		for i := 0; i < b.N; i++ {
			sum += f(0x1234567890abcdef)
		}
	})
}

The simple and almost-simple results are nearly identical (happily, the
slow results are not: they scale with the extra calls, at roughly 0.6 ns
each), but the nearly-simple results are an order of magnitude slower, at
about 6.2 ns/op versus 0.6 ns/op:

$ go version
go version go1.16.4 windows/amd64
$ go test -cpu=1 -bench=.
goos: windows
goarch: amd64
pkg: gopl.io/popcount
cpu: Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz
BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple   1000000000    0.6102 ns/op
BenchmarkPopCountNearlySimple/PopCountNearlySimple             194925662    6.197 ns/op
BenchmarkPopCountSimple                                       1000000000    0.5994 ns/op
BenchmarkPopCountSlowSimple                                         1953    606194 ns/op
PASS
ok      gopl.io/popcount        4.534s

After reading this article:

https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be

I ran:

$ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to PopCount'
.\popcount_almost_simple_test.go:9:19: inlining call to PopCount
.\popcount_simple_test.go:8:18: inlining call to PopCount
.\popcount_slow_simple_test.go:10:19: inlining call to PopCount

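That makes sense: 0.6 ns per op is less than the cost of a function call.
The simple, slow, and almost-simple benchmarks call PopCount by name, so
the compiler can inline the call. The nearly-simple benchmark calls it
through the function value f; it does not appear in the output above, and
the call is not inlined.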

In practice, this isn't much of a problem. Any function small enough to be
inlined is unlikely to be a performance bottleneck. If it ever is, a
non-table-driven benchmark can still measure it.

Hope this helps. --PSRC

P.S.: Out of curiosity, how can I post a message with fancy code examples
like this one?
    https://groups.google.com/g/golang-nuts/c/5DgtH2Alt_I/m/hlsqdRSGAgAJ


