Ian, I don't know whether it is feasible. It is unnecessary.
A benchmark should be treated as a scientific experiment. If we do that with Paul's benchmarks and write them in scientific form, then we get the expected results. $ benchstat xpt.txt name time/op PopCountSimple-4 5.71ns ± 0% PopCountSlowSimple-4 5.70ms ± 0% PopCountAlmostSimple/BenchmarkPopCountAlmostSimple-4 5.70ns ± 0% PopCountNearlySimple/PopCountNearlySimple-4 5.69ns ± 0% $ Peter On Monday, June 7, 2021 at 6:14:05 AM UTC-4 Ian Davis wrote: > Would it be feasible for the Go tool to disable inlining and deadcode > elimination of code within the bodies of Benchmarks and Tests? Not the code > under test of course. It could be as crude as disabling these optimizations > for files in _test.go files. > > > > On Sun, 6 Jun 2021, at 1:33 PM, Paul S. R. Chisholm wrote: > > Thanks! I'd seen the "dead code elimination" comment somewhere and > questioned it, but not enough. > > If I worry about what some future Go compiler might optimize, I end up > worrying quite a lot! For example, could this code: > > func BenchmarkPopCountAlive(b *testing.B) { > sum = 0 > for i := 0; i < b.N; i++ { > sum += PopCount(0x1234567890abcdef) > } > } > > hypothetically be optimized to: > > func BenchmarkPopCountAlive(b *testing.B) { > sum = PopCount(0x1234567890abcdef) * b.N > } > > since PopCount() always returns the same value for the same argument? It > probably wouldn't, since that would break many existing benchmarks. > > Maybe more reasonably worrisome: If sum is initialized and incremented but > never read, could all references to it be optimized away? That's easy > enough to avoid, as is the optimization I worried about above: > > var sum = 0 // Avoid dead code elimination > const firstInput uint64 = 0x1234567890abcdef > > func BenchmarkPopCountSimple(b *testing.B) { > for i, input := 0, firstInput; i < b.N; i++ { > sum += PopCount(input) > input++ > } > if sum == 0 { > b.Error("BenchmarkPopCountSimple: sum == 0") > } > } > > Here are my new benchmark results: > > $ go version > go version go1.16.5 windows/amd64 > $ go test -cpu=1 -bench=. > goos: windows > goarch: amd64 > pkg: gopl.io/popcount > cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz > BenchmarkPopCountSimple 271004790 4.419 ns/op > BenchmarkPopCountSlowSimple 278 4235890 ns/op > BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple > 272759134 4.434 ns/op > BenchmarkPopCountNearlySimple/PopCountNearlySimple > 145613077 8.172 ns/op > PASS > ok gopl.io/popcount 7.061s > > The anomalistically-low "simple" and "almost simple" results are now more > reasonable, but still nearly a factor of two lower than the "nearly simple" > results. They're still inlining the calls, as is the "slow simple" > benchmark, where the "nearly simple" one isn't: > > $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to > PopCount' > .\popcount_test.go:10:18: inlining call to PopCount > .\popcount_test.go:22:19: inlining call to PopCount > .\popcount_test.go:34:19: inlining call to PopCount > > Overkill, right? --PSRC > > P.S.: For better or worse, I discovered I can -- but shouldn't, based on > the feedback I got! -- get "fancy" formatting by copying from vscode and > pasting into Gmail. > > > > On Tuesday, June 1, 2021 at 12:01:51 PM UTC-4 peterGo wrote: > > Here is a more definitive reply than my original reply. > > > I got as far as this > func BenchmarkPopCountSimple(b *testing.B) { > sum := 0 // Avoid dead code elimination. > for i := 0; i < b.N; i++ { > sum += PopCount(0x1234567890abcdef) > } > } > > As you can see from the objdump, BenchmarkPopCountSimple dead code is > eliminated. > > func BenchmarkPopCountSimple(b *testing.B) { > 0x4e38c0 31c9 XORL CX, CX > > for i := 0; i < b.N; i++ { > 0x4e38c2 eb03 JMP 0x4e38c7 > 0x4e38c4 48ffc1 INCQ CX > 0x4e38c7 48398890010000 CMPQ CX, 0x190(AX) > 0x4e38ce 7ff4 JG 0x4e38c4 > } > 0x4e38d0 c3 RET > I added an additional BenchmarkPopCountAlive benchmark. > > var sum = 0 // Avoid dead code elimination. > > func BenchmarkPopCountAlive(b *testing.B) { > sum = 0 > > for i := 0; i < b.N; i++ { > sum += PopCount(0x1234567890abcdef) > } > } > > As you can see from the objdump, BenchmarkPopCountAlive code is not > eliminated. > > For details, see the popcount.txt attachment. > > Peter > > > On Monday, May 31, 2021 at 6:29:27 PM UTC-4 Paul S. R. Chisholm wrote: > > This is not a serious problem, but it surprised me. (By the way, how can I > post a message with code formatting?) > > I'd like to create table-driven benchmarks: > https://blog.golang.org/subtests. > > To that end, I started with code from *The Go Programming Language*: > > // PopCount is based on an example from chapter 2 of THE GO PROGRAMMING > LANGUAGE. > // Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan. > // License: https://creativecommons.org/licenses/by-nc-sa/4.0/ > > package popcount > > // pc[i] is the population count of i. > var pc [256]byte > > func init() { > for i := range pc { > pc[i] = pc[i/2] + byte(i&1) > } > } > > // PopCount returns the population count (number of set bits) of x. > func PopCount(x uint64) int { > return int(pc[byte(x>>(0*8))] + > pc[byte(x>>(1*8))] + > pc[byte(x>>(2*8))] + > pc[byte(x>>(3*8))] + > pc[byte(x>>(4*8))] + > pc[byte(x>>(5*8))] + > pc[byte(x>>(6*8))] + > pc[byte(x>>(7*8))]) > } > > I then wrote a simple test: > > // popcount_simple_test.go > package popcount > > import "testing" > > func BenchmarkPopCountSimple(b *testing.B) { > sum := 0 // Avoid dead code elimination. > for i := 0; i < b.N; i++ { > sum += PopCount(0x1234567890abcdef) > } > } > > Because I was paranoid about dead code elimination, I wrote an identical > test that calls the function many times (inside the loop to call it b.N > times, which should make no difference if the hypothetically-still-dead > code is being eliminated): > > // popcount_slow_simple_test.go > package popcount > > import "testing" > > func BenchmarkPopCountSlowSimple(b *testing.B) { > sum := 0 // Avoid dead code elimination. > for i := 0; i < b.N; i++ { > // Exactly as before, but call the function many times. > for j:= 0; j < 1_000_000; j++ { > sum += PopCount(0x1234567890abcdef) > } > } > } > > Then I wrote an "almost simple" test that uses testing.B.Run() but is > hardwired to call the function being benchmarked: > > // popcount_almost_simple_test.go > package popcount > > import "testing" > > func BenchmarkPopCountAlmostSimple(b *testing.B) { > b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) { > sum := 0 // Avoid dead code elimination. > for i := 0; i < b.N; i++ { > sum += PopCount(0x1234567890abcdef) > } > }) > } > > Finally, I wrote a "nearly-simple" test that passes arguments to Run() > that could have come from a table: > > // popcount_nearly_simple.go > package popcount > > import "testing" > > func BenchmarkPopCountNearlySimple(b *testing.B) { > f := PopCount > name := "PopCountNearlySimple" > b.Run(name, func(b *testing.B) { > sum := 0 // Avoid dead code elimination. > for i := 0; i < b.N; i++ { > sum += f(0x1234567890abcdef) > } > }) > } > > The simple and almost-simple results are nearly identical (happily, the > slow results are not), but the nearly-simple results are an order of > magnitude slower: > > $ go version > go version go1.16.4 windows/amd64 > $ go test -cpu=1 -bench=. > goos: windows > goarch: amd64 > pkg: gopl.io/popcount > cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz > BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple > 1000000000 0.6102 ns/op > BenchmarkPopCountNearlySimple/PopCountNearlySimple > 194925662 6.197 ns/op > BenchmarkPopCountSimple > 1000000000 0.5994 ns/op > BenchmarkPopCountSlowSimple > 1953 606194 ns/op > PASS > ok gopl.io/popcount 4.534s > > After reading this article: > > > https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be > > I ran: > > $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to > PopCount' > .\popcount_almost_simple_test.go:9:19: inlining call to PopCount > .\popcount_simple_test.go:8:18: inlining call to PopCount > .\popcount_slow_simple_test.go:10:19: inlining call to PopCount > > That makes sense. 0.6 ns is less than a function call. The simple and > almost-simple benchmarks can inline the call to the hardwired function, but > the nearly-simple benchmark can't. > > In practice, this isn't much of a problem. Any function small enough to be > inlined is unlikely to be a performance bottleneck. If it ever is, a > non-table-driven benchmark can still measure it. > > Hope this helps. --PSRC > > P.S.: Out of curiosity, how can I post a message with fancy code examples > like this one? > https://groups.google.com/g/golang-nuts/c/5DgtH2Alt_I/m/hlsqdRSGAgAJ > > > -- > You received this message because you are subscribed to the Google Groups > "golang-nuts" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-nuts...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com > > <https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > > > *Attachments:* > > - popcount_test.go > > > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/ce3ef331-4156-4b29-97f8-c73456a7b6f7n%40googlegroups.com.
package popcount import "testing" var sum = 0 // Avoid dead code elimination const firstInput uint64 = 0x1234567890abcdef func benchmarkPopCount(b *testing.B, f func(x uint64) int) { for i, input := 0, firstInput; i < b.N; i++ { sum += PopCount(input) input++ } if sum == 0 { b.Error("BenchmarkPopCountSimple: sum == 0") } } func BenchmarkPopCountSimple(b *testing.B) { f := PopCount benchmarkPopCount(b, f) } func BenchmarkPopCountSlowSimple(b *testing.B) { f := PopCount var bb = *b bb.N = 1_000_000 for i := 0; i < b.N; i++ { // Exactly as before, but call the function many times. benchmarkPopCount(&bb, f) } } func BenchmarkPopCountAlmostSimple(b *testing.B) { f := PopCount b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) { benchmarkPopCount(b, f) }) } func BenchmarkPopCountNearlySimple(b *testing.B) { f := PopCount name := "PopCountNearlySimple" b.Run(name, func(b *testing.B) { benchmarkPopCount(b, f) }) }