Re: [go-nuts] Re: Table-driven benchmarks defeat inlining

peterGo Mon, 07 Jun 2021 08:47:33 -0700

Ian,

I don't know whether it is feasible. It is unnecessary.


A benchmark should be treated as a scientific experiment. If we do that 
with Paul's benchmarks and write them in scientific form, then we get the 
expected results.

$ benchstat xpt.txt
name                                                  time/op
PopCountSimple-4                                      5.71ns ± 0%
PopCountSlowSimple-4                                  5.70ms ± 0%
PopCountAlmostSimple/BenchmarkPopCountAlmostSimple-4  5.70ns ± 0%
PopCountNearlySimple/PopCountNearlySimple-4           5.69ns ± 0%
$

Peter


On Monday, June 7, 2021 at 6:14:05 AM UTC-4 Ian Davis wrote:

> Would it be feasible for the Go tool to disable inlining and deadcode 
> elimination of code within the bodies of Benchmarks and Tests? Not the code 
> under test of course. It could be as crude as disabling these optimizations 
> for files in _test.go files.
>
>
>
> On Sun, 6 Jun 2021, at 1:33 PM, Paul S. R. Chisholm wrote:
>
> Thanks! I'd seen the "dead code elimination" comment somewhere and 
> questioned it, but not enough.
>
> If I worry about what some future Go compiler might optimize, I end up 
> worrying quite a lot! For example, could this code:
>
> func BenchmarkPopCountAlive(b *testing.B) {
>     sum = 0
>     for i := 0; i < b.N; i++ {
>         sum += PopCount(0x1234567890abcdef)
>     }
> }
>
> hypothetically be optimized to:
>
> func BenchmarkPopCountAlive(b *testing.B) {
>     sum = PopCount(0x1234567890abcdef) * b.N
> }
>
> since PopCount() always returns the same value for the same argument? It 
> probably wouldn't, since that would break many existing benchmarks.
>
> Maybe more reasonably worrisome: If sum is initialized and incremented but 
> never read, could all references to it be optimized away? That's easy 
> enough to avoid, as is the optimization I worried about above:
>
> var sum = 0 // Avoid dead code elimination
> const firstInput uint64 = 0x1234567890abcdef
>
> func BenchmarkPopCountSimple(b *testing.B) {
>     for i, input := 0, firstInput; i < b.N; i++ {
>         sum += PopCount(input)
>         input++
>     }
>     if sum == 0 {
>         b.Error("BenchmarkPopCountSimple: sum == 0")
>     }
> }
>
> Here are my new benchmark results:
>
> $ go version
> go version go1.16.5 windows/amd64
> $ go test -cpu=1 -bench=.
> goos: windows
> goarch: amd64
> pkg: gopl.io/popcount
> cpu: Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz
> BenchmarkPopCountSimple         271004790                4.419 ns/op
> BenchmarkPopCountSlowSimple          278           4235890 ns/op
> BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple             
> 272759134                4.434 ns/op
> BenchmarkPopCountNearlySimple/PopCountNearlySimple                     
>  145613077                8.172 ns/op
> PASS
> ok      gopl.io/popcount        7.061s
>
> The anomalistically-low "simple" and "almost simple" results are now more 
> reasonable, but still nearly a factor of two lower than the "nearly simple" 
> results. They're still inlining the calls, as is the "slow simple" 
> benchmark, where the "nearly simple" one isn't:
>
> $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to 
> PopCount'
> .\popcount_test.go:10:18: inlining call to PopCount
> .\popcount_test.go:22:19: inlining call to PopCount
> .\popcount_test.go:34:19: inlining call to PopCount
>
> Overkill, right? --PSRC
>
> P.S.: For better or worse, I discovered I can -- but shouldn't, based on 
> the feedback I got! -- get "fancy" formatting by copying from vscode and 
> pasting into Gmail.
>
>
>
> On Tuesday, June 1, 2021 at 12:01:51 PM UTC-4 peterGo wrote:
>
> Here is a more definitive reply than my original reply.
>
>
> I got as far as this
> func BenchmarkPopCountSimple(b *testing.B) {
>     sum := 0 // Avoid dead code elimination.
>     for i := 0; i < b.N; i++ {
>         sum += PopCount(0x1234567890abcdef)
>     }
> }
>
> As you can see from the objdump, BenchmarkPopCountSimple dead code is 
> eliminated.
>
> func BenchmarkPopCountSimple(b *testing.B) {
>   0x4e38c0        31c9            XORL CX, CX        
>
>     for i := 0; i < b.N; i++ {
>   0x4e38c2        eb03            JMP 0x4e38c7        
>   0x4e38c4        48ffc1            INCQ CX            
>   0x4e38c7        48398890010000        CMPQ CX, 0x190(AX)    
>   0x4e38ce        7ff4            JG 0x4e38c4        
> }
>   0x4e38d0        c3            RET            
> I added an additional BenchmarkPopCountAlive benchmark.
>
> var sum = 0 // Avoid dead code elimination.
>
> func BenchmarkPopCountAlive(b *testing.B) {
>     sum = 0
>
>     for i := 0; i < b.N; i++ {
>         sum += PopCount(0x1234567890abcdef)
>     }
> }
>
> As you can see from the objdump, BenchmarkPopCountAlive code is not 
> eliminated.
>
> For details, see the popcount.txt attachment.
>
> Peter
>
>
> On Monday, May 31, 2021 at 6:29:27 PM UTC-4 Paul S. R. Chisholm wrote:
>
> This is not a serious problem, but it surprised me. (By the way, how can I 
> post a message with code formatting?)
>
> I'd like to create table-driven benchmarks:
>     https://blog.golang.org/subtests.
>
> To that end, I started with code from *The Go Programming Language*:
>
> // PopCount is based on an example from chapter 2 of THE GO PROGRAMMING 
> LANGUAGE.
> // Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan.
> // License: https://creativecommons.org/licenses/by-nc-sa/4.0/
>
> package popcount
>
> // pc[i] is the population count of i.
> var pc [256]byte
>
> func init() {
> for i := range pc {
> pc[i] = pc[i/2] + byte(i&1)
> }
> }
>
> // PopCount returns the population count (number of set bits) of x.
> func PopCount(x uint64) int {
> return int(pc[byte(x>>(0*8))] +
> pc[byte(x>>(1*8))] +
> pc[byte(x>>(2*8))] +
> pc[byte(x>>(3*8))] +
> pc[byte(x>>(4*8))] +
> pc[byte(x>>(5*8))] +
> pc[byte(x>>(6*8))] +
> pc[byte(x>>(7*8))])
> }
>
> I then wrote a simple test:
>
> // popcount_simple_test.go
> package popcount
>
> import "testing"
>
> func BenchmarkPopCountSimple(b *testing.B) {
> sum := 0 // Avoid dead code elimination.
> for i := 0; i < b.N; i++ {
> sum += PopCount(0x1234567890abcdef)
> }
> }
>
> Because I was paranoid about dead code elimination, I wrote an identical 
> test that calls the function many times (inside the loop to call it b.N 
> times, which should make no difference if the hypothetically-still-dead 
> code is being eliminated):
>
> // popcount_slow_simple_test.go
> package popcount
>
> import "testing"
>
> func BenchmarkPopCountSlowSimple(b *testing.B) {
> sum := 0 // Avoid dead code elimination.
> for i := 0; i < b.N; i++ {
> // Exactly as before, but call the function many times.
> for j:= 0; j < 1_000_000; j++ {
> sum += PopCount(0x1234567890abcdef)
> }
> }
> }
>
> Then I wrote an "almost simple" test that uses testing.B.Run() but is 
> hardwired to call the function being benchmarked:
>
> // popcount_almost_simple_test.go
> package popcount
>
> import "testing"
>
> func BenchmarkPopCountAlmostSimple(b *testing.B) {
> b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) {
> sum := 0 // Avoid dead code elimination.
> for i := 0; i < b.N; i++ {
> sum += PopCount(0x1234567890abcdef)
> }
> })
> }
>
> Finally, I wrote a "nearly-simple" test that passes arguments to Run() 
> that could have come from a table:
>
> // popcount_nearly_simple.go
> package popcount
>
> import "testing"
>
> func BenchmarkPopCountNearlySimple(b *testing.B) {
> f := PopCount
> name := "PopCountNearlySimple"
> b.Run(name, func(b *testing.B) {
> sum := 0 // Avoid dead code elimination.
> for i := 0; i < b.N; i++ {
> sum += f(0x1234567890abcdef)
> }
> })
> }
>
> The simple and almost-simple results are nearly identical (happily, the 
> slow results are not), but the nearly-simple results are an order of 
> magnitude slower:
>
> $ go version
> go version go1.16.4 windows/amd64
> $ go test -cpu=1 -bench=.
> goos: windows
> goarch: amd64
> pkg: gopl.io/popcount
> cpu: Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz
> BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple            
>  1000000000               0.6102 ns/op
> BenchmarkPopCountNearlySimple/PopCountNearlySimple                      
> 194925662                6.197 ns/op
> BenchmarkPopCountSimple                                                
>  1000000000               0.5994 ns/op
> BenchmarkPopCountSlowSimple                                                
>  1953            606194 ns/op
> PASS
> ok      gopl.io/popcount        4.534s
>
> After reading this article:
>
>
> https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be
>
> I ran:
>
> $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to 
> PopCount'
> .\popcount_almost_simple_test.go:9:19: inlining call to PopCount
> .\popcount_simple_test.go:8:18: inlining call to PopCount
> .\popcount_slow_simple_test.go:10:19: inlining call to PopCount
>
> That makes sense. 0.6 ns is less than a function call. The simple and 
> almost-simple benchmarks can inline the call to the hardwired function, but 
> the nearly-simple benchmark can't.
>
> In practice, this isn't much of a problem. Any function small enough to be 
> inlined is unlikely to be a performance bottleneck. If it ever is, a 
> non-table-driven benchmark can still measure it.
>
> Hope this helps. --PSRC
>
> P.S.: Out of curiosity, how can I post a message with fancy code examples 
> like this one?
>     https://groups.google.com/g/golang-nuts/c/5DgtH2Alt_I/m/hlsqdRSGAgAJ
>
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com
>  
> <https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
>
> *Attachments:*
>
>    - popcount_test.go
>    
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/ce3ef331-4156-4b29-97f8-c73456a7b6f7n%40googlegroups.com.

package popcount

import "testing"

var sum = 0 // Avoid dead code elimination
const firstInput uint64 = 0x1234567890abcdef

func benchmarkPopCount(b *testing.B, f func(x uint64) int) {
	for i, input := 0, firstInput; i < b.N; i++ {
		sum += PopCount(input)
		input++
	}
	if sum == 0 {
		b.Error("BenchmarkPopCountSimple: sum == 0")
	}
}

func BenchmarkPopCountSimple(b *testing.B) {
	f := PopCount
	benchmarkPopCount(b, f)
}

func BenchmarkPopCountSlowSimple(b *testing.B) {
	f := PopCount
	var bb = *b
	bb.N = 1_000_000
	for i := 0; i < b.N; i++ {
		// Exactly as before, but call the function many times.
		benchmarkPopCount(&bb, f)
	}
}

func BenchmarkPopCountAlmostSimple(b *testing.B) {
	f := PopCount
	b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) {
		benchmarkPopCount(b, f)
	})
}

func BenchmarkPopCountNearlySimple(b *testing.B) {
	f := PopCount
	name := "PopCountNearlySimple"
	b.Run(name, func(b *testing.B) {
		benchmarkPopCount(b, f)
	})
}

Re: [go-nuts] Re: Table-driven benchmarks defeat inlining

Reply via email to