Sorry for the double-post, I just realized that the version I posted before was the manually-inlined version that I wrote as part of testing. For completeness, here's the non-manually-inlined version, which seems to have the same performance characteristics (and probably exactly the same machine code, though I didn't double-check): https://go.dev/play/p/h1K38Bq7Otv

On Friday, July 22, 2022 at 7:56:54 PM UTC-6 Kevin Chowski wrote:

> Datapoint: same with windows/amd64 on Intel (running 1.19beta1):
>
> goos: windows
> goarch: amd64
> pkg: common/sandbox
> cpu: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
> BenchmarkNoInline-4                 77425848    14.34 ns/op
> BenchmarkInline-4                   59108932    20.58 ns/op
> PASS
> ok      common/sandbox  2.645s
>
> Looking at the disassembly, I noticed that in the Inline case there was a
> 7-byte `lea 0xXXXXXX(%rip),%rbx` in the tight inner loop due to some
> really proactive constant propagation (I hypothesize). If you manually
> defeat the propagation by storing the string in a global and manually
> copying it onto the stack, the inlined version becomes faster than
> NoInline again: https://go.dev/play/p/VRgJP2y7joS
>
> goos: windows
> goarch: amd64
> pkg: common/sandbox
> cpu: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
> BenchmarkNoInline-4                 81436539    14.08 ns/op
> BenchmarkInline-4                   59255162    21.32 ns/op
> BenchmarkInlineDefeatConstProp-4    97524828    12.57 ns/op
> PASS
> ok      common/sandbox  5.111s
>
> On Friday, July 22, 2022 at 11:01:00 AM UTC-6 mpr...@google.com wrote:
>
>> I can reproduce similar behavior on linux-amd64:
>>
>> $ perf stat ./example.com.test -test.bench=BenchmarkInline -test.benchtime=100000000x
>> goos: linux
>> goarch: amd64
>> pkg: example.com
>> cpu: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
>> BenchmarkInline-12      100000000    16.78 ns/op
>> PASS
>>
>> Performance counter stats for './example.com.test -test.bench=BenchmarkInline -test.benchtime=100000000x':
>>
>>          1,691.95 msec task-clock:u        #    1.004 CPUs utilized
>>                 0      context-switches:u  #    0.000 /sec
>>                 0      cpu-migrations:u    #    0.000 /sec
>>               352      page-faults:u       #  208.044 /sec
>>     6,732,752,072      cycles:u            #    3.979 GHz
>>    22,405,823,428      instructions:u      #    3.33  insn per cycle
>>     6,501,294,164      branches:u          #    3.842 G/sec
>>           149,596      branch-misses:u     #    0.00% of all branches
>>
>>       1.684677260 seconds time elapsed
>>
>>       1.692474000 seconds user
>>       0.004020000 seconds sys
>>
>> $ perf stat ./example.com.test -test.bench=BenchmarkNoInline -test.benchtime=100000000x
>> goos: linux
>> goarch: amd64
>> pkg: example.com
>> cpu: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
>> BenchmarkNoInline-12    100000000    10.79 ns/op
>> PASS
>>
>> Performance counter stats for './example.com.test -test.bench=BenchmarkNoInline -test.benchtime=100000000x':
>>
>>          1,091.71 msec task-clock:u        #    1.005 CPUs utilized
>>                 0      context-switches:u  #    0.000 /sec
>>                 0      cpu-migrations:u    #    0.000 /sec
>>               363      page-faults:u       #  332.505 /sec
>>     4,490,159,750      cycles:u            #    4.113 GHz
>>    20,205,764,499      instructions:u      #    4.50  insn per cycle
>>     6,701,281,015      branches:u          #    6.138 G/sec
>>           586,073      branch-misses:u     #    0.01% of all branches
>>
>>       1.086302272 seconds time elapsed
>>
>>       1.087710000 seconds user
>>       0.008027000 seconds sys
>>
>> The non-inlined version actually takes fewer instructions to run the same
>> benchmark, which surprises me because naively looking at the disassembly it
>> seems that the inlined version is much more compact.
>>
>> On Fri, Jul 22, 2022 at 5:52 AM eric...@arm.com <eric...@arm.com> wrote:
>>
>>> In this piece of code, the two test functions are the same, but one is
>>> inlined and the other is not. However, the inlined version is about 25%
>>> slower than the non-inlined version on an Apple M1 chip. Why is that?
>>>
>>> The code is here: https://go.dev/play/p/0NkLMtTZtv4
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "golang-nuts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to golang-nuts...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/golang-nuts/527264d7-7cc1-4278-9a29-c04eb3ec4e86n%40googlegroups.com