Harry,

To be meaningful, benchmarks must be reproducible.

You use some file with 10 million lines and 2.4 gigabytes.

grepin.go (https://play.golang.org/p/YGRAJ6dEVn) is a program that 
generates known input data.

Your programs had variations other than Bytes() and Text().

grepbytes.go (https://play.golang.org/p/RpKBJjHOeJ) and greptext.go 
(https://play.golang.org/p/XzW6IBueyy) are identical except for necessary 
Bytes() and Text() differences. Text() converts Bytes() to string.

pprof uses 100 Hz samples to estimate population values. Three and two 
second runtimes produce small samples. The grepin.go program adjusts the 
file size by a multiplier to produce a larger number of samples on my 
machine.

The pprof estimates for syscall.Syscall are 103.40ms for Bytes and 106.20ms 
for Text, they are equal within measurement error. 

You said: "First version runs in 3 seconds, second in 2 seconds. For the 
slower version using `.Text()` 90+ percent of the time was in 
`syscall.Syscall` (under syscall.Read on the file) that I wasn’t expecting 
anything more than 10% change. However, the second version shows a much 
bigger reduction in `syscall.Syscall`. How is that possible? I expected it 
to be faster, but not affect the time spend reading file." You didn't 
provide pprof cpu top output. Your attached profile SVG files are 
unreadable.

So I'm puzzled that you are puzzled.

$ go version
go version devel +24dc8c6 Tue Mar 21 16:03:12 2017 +0000 linux/amd64

$ go run grepin.go
40000000 9600000000

$ go build grepbytes.go && time ./grepbytes XxX /tmp/grepin.txt
2017/03/21 13:09:38 profile: cpu profiling enabled, 
/tmp/profile656604512/cpu.pprof
40000000 9600000000
2017/03/21 13:10:16 profile: cpu profiling disabled, 
/tmp/profile656604512/cpu.pprof
real    0m38.567s
user    0m25.016s
sys        0m10.744s

$ go tool pprof ./grepbytes /tmp/profile656604512/cpu.pprof
File: grepbytes
Type: cpu
Time: Mar 21, 2017 at 1:09pm (EDT)
Duration: 38.55s, Total samples = 344.60ms ( 0.89%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 325.40ms, 94.43% of 344.60ms total
Dropped 36 nodes (cum <= 1.72ms)
Showing top 20 nodes out of 30
      flat  flat%   sum%        cum   cum%
  103.40ms 30.01% 30.01%   109.20ms 31.69%  syscall.Syscall 
/home/peter/go/src/syscall/asm_linux_amd64.s
   42.70ms 12.39% 42.40%    42.70ms 12.39%  runtime.indexbytebody 
/home/peter/go/src/runtime/asm_amd64.s
   27.20ms  7.89% 50.29%    27.20ms  7.89%  sync.(*Mutex).Lock 
/home/peter/go/src/sync/mutex.go
      25ms  7.25% 57.54%       25ms  7.25%  sync.(*Mutex).Unlock 
/home/peter/go/src/sync/mutex.go
   18.90ms  5.48% 63.03%   175.60ms 50.96%  bufio.(*Scanner).Scan 
/home/peter/go/src/bufio/scan.go
   12.70ms  3.69% 66.72%    68.80ms 19.97%  regexp.(*machine).backtrack 
/home/peter/go/src/regexp/backtrack.go
   12.70ms  3.69% 70.40%    12.70ms  3.69%  runtime.memclrNoHeapPointers 
/home/peter/go/src/runtime/memclr_amd64.s
   10.50ms  3.05% 73.45%    33.50ms  9.72%  bufio.ScanLines 
/home/peter/go/src/bufio/scan.go
   10.20ms  2.96% 76.41%   145.70ms 42.28%  regexp.(*Regexp).doExecute 
/home/peter/go/src/regexp/exec.go
       9ms  2.61% 79.02%   154.70ms 44.89%  regexp.(*Regexp).doMatch 
/home/peter/go/src/regexp/exec.go
    8.50ms  2.47% 81.49%    31.20ms  9.05%  bytes.Index 
/home/peter/go/src/bytes/bytes_amd64.go
    7.70ms  2.23% 83.72%    32.90ms  9.55%  regexp.(*Regexp).put 
/home/peter/go/src/regexp/regexp.go
    6.80ms  1.97% 85.69%    33.80ms  9.81%  regexp.(*Regexp).get 
/home/peter/go/src/regexp/regexp.go
    6.70ms  1.94% 87.64%   161.40ms 46.84%  regexp.(*Regexp).Match 
/home/peter/go/src/regexp/regexp.go
    6.10ms  1.77% 89.41%    18.80ms  5.46%  regexp.(*bitState).reset 
/home/peter/go/src/regexp/backtrack.go
    5.60ms  1.63% 91.03%    36.80ms 10.68%  regexp.(*inputBytes).index 
/home/peter/go/src/regexp/regexp.go
    5.50ms  1.60% 92.63%   342.50ms 99.39%  main.main 
/home/peter/gopath/src/nuts/prof/grepbytes.go
       3ms  0.87% 93.50%        3ms  0.87%  bytes.IndexByte 
/home/peter/go/src/runtime/asm_amd64.s
    1.70ms  0.49% 93.99%     2.40ms   0.7%  runtime.reentersyscall 
/home/peter/go/src/runtime/proc.go
    1.50ms  0.44% 94.43%   121.70ms 35.32%  os.(*File).read 
/home/peter/go/src/os/file_unix.go
(pprof) quit

$ go build greptext.go && time ./greptext XxX /tmp/grepin.txt
2017/03/21 13:15:26 profile: cpu profiling enabled, 
/tmp/profile069788137/cpu.pprof
40000000 9600000000
2017/03/21 13:16:19 profile: cpu profiling disabled, 
/tmp/profile069788137/cpu.pprof
real    0m53.217s
user    0m48.280s
sys        0m11.976s

$ go tool pprof ./greptext /tmp/profile069788137/cpu.pprof
File: greptext
Type: cpu
Time: Mar 21, 2017 at 1:15pm (EDT)
Duration: 53.19s, Total samples = 584ms ( 1.10%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 433.10ms, 74.16% of 584ms total
Dropped 149 nodes (cum <= 2.92ms)
Showing top 20 nodes out of 80
      flat  flat%   sum%        cum   cum%
  106.20ms 18.18% 18.18%   115.40ms 19.76%  syscall.Syscall 
/home/peter/go/src/syscall/asm_linux_amd64.s
   47.70ms  8.17% 26.35%    47.70ms  8.17%  runtime.indexbytebody 
/home/peter/go/src/runtime/asm_amd64.s
      31ms  5.31% 31.66%       31ms  5.31%  runtime.memclrNoHeapPointers 
/home/peter/go/src/runtime/memclr_amd64.s
   30.90ms  5.29% 36.95%    30.90ms  5.29%  sync.(*Mutex).Lock 
/home/peter/go/src/sync/mutex.go
   29.90ms  5.12% 42.07%    29.90ms  5.12%  sync.(*Mutex).Unlock 
/home/peter/go/src/sync/mutex.go
   24.90ms  4.26% 46.34%   111.80ms 19.14%  runtime.mallocgc 
/home/peter/go/src/runtime/malloc.go
   18.50ms  3.17% 49.50%   191.30ms 32.76%  bufio.(*Scanner).Scan 
/home/peter/go/src/bufio/scan.go
   18.50ms  3.17% 52.67%    18.50ms  3.17%  runtime.memmove 
/home/peter/go/src/runtime/memmove_amd64.s
   16.10ms  2.76% 55.43%    19.90ms  3.41%  runtime.scanobject 
/home/peter/go/src/runtime/mgcmark.go
   13.60ms  2.33% 57.76%    13.60ms  2.33%  runtime.nextFreeFast 
/home/peter/go/src/runtime/malloc.go
   12.70ms  2.17% 59.93%    72.30ms 12.38%  regexp.(*machine).backtrack 
/home/peter/go/src/regexp/backtrack.go
   12.60ms  2.16% 62.09%   179.10ms 30.67%  regexp.(*Regexp).doMatch 
/home/peter/go/src/regexp/exec.go
   11.60ms  1.99% 64.08%   166.50ms 28.51%  regexp.(*Regexp).doExecute 
/home/peter/go/src/regexp/exec.go
   11.30ms  1.93% 66.01%       37ms  6.34%  bufio.ScanLines 
/home/peter/go/src/bufio/scan.go
    8.30ms  1.42% 67.43%    40.10ms  6.87%  regexp.(*Regexp).put 
/home/peter/go/src/regexp/regexp.go
    8.30ms  1.42% 68.85%    33.20ms  5.68%  strings.Index 
/home/peter/go/src/strings/strings_amd64.go
    7.90ms  1.35% 70.21%    18.60ms  3.18%  regexp.(*bitState).reset 
/home/peter/go/src/regexp/backtrack.go
    7.90ms  1.35% 71.56%   150.90ms 25.84%  runtime.slicebytetostring 
/home/peter/go/src/runtime/string.go
    7.60ms  1.30% 72.86%     7.60ms  1.30%  runtime.mallocgc 
/home/peter/go/src/runtime/mbitmap.go
    7.60ms  1.30% 74.16%     8.90ms  1.52%  runtime.scanblock 
/home/peter/go/src/runtime/mgcmark.go
(pprof) quit

Peter

On Monday, March 20, 2017 at 7:51:50 PM UTC-4, Harry B wrote:
>
> Hi,
>
> As part of preparing an internal go talk to explain using strings vs bytes 
> of the bufio.Scanner, I created two small samples 
> https://play.golang.org/p/-TYycdHaPC and 
> https://play.golang.org/p/L7W-jiaHdL , the only difference in the hot 
> path is reducing the conversions from `[]byte` to `string` (later also 
> avoids fmt module’s string conversions). Input being a 10 million line 
> file. Match is only one line.
>
> For the moment, ignore the fact that I am not checking for number of 
> arguments, is doing unnecessary optimizations etc. I am trying to take a 
> simple case to explain.
>
> First version runs in 3 seconds, second in 2 seconds. For the slower 
> version using `.Text()` 90+ percent of the time was in `syscall.Syscall` 
> (under syscall.Read on the file) that I wasn’t expecting anything more than 
> 10% change. However, the second version shows a much bigger reduction in 
> `syscall.Syscall`. How is that possible? I expected it to be faster, but 
> not affect the time spend reading file.The exclusion of fmt was unnecessary 
> since the number of matches will be very small. But that is probably not 
> relevant here. I have run it multiple times and made sure file is cached.
>
> Attached are the profile SVGs. You can also see that an entire branch of 
> the execution has disappeared in the faster/second version.
>
> I do understand the second has avoided the allocation of string via 
> scanner.Text(), however, the output of  go build -gcflags "-m"  give me the 
> following
>
> // First version
>
> > go build -gcflags "-m" grep_simple_re.go
> # command-line-arguments
> ./grep_simple_re.go:17: inlining call to bufio.NewScanner
> ./grep_simple_re.go:19: inlining call to (*bufio.Scanner).Text
> ./grep_simple_re.go:17: fp escapes to heap
> ./grep_simple_re.go:19: string(bufio.s·2.token) escapes to heap
> *./grep_simple_re.go:21: line escapes to heap*
> ./grep_simple_re.go:17: main &bufio.Scanner literal does not escape
> ./grep_simple_re.go:21: main ... argument does not escape
> <autogenerated>:1: leaking param: io.p
> <autogenerated>:1: leaking param: .this
>
> // Second version
> > go build -gcflags "-m" grep_simple_re_bytes.go
> # command-line-arguments
> ./grep_simple_re_bytes.go:18: inlining call to bufio.NewScanner
> ./grep_simple_re_bytes.go:20: inlining call to (*bufio.Scanner).Bytes
> ./grep_simple_re_bytes.go:16: fn escapes to heap
> ./grep_simple_re_bytes.go:16: err escapes to heap
> ./grep_simple_re_bytes.go:18: fp escapes to heap
> ./grep_simple_re_bytes.go:16: main ... argument does not escape
> ./grep_simple_re_bytes.go:18: main &bufio.Scanner literal does not escape
> ./grep_simple_re_bytes.go:24: main ([]byte)("\n") does not escape
> <autogenerated>:1: leaking param: io.p
> <autogenerated>:1: leaking param: .this
>
>
> *I can see "./grep_simple_re.go:21: line escapes to heap" could be a 
> problem in the first version.*
>
> Here is the memory profiles of a run on 10million line 2.4 G file
>
> $ go tool pprof ./grep_simple_re_mprofile 
> /var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof
> Entering interactive mode (type "help" for commands)
> (pprof) top10
> 86.37kB of 86.37kB total (  100%)
> Dropped 4 nodes (cum <= 0.43kB)
> Showing top 10 nodes out of 23 (cum >= 43.89kB)
>       flat  flat%   sum%        cum   cum%
>    39.73kB 46.00% 46.00%    39.73kB 46.00%  regexp.(*bitState).reset
>    16.30kB 18.87% 64.87%    16.30kB 18.87%  bufio.(*Scanner).Scan
>    12.62kB 14.61% 79.48%    12.62kB 14.61%  runtime.malg
>     9.04kB 10.47% 89.95%    17.45kB 20.21%  runtime.allocm
>     4.52kB  5.23% 95.19%     4.52kB  5.23%  runtime.rawstringtmp
>     4.16kB  4.81%   100%     4.16kB  4.81%  regexp.progMachine
>          0     0%   100%    64.71kB 74.92%  main.main
>          0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).MatchString
>          0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).doExecute
>          0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).doMatch
> (pprof) quit
> $ go tool pprof* -alloc_space* ./grep_simple_re_mprofile 
> /var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof
> Entering interactive mode (type "help" for commands)
> (pprof) top10
> 2.14GB of 2.14GB total (  100%)
> Dropped 22 nodes (cum <= 0.01GB)
>       flat  flat%   sum%        cum   cum%
>     2.14GB   100%   100%     2.14GB   100%  runtime.rawstringtmp
>          0     0%   100%     2.14GB   100%  main.main
>          0     0%   100%     2.14GB   100%  runtime.goexit
>          0     0%   100%     2.14GB   100%  runtime.main
>          0     0%   100%     2.14GB   100%  runtime.slicebytetostring
>
>
> *The only explanation for such a large change is if it grew the stack 2.4G 
> in size for the first case. The SVG attached seems to show something like 
> that. Instead if it was heap allocation, why didn't it show up in the first 
> pprof (memory profile) output above? Does 'escape to heap' not imply it 
> will be allocated in heap?Any call to re.Match or fmt.Println() escapes the 
> string?*
>
> #### System details
>
> ```
> go version go1.8 darwin/amd64
> GOARCH="amd64"
> GOBIN=""
> GOEXE=""
> GOHOSTARCH="amd64"
> GOHOSTOS="darwin"
> GOOS="darwin"
> GOPATH="/Users/harry/code"
> GORACE=""
> GOROOT="/usr/local/go"
> GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
> GCCGO="gccgo"
> CC="clang"
> GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments 
> -fmessage-length=0 
> -fdebug-prefix-map=/var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/go-build737735845=/tmp/go-build
>  
> -gno-record-gcc-switches -fno-common"
> CXX="clang++"
> CGO_ENABLED="1"
> PKG_CONFIG="pkg-config"
> CGO_CFLAGS="-g -O2"
> CGO_CPPFLAGS=""
> CGO_CXXFLAGS="-g -O2"
> CGO_FFLAGS="-g -O2"
> CGO_LDFLAGS="-L/usr/local/lib"
> GOROOT/bin/go version: go version go1.8 darwin/amd64
> GOROOT/bin/go tool compile -V: compile version go1.8 X:framepointer
> uname -v: Darwin Kernel Version 15.6.0: Mon Jan  9 23:07:29 PST 2017; 
> root:xnu-3248.60.11.2.1~1/RELEASE_X86_64
> ProductName:    Mac OS X
> ProductVersion:    10.11.6
> BuildVersion:    15G1217
> lldb --version: lldb-360.1.70
> ```
>
> Thanks
> --
> Harry
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to