Hi,

As part of preparing an internal Go talk on using strings vs. bytes with 
bufio.Scanner, I created two small samples: 
https://play.golang.org/p/-TYycdHaPC and 
https://play.golang.org/p/L7W-jiaHdL . The only difference in the hot path 
is that the second avoids the conversions from `[]byte` to `string` (it 
also avoids the fmt package's string conversions). The input is a 10 million 
line file, and only one line matches.

For the moment, please ignore that I am not checking the number of 
arguments, that I am doing unnecessary optimizations, etc. I am trying to 
keep the case simple enough to explain.

The first version runs in 3 seconds, the second in 2 seconds. In the slower 
version using `.Text()`, 90+ percent of the time was in `syscall.Syscall` 
(under syscall.Read on the file), so I wasn’t expecting more than a 
10% change. However, the second version shows a much bigger reduction in 
`syscall.Syscall`. How is that possible? I expected it to be faster, but 
not to affect the time spent reading the file. Excluding fmt was unnecessary, 
since the number of matches is very small, but that is probably not 
relevant here. I have run both multiple times and made sure the file is cached.

Attached are the profile SVGs. You can also see that an entire branch of 
the execution has disappeared in the faster/second version.

I do understand that the second version avoids allocating a string via 
scanner.Text(); however, the output of `go build -gcflags "-m"` gives me the 
following:

// First version

> go build -gcflags "-m" grep_simple_re.go
# command-line-arguments
./grep_simple_re.go:17: inlining call to bufio.NewScanner
./grep_simple_re.go:19: inlining call to (*bufio.Scanner).Text
./grep_simple_re.go:17: fp escapes to heap
./grep_simple_re.go:19: string(bufio.s·2.token) escapes to heap
*./grep_simple_re.go:21: line escapes to heap*
./grep_simple_re.go:17: main &bufio.Scanner literal does not escape
./grep_simple_re.go:21: main ... argument does not escape
<autogenerated>:1: leaking param: io.p
<autogenerated>:1: leaking param: .this

// Second version
> go build -gcflags "-m" grep_simple_re_bytes.go
# command-line-arguments
./grep_simple_re_bytes.go:18: inlining call to bufio.NewScanner
./grep_simple_re_bytes.go:20: inlining call to (*bufio.Scanner).Bytes
./grep_simple_re_bytes.go:16: fn escapes to heap
./grep_simple_re_bytes.go:16: err escapes to heap
./grep_simple_re_bytes.go:18: fp escapes to heap
./grep_simple_re_bytes.go:16: main ... argument does not escape
./grep_simple_re_bytes.go:18: main &bufio.Scanner literal does not escape
./grep_simple_re_bytes.go:24: main ([]byte)("\n") does not escape
<autogenerated>:1: leaking param: io.p
<autogenerated>:1: leaking param: .this


*I can see "./grep_simple_re.go:21: line escapes to heap" could be a 
problem in the first version.*

Here are the memory profiles of a run on a 10 million line, 2.4 GB file:

$ go tool pprof ./grep_simple_re_mprofile 
/var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof
Entering interactive mode (type "help" for commands)
(pprof) top10
86.37kB of 86.37kB total (  100%)
Dropped 4 nodes (cum <= 0.43kB)
Showing top 10 nodes out of 23 (cum >= 43.89kB)
      flat  flat%   sum%        cum   cum%
   39.73kB 46.00% 46.00%    39.73kB 46.00%  regexp.(*bitState).reset
   16.30kB 18.87% 64.87%    16.30kB 18.87%  bufio.(*Scanner).Scan
   12.62kB 14.61% 79.48%    12.62kB 14.61%  runtime.malg
    9.04kB 10.47% 89.95%    17.45kB 20.21%  runtime.allocm
    4.52kB  5.23% 95.19%     4.52kB  5.23%  runtime.rawstringtmp
    4.16kB  4.81%   100%     4.16kB  4.81%  regexp.progMachine
         0     0%   100%    64.71kB 74.92%  main.main
         0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).MatchString
         0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).doExecute
         0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).doMatch
(pprof) quit
$ go tool pprof -alloc_space ./grep_simple_re_mprofile 
/var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof
Entering interactive mode (type "help" for commands)
(pprof) top10
2.14GB of 2.14GB total (  100%)
Dropped 22 nodes (cum <= 0.01GB)
      flat  flat%   sum%        cum   cum%
    2.14GB   100%   100%     2.14GB   100%  runtime.rawstringtmp
         0     0%   100%     2.14GB   100%  main.main
         0     0%   100%     2.14GB   100%  runtime.goexit
         0     0%   100%     2.14GB   100%  runtime.main
         0     0%   100%     2.14GB   100%  runtime.slicebytetostring


*The only explanation I can see for such a large change is that the stack 
grew to 2.4G in the first case. The attached SVG seems to show something like 
that. If it was heap allocation instead, why didn't it show up in the first 
pprof (memory profile) output above? Does 'escapes to heap' not imply that it 
will be allocated on the heap? Does any call to re.Match or fmt.Println() make 
the string escape?*
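
One way to check the heap question independently of pprof might be to compare cumulative heap allocation around each loop using `runtime.MemStats.TotalAlloc`. As far as I understand, pprof's default heap view reports memory still in use when the profile is written, while `-alloc_space` reports everything allocated over the whole run, which would be consistent with the two outputs above. The sketch below is an assumption-laden experiment, not the original program: the helper names, the package-level sinks and the synthetic input are all made up.

```go
package main

import (
	"bufio"
	"fmt"
	"runtime"
	"strings"
)

// Package-level sinks force the results to escape, so the compiler
// cannot drop the conversion.
var (
	sinkStr   string
	sinkBytes []byte
)

// allocatedDuring returns the cumulative heap bytes allocated while f runs.
func allocatedDuring(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

// scanText converts every line to a string, as scanner.Text() does.
func scanText(input string) {
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		sinkStr = sc.Text()
	}
}

// scanBytes only takes a slice of the scanner's internal buffer.
func scanBytes(input string) {
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		sinkBytes = sc.Bytes()
	}
}

// makeInput builds a synthetic input of n lines, each width bytes long.
func makeInput(n, width int) string {
	return strings.Repeat(strings.Repeat("x", width)+"\n", n)
}

func main() {
	input := makeInput(100000, 64)
	textAlloc := allocatedDuring(func() { scanText(input) })
	bytesAlloc := allocatedDuring(func() { scanBytes(input) })
	fmt.Println("Text():  bytes allocated:", textAlloc)
	fmt.Println("Bytes(): bytes allocated:", bytesAlloc)
}
```

If the `.Text()` loop reports roughly the input size while the `.Bytes()` loop stays close to the scanner's buffer size, the strings are short-lived heap allocations rather than stack growth.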

#### System details

```
go version go1.8 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/harry/code"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/go-build737735845=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-L/usr/local/lib"
GOROOT/bin/go version: go version go1.8 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.8 X:framepointer
uname -v: Darwin Kernel Version 15.6.0: Mon Jan  9 23:07:29 PST 2017; root:xnu-3248.60.11.2.1~1/RELEASE_X86_64
ProductName:    Mac OS X
ProductVersion:    10.11.6
BuildVersion:    15G1217
lldb --version: lldb-360.1.70
```

Thanks
--
Harry
