s/greptext.go (https://play.golang.org/p/XzW6IBueyy)/greptext.go (
https://play.golang.org/p/-C9bVEJ2QP))/ 

On Tuesday, March 21, 2017 at 2:40:29 PM UTC-4, peterGo wrote:
>
> Harry,
>
> To be meaningful, benchmarks must be reproducible.
>
> You use some file with 10 million lines and 2.4 gigabytes.
>
> grepin.go (https://play.golang.org/p/YGRAJ6dEVn) is a program that 
> generates known input data.
>
> Your programs had variations other than Bytes() and Text().
>
> grepbytes.go (https://play.golang.org/p/RpKBJjHOeJ) and greptext.go (
> https://play.golang.org/p/XzW6IBueyy) are identical except for necessary 
> Bytes() and Text() differences. Text() converts Bytes() to string.
>
> pprof uses 100 Hz samples to estimate population values. Three and two 
> second runtimes produce small samples. The grepin.go program adjusts the 
> file size by a multiplier to produce a larger number of samples on my 
> machine.
>
> The pprof estimates for syscall.Syscall are 103.40ms for Bytes and 
> 106.20ms for Text, they are equal within measurement error. 
>
> You said: "First version runs in 3 seconds, second in 2 seconds. For the 
> slower version using `.Text()` 90+ percent of the time was in 
> `syscall.Syscall` (under syscall.Read on the file) that I wasn’t expecting 
> anything more than 10% change. However, the second version shows a much 
> bigger reduction in `syscall.Syscall`. How is that possible? I expected it 
> to be faster, but not affect the time spend reading file." You didn't 
> provide pprof cpu top output. Your attached profile SVG files are 
> unreadable.
>
> So I'm puzzled that you are puzzled.
>
> $ go version
> go version devel +24dc8c6 Tue Mar 21 16:03:12 2017 +0000 linux/amd64
>
> $ go run grepin.go
> 40000000 9600000000
>
> $ go build grepbytes.go && time ./grepbytes XxX /tmp/grepin.txt
> 2017/03/21 13:09:38 profile: cpu profiling enabled, 
> /tmp/profile656604512/cpu.pprof
> 40000000 9600000000
> 2017/03/21 13:10:16 profile: cpu profiling disabled, 
> /tmp/profile656604512/cpu.pprof
> real    0m38.567s
> user    0m25.016s
> sys        0m10.744s
>
> $ go tool pprof ./grepbytes /tmp/profile656604512/cpu.pprof
> File: grepbytes
> Type: cpu
> Time: Mar 21, 2017 at 1:09pm (EDT)
> Duration: 38.55s, Total samples = 344.60ms ( 0.89%)
> Entering interactive mode (type "help" for commands, "o" for options)
> (pprof) top20
> Showing nodes accounting for 325.40ms, 94.43% of 344.60ms total
> Dropped 36 nodes (cum <= 1.72ms)
> Showing top 20 nodes out of 30
>       flat  flat%   sum%        cum   cum%
>   103.40ms 30.01% 30.01%   109.20ms 31.69%  syscall.Syscall 
> /home/peter/go/src/syscall/asm_linux_amd64.s
>    42.70ms 12.39% 42.40%    42.70ms 12.39%  runtime.indexbytebody 
> /home/peter/go/src/runtime/asm_amd64.s
>    27.20ms  7.89% 50.29%    27.20ms  7.89%  sync.(*Mutex).Lock 
> /home/peter/go/src/sync/mutex.go
>       25ms  7.25% 57.54%       25ms  7.25%  sync.(*Mutex).Unlock 
> /home/peter/go/src/sync/mutex.go
>    18.90ms  5.48% 63.03%   175.60ms 50.96%  bufio.(*Scanner).Scan 
> /home/peter/go/src/bufio/scan.go
>    12.70ms  3.69% 66.72%    68.80ms 19.97%  regexp.(*machine).backtrack 
> /home/peter/go/src/regexp/backtrack.go
>    12.70ms  3.69% 70.40%    12.70ms  3.69%  runtime.memclrNoHeapPointers 
> /home/peter/go/src/runtime/memclr_amd64.s
>    10.50ms  3.05% 73.45%    33.50ms  9.72%  bufio.ScanLines 
> /home/peter/go/src/bufio/scan.go
>    10.20ms  2.96% 76.41%   145.70ms 42.28%  regexp.(*Regexp).doExecute 
> /home/peter/go/src/regexp/exec.go
>        9ms  2.61% 79.02%   154.70ms 44.89%  regexp.(*Regexp).doMatch 
> /home/peter/go/src/regexp/exec.go
>     8.50ms  2.47% 81.49%    31.20ms  9.05%  bytes.Index 
> /home/peter/go/src/bytes/bytes_amd64.go
>     7.70ms  2.23% 83.72%    32.90ms  9.55%  regexp.(*Regexp).put 
> /home/peter/go/src/regexp/regexp.go
>     6.80ms  1.97% 85.69%    33.80ms  9.81%  regexp.(*Regexp).get 
> /home/peter/go/src/regexp/regexp.go
>     6.70ms  1.94% 87.64%   161.40ms 46.84%  regexp.(*Regexp).Match 
> /home/peter/go/src/regexp/regexp.go
>     6.10ms  1.77% 89.41%    18.80ms  5.46%  regexp.(*bitState).reset 
> /home/peter/go/src/regexp/backtrack.go
>     5.60ms  1.63% 91.03%    36.80ms 10.68%  regexp.(*inputBytes).index 
> /home/peter/go/src/regexp/regexp.go
>     5.50ms  1.60% 92.63%   342.50ms 99.39%  main.main 
> /home/peter/gopath/src/nuts/prof/grepbytes.go
>        3ms  0.87% 93.50%        3ms  0.87%  bytes.IndexByte 
> /home/peter/go/src/runtime/asm_amd64.s
>     1.70ms  0.49% 93.99%     2.40ms   0.7%  runtime.reentersyscall 
> /home/peter/go/src/runtime/proc.go
>     1.50ms  0.44% 94.43%   121.70ms 35.32%  os.(*File).read 
> /home/peter/go/src/os/file_unix.go
> (pprof) quit
>
> $ go build greptext.go && time ./greptext XxX /tmp/grepin.txt
> 2017/03/21 13:15:26 profile: cpu profiling enabled, 
> /tmp/profile069788137/cpu.pprof
> 40000000 9600000000
> 2017/03/21 13:16:19 profile: cpu profiling disabled, 
> /tmp/profile069788137/cpu.pprof
> real    0m53.217s
> user    0m48.280s
> sys        0m11.976s
>
> $ go tool pprof ./greptext /tmp/profile069788137/cpu.pprof
> File: greptext
> Type: cpu
> Time: Mar 21, 2017 at 1:15pm (EDT)
> Duration: 53.19s, Total samples = 584ms ( 1.10%)
> Entering interactive mode (type "help" for commands, "o" for options)
> (pprof) top20
> Showing nodes accounting for 433.10ms, 74.16% of 584ms total
> Dropped 149 nodes (cum <= 2.92ms)
> Showing top 20 nodes out of 80
>       flat  flat%   sum%        cum   cum%
>   106.20ms 18.18% 18.18%   115.40ms 19.76%  syscall.Syscall 
> /home/peter/go/src/syscall/asm_linux_amd64.s
>    47.70ms  8.17% 26.35%    47.70ms  8.17%  runtime.indexbytebody 
> /home/peter/go/src/runtime/asm_amd64.s
>       31ms  5.31% 31.66%       31ms  5.31%  runtime.memclrNoHeapPointers 
> /home/peter/go/src/runtime/memclr_amd64.s
>    30.90ms  5.29% 36.95%    30.90ms  5.29%  sync.(*Mutex).Lock 
> /home/peter/go/src/sync/mutex.go
>    29.90ms  5.12% 42.07%    29.90ms  5.12%  sync.(*Mutex).Unlock 
> /home/peter/go/src/sync/mutex.go
>    24.90ms  4.26% 46.34%   111.80ms 19.14%  runtime.mallocgc 
> /home/peter/go/src/runtime/malloc.go
>    18.50ms  3.17% 49.50%   191.30ms 32.76%  bufio.(*Scanner).Scan 
> /home/peter/go/src/bufio/scan.go
>    18.50ms  3.17% 52.67%    18.50ms  3.17%  runtime.memmove 
> /home/peter/go/src/runtime/memmove_amd64.s
>    16.10ms  2.76% 55.43%    19.90ms  3.41%  runtime.scanobject 
> /home/peter/go/src/runtime/mgcmark.go
>    13.60ms  2.33% 57.76%    13.60ms  2.33%  runtime.nextFreeFast 
> /home/peter/go/src/runtime/malloc.go
>    12.70ms  2.17% 59.93%    72.30ms 12.38%  regexp.(*machine).backtrack 
> /home/peter/go/src/regexp/backtrack.go
>    12.60ms  2.16% 62.09%   179.10ms 30.67%  regexp.(*Regexp).doMatch 
> /home/peter/go/src/regexp/exec.go
>    11.60ms  1.99% 64.08%   166.50ms 28.51%  regexp.(*Regexp).doExecute 
> /home/peter/go/src/regexp/exec.go
>    11.30ms  1.93% 66.01%       37ms  6.34%  bufio.ScanLines 
> /home/peter/go/src/bufio/scan.go
>     8.30ms  1.42% 67.43%    40.10ms  6.87%  regexp.(*Regexp).put 
> /home/peter/go/src/regexp/regexp.go
>     8.30ms  1.42% 68.85%    33.20ms  5.68%  strings.Index 
> /home/peter/go/src/strings/strings_amd64.go
>     7.90ms  1.35% 70.21%    18.60ms  3.18%  regexp.(*bitState).reset 
> /home/peter/go/src/regexp/backtrack.go
>     7.90ms  1.35% 71.56%   150.90ms 25.84%  runtime.slicebytetostring 
> /home/peter/go/src/runtime/string.go
>     7.60ms  1.30% 72.86%     7.60ms  1.30%  runtime.mallocgc 
> /home/peter/go/src/runtime/mbitmap.go
>     7.60ms  1.30% 74.16%     8.90ms  1.52%  runtime.scanblock 
> /home/peter/go/src/runtime/mgcmark.go
> (pprof) quit
>
> Peter
>
> On Monday, March 20, 2017 at 7:51:50 PM UTC-4, Harry B wrote:
>>
>> Hi,
>>
>> As part of preparing an internal go talk to explain using strings vs 
>> bytes of the bufio.Scanner, I created two small samples 
>> https://play.golang.org/p/-TYycdHaPC and 
>> https://play.golang.org/p/L7W-jiaHdL , the only difference in the hot 
>> path is reducing the conversions from `[]byte` to `string` (later also 
>> avoids fmt module’s string conversions). Input being a 10 million line 
>> file. Match is only one line.
>>
>> For the moment, ignore the fact that I am not checking for number of 
>> arguments, is doing unnecessary optimizations etc. I am trying to take a 
>> simple case to explain.
>>
>> First version runs in 3 seconds, second in 2 seconds. For the slower 
>> version using `.Text()` 90+ percent of the time was in `syscall.Syscall` 
>> (under syscall.Read on the file) that I wasn’t expecting anything more than 
>> 10% change. However, the second version shows a much bigger reduction in 
>> `syscall.Syscall`. How is that possible? I expected it to be faster, but 
>> not affect the time spend reading file.The exclusion of fmt was unnecessary 
>> since the number of matches will be very small. But that is probably not 
>> relevant here. I have run it multiple times and made sure file is cached.
>>
>> Attached are the profile SVGs. You can also see that an entire branch of 
>> the execution has disappeared in the faster/second version.
>>
>> I do understand the second has avoided the allocation of string via 
>> scanner.Text(), however, the output of  go build -gcflags "-m"  give me the 
>> following
>>
>> // First version
>>
>> > go build -gcflags "-m" grep_simple_re.go
>> # command-line-arguments
>> ./grep_simple_re.go:17: inlining call to bufio.NewScanner
>> ./grep_simple_re.go:19: inlining call to (*bufio.Scanner).Text
>> ./grep_simple_re.go:17: fp escapes to heap
>> ./grep_simple_re.go:19: string(bufio.s·2.token) escapes to heap
>> *./grep_simple_re.go:21: line escapes to heap*
>> ./grep_simple_re.go:17: main &bufio.Scanner literal does not escape
>> ./grep_simple_re.go:21: main ... argument does not escape
>> <autogenerated>:1: leaking param: io.p
>> <autogenerated>:1: leaking param: .this
>>
>> // Second version
>> > go build -gcflags "-m" grep_simple_re_bytes.go
>> # command-line-arguments
>> ./grep_simple_re_bytes.go:18: inlining call to bufio.NewScanner
>> ./grep_simple_re_bytes.go:20: inlining call to (*bufio.Scanner).Bytes
>> ./grep_simple_re_bytes.go:16: fn escapes to heap
>> ./grep_simple_re_bytes.go:16: err escapes to heap
>> ./grep_simple_re_bytes.go:18: fp escapes to heap
>> ./grep_simple_re_bytes.go:16: main ... argument does not escape
>> ./grep_simple_re_bytes.go:18: main &bufio.Scanner literal does not escape
>> ./grep_simple_re_bytes.go:24: main ([]byte)("\n") does not escape
>> <autogenerated>:1: leaking param: io.p
>> <autogenerated>:1: leaking param: .this
>>
>>
>> *I can see "./grep_simple_re.go:21: line escapes to heap" could be a 
>> problem in the first version.*
>>
>> Here is the memory profiles of a run on 10million line 2.4 G file
>>
>> $ go tool pprof ./grep_simple_re_mprofile 
>> /var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof
>> Entering interactive mode (type "help" for commands)
>> (pprof) top10
>> 86.37kB of 86.37kB total (  100%)
>> Dropped 4 nodes (cum <= 0.43kB)
>> Showing top 10 nodes out of 23 (cum >= 43.89kB)
>>       flat  flat%   sum%        cum   cum%
>>    39.73kB 46.00% 46.00%    39.73kB 46.00%  regexp.(*bitState).reset
>>    16.30kB 18.87% 64.87%    16.30kB 18.87%  bufio.(*Scanner).Scan
>>    12.62kB 14.61% 79.48%    12.62kB 14.61%  runtime.malg
>>     9.04kB 10.47% 89.95%    17.45kB 20.21%  runtime.allocm
>>     4.52kB  5.23% 95.19%     4.52kB  5.23%  runtime.rawstringtmp
>>     4.16kB  4.81%   100%     4.16kB  4.81%  regexp.progMachine
>>          0     0%   100%    64.71kB 74.92%  main.main
>>          0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).MatchString
>>          0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).doExecute
>>          0     0%   100%    43.89kB 50.82%  regexp.(*Regexp).doMatch
>> (pprof) quit
>> $ go tool pprof* -alloc_space* ./grep_simple_re_mprofile 
>> /var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof
>> Entering interactive mode (type "help" for commands)
>> (pprof) top10
>> 2.14GB of 2.14GB total (  100%)
>> Dropped 22 nodes (cum <= 0.01GB)
>>       flat  flat%   sum%        cum   cum%
>>     2.14GB   100%   100%     2.14GB   100%  runtime.rawstringtmp
>>          0     0%   100%     2.14GB   100%  main.main
>>          0     0%   100%     2.14GB   100%  runtime.goexit
>>          0     0%   100%     2.14GB   100%  runtime.main
>>          0     0%   100%     2.14GB   100%  runtime.slicebytetostring
>>
>>
>> *The only explanation for such a large change is if it grew the stack 
>> 2.4G in size for the first case. The SVG attached seems to show something 
>> like that. Instead if it was heap allocation, why didn't it show up in the 
>> first pprof (memory profile) output above? Does 'escape to heap' not imply 
>> it will be allocated in heap?Any call to re.Match or fmt.Println() escapes 
>> the string?*
>>
>> #### System details
>>
>> ```
>> go version go1.8 darwin/amd64
>> GOARCH="amd64"
>> GOBIN=""
>> GOEXE=""
>> GOHOSTARCH="amd64"
>> GOHOSTOS="darwin"
>> GOOS="darwin"
>> GOPATH="/Users/harry/code"
>> GORACE=""
>> GOROOT="/usr/local/go"
>> GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
>> GCCGO="gccgo"
>> CC="clang"
>> GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments 
>> -fmessage-length=0 
>> -fdebug-prefix-map=/var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/go-build737735845=/tmp/go-build
>>  
>> -gno-record-gcc-switches -fno-common"
>> CXX="clang++"
>> CGO_ENABLED="1"
>> PKG_CONFIG="pkg-config"
>> CGO_CFLAGS="-g -O2"
>> CGO_CPPFLAGS=""
>> CGO_CXXFLAGS="-g -O2"
>> CGO_FFLAGS="-g -O2"
>> CGO_LDFLAGS="-L/usr/local/lib"
>> GOROOT/bin/go version: go version go1.8 darwin/amd64
>> GOROOT/bin/go tool compile -V: compile version go1.8 X:framepointer
>> uname -v: Darwin Kernel Version 15.6.0: Mon Jan  9 23:07:29 PST 2017; 
>> root:xnu-3248.60.11.2.1~1/RELEASE_X86_64
>> ProductName:    Mac OS X
>> ProductVersion:    10.11.6
>> BuildVersion:    15G1217
>> lldb --version: lldb-360.1.70
>> ```
>>
>> Thanks
>> --
>> Harry
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to