s/greptext.go (https://play.golang.org/p/XzW6IBueyy)/greptext.go ( https://play.golang.org/p/-C9bVEJ2QP))/
On Tuesday, March 21, 2017 at 2:40:29 PM UTC-4, peterGo wrote: > > Harry, > > To be meaningful, benchmarks must be reproducible. > > You use some file with 10 million lines and 2.4 gigabytes. > > grepin.go (https://play.golang.org/p/YGRAJ6dEVn) is a program that > generates known input data. > > Your programs had variations other than Bytes() and Text(). > > grepbytes.go (https://play.golang.org/p/RpKBJjHOeJ) and greptext.go ( > https://play.golang.org/p/XzW6IBueyy) are identical except for necessary > Bytes() and Text() differences. Text() converts Bytes() to string. > > pprof uses 100 Hz samples to estimate population values. Three and two > second runtimes produce small samples. The grepin.go program adjusts the > file size by a multiplier to produce a larger number of samples on my > machine. > > The pprof estimates for syscall.Syscall are 103.40ms for Bytes and > 106.20ms for Text, they are equal within measurement error. > > You said: "First version runs in 3 seconds, second in 2 seconds. For the > slower version using `.Text()` 90+ percent of the time was in > `syscall.Syscall` (under syscall.Read on the file) that I wasn’t expecting > anything more than 10% change. However, the second version shows a much > bigger reduction in `syscall.Syscall`. How is that possible? I expected it > to be faster, but not affect the time spend reading file." You didn't > provide pprof cpu top output. Your attached profile SVG files are > unreadable. > > So I'm puzzled that you are puzzled. > > $ go version > go version devel +24dc8c6 Tue Mar 21 16:03:12 2017 +0000 linux/amd64 > > $ go run grepin.go > 40000000 9600000000 > > $ go build grepbytes.go && time ./grepbytes XxX /tmp/grepin.txt > 2017/03/21 13:09:38 profile: cpu profiling enabled, > /tmp/profile656604512/cpu.pprof > 40000000 9600000000 > 2017/03/21 13:10:16 profile: cpu profiling disabled, > /tmp/profile656604512/cpu.pprof > real 0m38.567s > user 0m25.016s > sys 0m10.744s > > $ go tool pprof ./grepbytes /tmp/profile656604512/cpu.pprof > File: grepbytes > Type: cpu > Time: Mar 21, 2017 at 1:09pm (EDT) > Duration: 38.55s, Total samples = 344.60ms ( 0.89%) > Entering interactive mode (type "help" for commands, "o" for options) > (pprof) top20 > Showing nodes accounting for 325.40ms, 94.43% of 344.60ms total > Dropped 36 nodes (cum <= 1.72ms) > Showing top 20 nodes out of 30 > flat flat% sum% cum cum% > 103.40ms 30.01% 30.01% 109.20ms 31.69% syscall.Syscall > /home/peter/go/src/syscall/asm_linux_amd64.s > 42.70ms 12.39% 42.40% 42.70ms 12.39% runtime.indexbytebody > /home/peter/go/src/runtime/asm_amd64.s > 27.20ms 7.89% 50.29% 27.20ms 7.89% sync.(*Mutex).Lock > /home/peter/go/src/sync/mutex.go > 25ms 7.25% 57.54% 25ms 7.25% sync.(*Mutex).Unlock > /home/peter/go/src/sync/mutex.go > 18.90ms 5.48% 63.03% 175.60ms 50.96% bufio.(*Scanner).Scan > /home/peter/go/src/bufio/scan.go > 12.70ms 3.69% 66.72% 68.80ms 19.97% regexp.(*machine).backtrack > /home/peter/go/src/regexp/backtrack.go > 12.70ms 3.69% 70.40% 12.70ms 3.69% runtime.memclrNoHeapPointers > /home/peter/go/src/runtime/memclr_amd64.s > 10.50ms 3.05% 73.45% 33.50ms 9.72% bufio.ScanLines > /home/peter/go/src/bufio/scan.go > 10.20ms 2.96% 76.41% 145.70ms 42.28% regexp.(*Regexp).doExecute > /home/peter/go/src/regexp/exec.go > 9ms 2.61% 79.02% 154.70ms 44.89% regexp.(*Regexp).doMatch > /home/peter/go/src/regexp/exec.go > 8.50ms 2.47% 81.49% 31.20ms 9.05% bytes.Index > /home/peter/go/src/bytes/bytes_amd64.go > 7.70ms 2.23% 83.72% 32.90ms 9.55% regexp.(*Regexp).put > /home/peter/go/src/regexp/regexp.go > 6.80ms 1.97% 85.69% 33.80ms 9.81% regexp.(*Regexp).get > /home/peter/go/src/regexp/regexp.go > 6.70ms 1.94% 87.64% 161.40ms 46.84% regexp.(*Regexp).Match > /home/peter/go/src/regexp/regexp.go > 6.10ms 1.77% 89.41% 18.80ms 5.46% regexp.(*bitState).reset > /home/peter/go/src/regexp/backtrack.go > 5.60ms 1.63% 91.03% 36.80ms 10.68% regexp.(*inputBytes).index > /home/peter/go/src/regexp/regexp.go > 5.50ms 1.60% 92.63% 342.50ms 99.39% main.main > /home/peter/gopath/src/nuts/prof/grepbytes.go > 3ms 0.87% 93.50% 3ms 0.87% bytes.IndexByte > /home/peter/go/src/runtime/asm_amd64.s > 1.70ms 0.49% 93.99% 2.40ms 0.7% runtime.reentersyscall > /home/peter/go/src/runtime/proc.go > 1.50ms 0.44% 94.43% 121.70ms 35.32% os.(*File).read > /home/peter/go/src/os/file_unix.go > (pprof) quit > > $ go build greptext.go && time ./greptext XxX /tmp/grepin.txt > 2017/03/21 13:15:26 profile: cpu profiling enabled, > /tmp/profile069788137/cpu.pprof > 40000000 9600000000 > 2017/03/21 13:16:19 profile: cpu profiling disabled, > /tmp/profile069788137/cpu.pprof > real 0m53.217s > user 0m48.280s > sys 0m11.976s > > $ go tool pprof ./greptext /tmp/profile069788137/cpu.pprof > File: greptext > Type: cpu > Time: Mar 21, 2017 at 1:15pm (EDT) > Duration: 53.19s, Total samples = 584ms ( 1.10%) > Entering interactive mode (type "help" for commands, "o" for options) > (pprof) top20 > Showing nodes accounting for 433.10ms, 74.16% of 584ms total > Dropped 149 nodes (cum <= 2.92ms) > Showing top 20 nodes out of 80 > flat flat% sum% cum cum% > 106.20ms 18.18% 18.18% 115.40ms 19.76% syscall.Syscall > /home/peter/go/src/syscall/asm_linux_amd64.s > 47.70ms 8.17% 26.35% 47.70ms 8.17% runtime.indexbytebody > /home/peter/go/src/runtime/asm_amd64.s > 31ms 5.31% 31.66% 31ms 5.31% runtime.memclrNoHeapPointers > /home/peter/go/src/runtime/memclr_amd64.s > 30.90ms 5.29% 36.95% 30.90ms 5.29% sync.(*Mutex).Lock > /home/peter/go/src/sync/mutex.go > 29.90ms 5.12% 42.07% 29.90ms 5.12% sync.(*Mutex).Unlock > /home/peter/go/src/sync/mutex.go > 24.90ms 4.26% 46.34% 111.80ms 19.14% runtime.mallocgc > /home/peter/go/src/runtime/malloc.go > 18.50ms 3.17% 49.50% 191.30ms 32.76% bufio.(*Scanner).Scan > /home/peter/go/src/bufio/scan.go > 18.50ms 3.17% 52.67% 18.50ms 3.17% runtime.memmove > /home/peter/go/src/runtime/memmove_amd64.s > 16.10ms 2.76% 55.43% 19.90ms 3.41% runtime.scanobject > /home/peter/go/src/runtime/mgcmark.go > 13.60ms 2.33% 57.76% 13.60ms 2.33% runtime.nextFreeFast > /home/peter/go/src/runtime/malloc.go > 12.70ms 2.17% 59.93% 72.30ms 12.38% regexp.(*machine).backtrack > /home/peter/go/src/regexp/backtrack.go > 12.60ms 2.16% 62.09% 179.10ms 30.67% regexp.(*Regexp).doMatch > /home/peter/go/src/regexp/exec.go > 11.60ms 1.99% 64.08% 166.50ms 28.51% regexp.(*Regexp).doExecute > /home/peter/go/src/regexp/exec.go > 11.30ms 1.93% 66.01% 37ms 6.34% bufio.ScanLines > /home/peter/go/src/bufio/scan.go > 8.30ms 1.42% 67.43% 40.10ms 6.87% regexp.(*Regexp).put > /home/peter/go/src/regexp/regexp.go > 8.30ms 1.42% 68.85% 33.20ms 5.68% strings.Index > /home/peter/go/src/strings/strings_amd64.go > 7.90ms 1.35% 70.21% 18.60ms 3.18% regexp.(*bitState).reset > /home/peter/go/src/regexp/backtrack.go > 7.90ms 1.35% 71.56% 150.90ms 25.84% runtime.slicebytetostring > /home/peter/go/src/runtime/string.go > 7.60ms 1.30% 72.86% 7.60ms 1.30% runtime.mallocgc > /home/peter/go/src/runtime/mbitmap.go > 7.60ms 1.30% 74.16% 8.90ms 1.52% runtime.scanblock > /home/peter/go/src/runtime/mgcmark.go > (pprof) quit > > Peter > > On Monday, March 20, 2017 at 7:51:50 PM UTC-4, Harry B wrote: >> >> Hi, >> >> As part of preparing an internal go talk to explain using strings vs >> bytes of the bufio.Scanner, I created two small samples >> https://play.golang.org/p/-TYycdHaPC and >> https://play.golang.org/p/L7W-jiaHdL , the only difference in the hot >> path is reducing the conversions from `[]byte` to `string` (later also >> avoids fmt module’s string conversions). Input being a 10 million line >> file. Match is only one line. >> >> For the moment, ignore the fact that I am not checking for number of >> arguments, is doing unnecessary optimizations etc. I am trying to take a >> simple case to explain. >> >> First version runs in 3 seconds, second in 2 seconds. For the slower >> version using `.Text()` 90+ percent of the time was in `syscall.Syscall` >> (under syscall.Read on the file) that I wasn’t expecting anything more than >> 10% change. However, the second version shows a much bigger reduction in >> `syscall.Syscall`. How is that possible? I expected it to be faster, but >> not affect the time spend reading file.The exclusion of fmt was unnecessary >> since the number of matches will be very small. But that is probably not >> relevant here. I have run it multiple times and made sure file is cached. >> >> Attached are the profile SVGs. You can also see that an entire branch of >> the execution has disappeared in the faster/second version. >> >> I do understand the second has avoided the allocation of string via >> scanner.Text(), however, the output of go build -gcflags "-m" give me the >> following >> >> // First version >> >> > go build -gcflags "-m" grep_simple_re.go >> # command-line-arguments >> ./grep_simple_re.go:17: inlining call to bufio.NewScanner >> ./grep_simple_re.go:19: inlining call to (*bufio.Scanner).Text >> ./grep_simple_re.go:17: fp escapes to heap >> ./grep_simple_re.go:19: string(bufio.s·2.token) escapes to heap >> *./grep_simple_re.go:21: line escapes to heap* >> ./grep_simple_re.go:17: main &bufio.Scanner literal does not escape >> ./grep_simple_re.go:21: main ... argument does not escape >> <autogenerated>:1: leaking param: io.p >> <autogenerated>:1: leaking param: .this >> >> // Second version >> > go build -gcflags "-m" grep_simple_re_bytes.go >> # command-line-arguments >> ./grep_simple_re_bytes.go:18: inlining call to bufio.NewScanner >> ./grep_simple_re_bytes.go:20: inlining call to (*bufio.Scanner).Bytes >> ./grep_simple_re_bytes.go:16: fn escapes to heap >> ./grep_simple_re_bytes.go:16: err escapes to heap >> ./grep_simple_re_bytes.go:18: fp escapes to heap >> ./grep_simple_re_bytes.go:16: main ... argument does not escape >> ./grep_simple_re_bytes.go:18: main &bufio.Scanner literal does not escape >> ./grep_simple_re_bytes.go:24: main ([]byte)("\n") does not escape >> <autogenerated>:1: leaking param: io.p >> <autogenerated>:1: leaking param: .this >> >> >> *I can see "./grep_simple_re.go:21: line escapes to heap" could be a >> problem in the first version.* >> >> Here is the memory profiles of a run on 10million line 2.4 G file >> >> $ go tool pprof ./grep_simple_re_mprofile >> /var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof >> Entering interactive mode (type "help" for commands) >> (pprof) top10 >> 86.37kB of 86.37kB total ( 100%) >> Dropped 4 nodes (cum <= 0.43kB) >> Showing top 10 nodes out of 23 (cum >= 43.89kB) >> flat flat% sum% cum cum% >> 39.73kB 46.00% 46.00% 39.73kB 46.00% regexp.(*bitState).reset >> 16.30kB 18.87% 64.87% 16.30kB 18.87% bufio.(*Scanner).Scan >> 12.62kB 14.61% 79.48% 12.62kB 14.61% runtime.malg >> 9.04kB 10.47% 89.95% 17.45kB 20.21% runtime.allocm >> 4.52kB 5.23% 95.19% 4.52kB 5.23% runtime.rawstringtmp >> 4.16kB 4.81% 100% 4.16kB 4.81% regexp.progMachine >> 0 0% 100% 64.71kB 74.92% main.main >> 0 0% 100% 43.89kB 50.82% regexp.(*Regexp).MatchString >> 0 0% 100% 43.89kB 50.82% regexp.(*Regexp).doExecute >> 0 0% 100% 43.89kB 50.82% regexp.(*Regexp).doMatch >> (pprof) quit >> $ go tool pprof* -alloc_space* ./grep_simple_re_mprofile >> /var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/profile066323621/mem.pprof >> Entering interactive mode (type "help" for commands) >> (pprof) top10 >> 2.14GB of 2.14GB total ( 100%) >> Dropped 22 nodes (cum <= 0.01GB) >> flat flat% sum% cum cum% >> 2.14GB 100% 100% 2.14GB 100% runtime.rawstringtmp >> 0 0% 100% 2.14GB 100% main.main >> 0 0% 100% 2.14GB 100% runtime.goexit >> 0 0% 100% 2.14GB 100% runtime.main >> 0 0% 100% 2.14GB 100% runtime.slicebytetostring >> >> >> *The only explanation for such a large change is if it grew the stack >> 2.4G in size for the first case. The SVG attached seems to show something >> like that. Instead if it was heap allocation, why didn't it show up in the >> first pprof (memory profile) output above? Does 'escape to heap' not imply >> it will be allocated in heap?Any call to re.Match or fmt.Println() escapes >> the string?* >> >> #### System details >> >> ``` >> go version go1.8 darwin/amd64 >> GOARCH="amd64" >> GOBIN="" >> GOEXE="" >> GOHOSTARCH="amd64" >> GOHOSTOS="darwin" >> GOOS="darwin" >> GOPATH="/Users/harry/code" >> GORACE="" >> GOROOT="/usr/local/go" >> GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64" >> GCCGO="gccgo" >> CC="clang" >> GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments >> -fmessage-length=0 >> -fdebug-prefix-map=/var/folders/bt/1mh2p2vx41lbnq3fz1n8qlxw0000gn/T/go-build737735845=/tmp/go-build >> >> -gno-record-gcc-switches -fno-common" >> CXX="clang++" >> CGO_ENABLED="1" >> PKG_CONFIG="pkg-config" >> CGO_CFLAGS="-g -O2" >> CGO_CPPFLAGS="" >> CGO_CXXFLAGS="-g -O2" >> CGO_FFLAGS="-g -O2" >> CGO_LDFLAGS="-L/usr/local/lib" >> GOROOT/bin/go version: go version go1.8 darwin/amd64 >> GOROOT/bin/go tool compile -V: compile version go1.8 X:framepointer >> uname -v: Darwin Kernel Version 15.6.0: Mon Jan 9 23:07:29 PST 2017; >> root:xnu-3248.60.11.2.1~1/RELEASE_X86_64 >> ProductName: Mac OS X >> ProductVersion: 10.11.6 >> BuildVersion: 15G1217 >> lldb --version: lldb-360.1.70 >> ``` >> >> Thanks >> -- >> Harry >> >> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.