Interesting, that does sound like it could be the cause. I just tried running the same benchmark on master (692df217ca21b6df8e4dc65538fcc90733e8900e), and I get the following results:
gc 1 @0.004s 3%: 0.009+0.41+0.049 ms clock, 0.036+0.11/0.36/0.12+0.19 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc 2 @0.008s 4%: 0.008+0.80+0.035 ms clock, 0.034+0.097/0.67/0.16+0.14 ms cpu, 7->7->7 MB, 8 MB goal, 4 P
gc 3 @0.016s 3%: 0.010+0.91+0.044 ms clock, 0.041+0/0.31/0.79+0.17 ms cpu, 13->15->14 MB, 15 MB goal, 4 P
gc 4 @0.032s 3%: 0.009+2.3+0.10 ms clock, 0.037+0.60/2.0/0.12+0.40 ms cpu, 27->28->27 MB, 29 MB goal, 4 P
gc 5 @0.070s 3%: 0.010+7.6+0.068 ms clock, 0.043+0.79/5.4/8.5+0.27 ms cpu, 51->53->51 MB, 54 MB goal, 4 P
gc 6 @0.149s 3%: 0.020+8.2+0.12 ms clock, 0.081+0.56/7.2/9.7+0.48 ms cpu, 98->102->99 MB, 103 MB goal, 4 P
gc 7 @0.282s 4%: 0.028+21+0.082 ms clock, 0.11+10/20/1.9+0.32 ms cpu, 190->195->190 MB, 198 MB goal, 4 P
gc 8 @0.568s 3%: 0.024+24+0.080 ms clock, 0.098+0/23/41+0.32 ms cpu, 364->376->214 MB, 381 MB goal, 4 P
gc 9 @0.816s 3%: 0.008+27+0.072 ms clock, 0.035+0/25/34+0.29 ms cpu, 412->420->213 MB, 428 MB goal, 4 P
gc 10 @1.064s 3%: 0.009+31+0.10 ms clock, 0.039+6.1/26/33+0.41 ms cpu, 415->427->216 MB, 427 MB goal, 4 P

My manually calculated worst time for a call to mkMessage is now 7.73812ms, which is much better than before. It's well below the worst wall-clock time for the concurrent mark/scan phase, but still much longer than the worst STW pause. Do you know why this might be?
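For reference, that worst-case figure comes from instrumentation along these lines. This is a minimal sketch rather than the exact code I ran: the running maximum (worst) and the final print are the only additions to the benchmark quoted below.

package main

import (
    "fmt"
    "time"
)

type message []byte

type channel map[int]message

const (
    windowSize = 200000
    msgCount   = 1000000
)

func mkMessage(n int) message {
    m := make(message, 1024)
    for i := range m {
        m[i] = byte(n)
    }
    return m
}

// pushMsg is unchanged except that it times the mkMessage call and
// returns the elapsed duration to the caller.
func pushMsg(c *channel, highID int) time.Duration {
    lowID := highID - windowSize
    start := time.Now()
    m := mkMessage(highID)
    elapsed := time.Since(start)
    (*c)[highID] = m
    if lowID >= 0 {
        delete(*c, lowID)
    }
    return elapsed
}

func main() {
    c := make(channel)
    var worst time.Duration
    for i := 0; i < msgCount; i++ {
        if elapsed := pushMsg(&c, i); elapsed > worst {
            worst = elapsed
        }
    }
    fmt.Println("worst mkMessage call:", worst)
}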
Best,
Will

On Wednesday, 19 October 2016 17:29:23 UTC+1, rhys.h...@gmail.com wrote:
>
> Yes, this sounds a lot like https://golang.org/issue/16293, where goroutines that allocate memory while the garbage collector is running can end up stalled for nearly the entire GC cycle, in programs where a large amount of the memory is in a single allocation. For the program you've shared, that would be the "channel" map. The bug is present in Go 1.5–1.7, and is fixed in tip (via CL 23540).
>
> Do you still see the problem if you run the program with the current development version of Go?
>
> On Wednesday, October 19, 2016 at 6:10:23 AM UTC-7, r...@golang.org wrote:
>>
>> This is likely 23540 <https://go-review.googlesource.com/#/c/23540/>.
>>
>> On Wednesday, October 19, 2016 at 8:32:18 AM UTC-4, Will Sewell wrote:
>>>
>>> Hey, I previously posted this on StackOverflow, but I was told this mailing list would be a better forum for discussion.
>>>
>>> I am attempting to benchmark the maximum STW GC pause time for different numbers of heap objects. To do this I have written a simple benchmark that pushes and pops messages from a map:
>>>
>>> package main
>>>
>>> type message []byte
>>>
>>> type channel map[int]message
>>>
>>> const (
>>>     windowSize = 200000
>>>     msgCount = 1000000
>>> )
>>>
>>> func mkMessage(n int) message {
>>>     m := make(message, 1024)
>>>     for i := range m {
>>>         m[i] = byte(n)
>>>     }
>>>     return m
>>> }
>>>
>>> func pushMsg(c *channel, highID int) {
>>>     lowID := highID - windowSize
>>>     m := mkMessage(highID)
>>>     (*c)[highID] = m
>>>     if lowID >= 0 {
>>>         delete(*c, lowID)
>>>     }
>>> }
>>>
>>> func main() {
>>>     c := make(channel)
>>>     for i := 0; i < msgCount; i++ {
>>>         pushMsg(&c, i)
>>>     }
>>> }
>>>
>>> I ran this with GODEBUG=gctrace=1 <https://golang.org/pkg/runtime/>, and on my machine the output is:
>>>
>>> gc 1 @0.004s 2%: 0.007+0.44+0.032 ms clock, 0.029+0.22/0.20/0.28+0.12 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
>>> gc 2 @0.009s 3%: 0.007+0.64+0.042 ms clock, 0.030+0/0.53/0.18+0.17 ms cpu, 7->7->7 MB, 8 MB goal, 4 P
>>> gc 3 @0.019s 1%: 0.007+0.99+0.037 ms clock, 0.031+0/0.13/1.0+0.14 ms cpu, 13->13->13 MB, 14 MB goal, 4 P
>>> gc 4 @0.044s 2%: 0.009+2.3+0.032 ms clock, 0.039+0/2.3/0.30+0.13 ms cpu, 25->25->25 MB, 26 MB goal, 4 P
>>> gc 5 @0.081s 1%: 0.009+9.2+0.082 ms clock, 0.039+0/0.32/9.7+0.32 ms cpu, 49->49->48 MB, 50 MB goal, 4 P
>>> gc 6 @0.162s 0%: 0.020+10+0.078 ms clock, 0.082+0/0.28/11+0.31 ms cpu, 93->93->91 MB, 96 MB goal, 4 P
>>> gc 7 @0.289s 0%: 0.020+27+0.092 ms clock, 0.080+0/0.95/28+0.37 ms cpu, 178->178->173 MB, 182 MB goal, 4 P
>>> gc 8 @0.557s 1%: 0.023+38+0.086 ms clock, 0.092+0/38/10+0.34 ms cpu, 337->339->209 MB, 346 MB goal, 4 P
>>> gc 9 @0.844s 1%: 0.008+40+0.077 ms clock, 0.032+0/5.6/46+0.30 ms cpu, 407->409->211 MB, 418 MB goal, 4 P
>>> gc 10 @1.100s 1%: 0.009+43+0.047 ms clock, 0.036+0/6.6/50+0.19 ms cpu, 411->414->212 MB, 422 MB goal, 4 P
>>> gc 11 @1.378s 1%: 0.008+45+0.093 ms clock, 0.033+0/6.5/52+0.37 ms cpu, 414->417->213 MB, 425 MB goal, 4 P
>>>
>>> My version of Go is:
>>>
>>> $ go version
>>> go version go1.7.1 darwin/amd64
>>>
>>> From the above results, the longest wall clock STW pause time is 0.093ms. Great!
>>>
>>> However, as a sanity check I also manually timed how long it took to create a new message by wrapping mkMessage with
>>>
>>>     start := time.Now()
>>>     m := mkMessage(highID)
>>>     elapsed := time.Since(start)
>>>
>>> and printing the slowest `elapsed` time. The slowest time I got was 38.573036ms!
>>>
>>> I was instantly suspicious, because this correlated strongly with the wall clock times in the supposedly concurrent mark/scan phase, and in particular with "idle GC time".
>>>
>>> *My question is: why does this supposedly concurrent phase of the GC appear to block the mutator?*
>>>
>>> If I force the GC to run at regular intervals, my manually calculated pause times go way down to <1ms, so it appears to be hitting some kind of limit of non-live heap objects. If so, I'm not sure what that limit is, or why it would cause a concurrent phase of the GC to appear to block the mutator.
>>>
>>> Thanks!
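For what it's worth, "forcing the GC to run at regular intervals" in that last paragraph means a variant of main along these lines; the interval of one forced collection per 10,000 pushes is only an illustration, not a tuned value, and `runtime` has to be added to the imports, with everything else as in the program quoted above. With this change the slowest manually timed mkMessage call drops below 1ms.

func main() {
    c := make(channel)
    for i := 0; i < msgCount; i++ {
        pushMsg(&c, i)
        // Periodically force a full collection and wait for it to
        // finish before allocating any further messages.
        if i > 0 && i%10000 == 0 {
            runtime.GC()
        }
    }
}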