Yes, this sounds like https://golang.org/issue/16528. During the concurrent mark phase (the "27 [ms]" of "0.008+27+0.072 ms clock"), both your code and the garbage collector are running. The program is allowed to use four OS threads ("4 P"), which might be executing your code in your goroutines, or might be running GC code in dedicated GC goroutines.
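(To be explicit about which numbers those are: the three "ms clock" fields in a gctrace line are the GC phases described under GODEBUG=gctrace=1 in package runtime, so the "0.008+27+0.072 ms clock" from the gc 9 line quoted below reads roughly as:

    0.008 ms   stop-the-world sweep termination
    27 ms      concurrent mark and scan, running alongside your goroutines
    0.072 ms   stop-the-world mark termination

Only the first and last of those are pauses.)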
There's plenty of work for the GC to do, so when a GC helper goroutine is given some processing time it'll keep running until it has used up all of its allowed time, about ten milliseconds. If all four threads end up running GC goroutines at the same time, your goroutine will need to wait until one of them has run for about 10ms before it can be scheduled again. This can lead to individual goroutines being paused for up to 10ms.

You might be able to see this with the execution tracer, but it's not an easy tool to use. See package "runtime/trace" and the command "go tool trace" for some hints. Issue 16528 includes some screenshots of the tool.
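A minimal way to capture a trace of this benchmark would look roughly like the sketch below (it assumes the rest of the program is unchanged, and "trace.out" is just an arbitrary file name; any io.Writer works):

package main

import (
    "log"
    "os"
    "runtime/trace"
)

// message, channel, mkMessage, pushMsg: unchanged from the benchmark quoted below.

func main() {
    // Write the execution trace to a file.
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Record scheduler and GC events for the whole run.
    if err := trace.Start(f); err != nil {
        log.Fatal(err)
    }
    defer trace.Stop()

    c := make(channel)
    for i := 0; i < msgCount; i++ {
        pushMsg(&c, i)
    }
}

Running the program and then "go tool trace trace.out" shows, per P, when your goroutines and the GC workers were actually running.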
On Thursday, October 20, 2016 at 6:36:27 AM UTC-7, Will Sewell wrote:
>
> Interesting, that does sound like it could be the cause.
>
> I just tried running the same benchmark on master (692df217ca21b6df8e4dc65538fcc90733e8900e), and I get the following results:
>
> gc 1 @0.004s 3%: 0.009+0.41+0.049 ms clock, 0.036+0.11/0.36/0.12+0.19 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
> gc 2 @0.008s 4%: 0.008+0.80+0.035 ms clock, 0.034+0.097/0.67/0.16+0.14 ms cpu, 7->7->7 MB, 8 MB goal, 4 P
> gc 3 @0.016s 3%: 0.010+0.91+0.044 ms clock, 0.041+0/0.31/0.79+0.17 ms cpu, 13->15->14 MB, 15 MB goal, 4 P
> gc 4 @0.032s 3%: 0.009+2.3+0.10 ms clock, 0.037+0.60/2.0/0.12+0.40 ms cpu, 27->28->27 MB, 29 MB goal, 4 P
> gc 5 @0.070s 3%: 0.010+7.6+0.068 ms clock, 0.043+0.79/5.4/8.5+0.27 ms cpu, 51->53->51 MB, 54 MB goal, 4 P
> gc 6 @0.149s 3%: 0.020+8.2+0.12 ms clock, 0.081+0.56/7.2/9.7+0.48 ms cpu, 98->102->99 MB, 103 MB goal, 4 P
> gc 7 @0.282s 4%: 0.028+21+0.082 ms clock, 0.11+10/20/1.9+0.32 ms cpu, 190->195->190 MB, 198 MB goal, 4 P
> gc 8 @0.568s 3%: 0.024+24+0.080 ms clock, 0.098+0/23/41+0.32 ms cpu, 364->376->214 MB, 381 MB goal, 4 P
> gc 9 @0.816s 3%: 0.008+27+0.072 ms clock, 0.035+0/25/34+0.29 ms cpu, 412->420->213 MB, 428 MB goal, 4 P
> gc 10 @1.064s 3%: 0.009+31+0.10 ms clock, 0.039+6.1/26/33+0.41 ms cpu, 415->427->216 MB, 427 MB goal, 4 P
>
> My manually calculated worst time for a call to mkMessage is 7.73812ms, which is much better than before. It's significantly faster than the worst wall clock time for the concurrent mark/scan phase, but it's also much slower than the worst STW phase. Do you know why this might be?
>
> Best,
> Will
>
> On Wednesday, 19 October 2016 17:29:23 UTC+1, rhys.h...@gmail.com wrote:
>>
>> Yes, this sounds a lot like https://golang.org/issue/16293, where goroutines that allocate memory while the garbage collector is running can end up stalled for nearly the entire GC cycle, in programs where a large amount of the memory is in a single allocation. For the program you've shared, that would be the "channel" map. The bug is present in Go 1.5–1.7, and is fixed in tip (via CL 23540).
>>
>> Do you still see the problem if you run the program with the current development version of Go?
>>
>> On Wednesday, October 19, 2016 at 6:10:23 AM UTC-7, r...@golang.org wrote:
>>>
>>> This is likely 23540 <https://go-review.googlesource.com/#/c/23540/>.
>>>
>>> On Wednesday, October 19, 2016 at 8:32:18 AM UTC-4, Will Sewell wrote:
>>>>
>>>> Hey, I previously posted this on StackOverflow, but I was told this mailing list would be a better forum for discussion.
>>>>
>>>> I am attempting to benchmark the maximum STW GC pause time for different numbers of heap objects. To do this I have written a simple benchmark that pushes and pops messages from a map:
>>>>
>>>> package main
>>>>
>>>> type message []byte
>>>>
>>>> type channel map[int]message
>>>>
>>>> const (
>>>>     windowSize = 200000
>>>>     msgCount   = 1000000
>>>> )
>>>>
>>>> func mkMessage(n int) message {
>>>>     m := make(message, 1024)
>>>>     for i := range m {
>>>>         m[i] = byte(n)
>>>>     }
>>>>     return m
>>>> }
>>>>
>>>> func pushMsg(c *channel, highID int) {
>>>>     lowID := highID - windowSize
>>>>     m := mkMessage(highID)
>>>>     (*c)[highID] = m
>>>>     if lowID >= 0 {
>>>>         delete(*c, lowID)
>>>>     }
>>>> }
>>>>
>>>> func main() {
>>>>     c := make(channel)
>>>>     for i := 0; i < msgCount; i++ {
>>>>         pushMsg(&c, i)
>>>>     }
>>>> }
>>>>
>>>> I ran this with GODEBUG=gctrace=1 <https://golang.org/pkg/runtime/>, and on my machine the output is:
>>>>
>>>> gc 1 @0.004s 2%: 0.007+0.44+0.032 ms clock, 0.029+0.22/0.20/0.28+0.12 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
>>>> gc 2 @0.009s 3%: 0.007+0.64+0.042 ms clock, 0.030+0/0.53/0.18+0.17 ms cpu, 7->7->7 MB, 8 MB goal, 4 P
>>>> gc 3 @0.019s 1%: 0.007+0.99+0.037 ms clock, 0.031+0/0.13/1.0+0.14 ms cpu, 13->13->13 MB, 14 MB goal, 4 P
>>>> gc 4 @0.044s 2%: 0.009+2.3+0.032 ms clock, 0.039+0/2.3/0.30+0.13 ms cpu, 25->25->25 MB, 26 MB goal, 4 P
>>>> gc 5 @0.081s 1%: 0.009+9.2+0.082 ms clock, 0.039+0/0.32/9.7+0.32 ms cpu, 49->49->48 MB, 50 MB goal, 4 P
>>>> gc 6 @0.162s 0%: 0.020+10+0.078 ms clock, 0.082+0/0.28/11+0.31 ms cpu, 93->93->91 MB, 96 MB goal, 4 P
>>>> gc 7 @0.289s 0%: 0.020+27+0.092 ms clock, 0.080+0/0.95/28+0.37 ms cpu, 178->178->173 MB, 182 MB goal, 4 P
>>>> gc 8 @0.557s 1%: 0.023+38+0.086 ms clock, 0.092+0/38/10+0.34 ms cpu, 337->339->209 MB, 346 MB goal, 4 P
>>>> gc 9 @0.844s 1%: 0.008+40+0.077 ms clock, 0.032+0/5.6/46+0.30 ms cpu, 407->409->211 MB, 418 MB goal, 4 P
>>>> gc 10 @1.100s 1%: 0.009+43+0.047 ms clock, 0.036+0/6.6/50+0.19 ms cpu, 411->414->212 MB, 422 MB goal, 4 P
>>>> gc 11 @1.378s 1%: 0.008+45+0.093 ms clock, 0.033+0/6.5/52+0.37 ms cpu, 414->417->213 MB, 425 MB goal, 4 P
>>>>
>>>> My version of Go is:
>>>>
>>>> $ go version
>>>> go version go1.7.1 darwin/amd64
>>>>
>>>> From the above results, the longest wall clock STW pause time is 0.093ms. Great!
>>>>
>>>> However as a sanity check I also manually timed how long it took to create a new message by wrapping mkMessage with
>>>>
>>>> start := time.Now()
>>>> m := mkMessage(highID)
>>>> elapsed := time.Since(start)
>>>>
>>>> and printed the slowest `elapsed` time. The time I get for this was 38.573036ms!
>>>>
>>>> I was instantly suspicious because this correlated strongly with the wall clock times in the supposedly concurrent mark/scan phase, and in particular with "idle GC time".
>>>>
>>>> *My question is: why does this supposedly concurrent phase of the GC appear to block the mutator?*
>>>>
>>>> If I force the GC to run at regular intervals, my manually calculated pause times go way down to <1ms, so it appears to be hitting some kind of limit of non-live heap objects. If so, I'm not sure what that limit is, and why it would cause a concurrent phase of the GC to appear to block the mutator.
>>>>
>>>> Thanks!
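Regarding the "force the GC to run at regular intervals" experiment mentioned at the end of the quoted post: a minimal sketch of that measurement loop, reusing the types from the benchmark above, timing the whole pushMsg call rather than just mkMessage, and using 50000 as an arbitrary interval, might look like:

package main

import (
    "fmt"
    "runtime"
    "time"
)

// message, channel, mkMessage, pushMsg: as in the quoted benchmark.

const gcEvery = 50000 // arbitrary interval, for illustration only

func main() {
    c := make(channel)
    var worst time.Duration
    for i := 0; i < msgCount; i++ {
        start := time.Now()
        pushMsg(&c, i)
        if d := time.Since(start); d > worst {
            worst = d
        }
        if i > 0 && i%gcEvery == 0 {
            runtime.GC() // blocks the caller until the collection finishes
        }
    }
    fmt.Println("worst pushMsg latency:", worst)
}

Note that runtime.GC blocks until the collection completes, and since it sits outside the timed region it doesn't count toward the measured pushMsg latency.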