I just wrote up our investigation in this blog post: https://blog.pusher.com/golangs-real-time-gc-in-theory-and-practice/
It includes an animation of the GC algorithm that my colleague James Fisher made. I think this could be valuable as a standalone resource. If any of you have any feedback, please let me know.

On Thursday, 27 October 2016 17:58:30 UTC+1, Dave Cheney wrote:
>
> Thanks for confirming that CL 23540 has reduced STW delays.
>
> On Friday, 28 October 2016 03:52:29 UTC+11, Will Sewell wrote:
>>
>> Are you referring to https://go-review.googlesource.com/#/c/23540/ or
>> https://github.com/golang/go/issues/16528? If it's the former, then yes: I
>> have tried the benchmark with the compiler built from the HEAD of master,
>> and it did bring the pause down to ~7.7ms. I was under the impression that
>> the latter issue has not been fixed yet, and that it is the reason the
>> pause time was not even lower.
>>
>> On Wednesday, 26 October 2016 21:45:59 UTC+1, Dave Cheney wrote:
>>>
>>> Will, the changes have been in master for a few months now. Are you able
>>> to build master from source and see if this has addressed the issue? I'm
>>> sure Rick and Austin would appreciate the feedback.
>>>
>>> On Thursday, 27 October 2016 01:46:47 UTC+11, Will Sewell wrote:
>>>>
>>>> Thanks for the information. I think it could well be caused by that.
>>>> Below is a screenshot of one of the periods during which the mutator is
>>>> blocked.
>>>>
>>>> <https://lh3.googleusercontent.com/-9oESdo48mbY/WBDBYsoWB2I/AAAAAAAAHYA/d32gkJjxjXU0G-_4to7SdUxcI6GOGgFSACLcB/s1600/Screen%2BShot%2B2016-10-26%2Bat%2B15.39.07.png>
>>>>
>>>> Note: pause times were as high as 15ms with the tracer enabled.
>>>> Pauses of a similar size occur every ~100ms.
>>>>
>>>> Let's hope this gets resolved in Go 1.8 :)
>>>>
>>>> On Monday, 24 October 2016 17:06:59 UTC+1, rhys.h...@gmail.com wrote:
>>>>>
>>>>> Yes, this sounds like https://golang.org/issue/16528. During the
>>>>> concurrent mark phase (the "27 [ms]" of "0.008+27+0.072 ms clock"), both
>>>>> your code and the garbage collector are running. The program is allowed
>>>>> to use four OS threads ("4 P"), which might be executing your code in
>>>>> your goroutines, or might be running GC code in dedicated GC goroutines.
>>>>>
>>>>> There's plenty of work for the GC to do, so when a GC helper goroutine
>>>>> is allowed some processing time, it'll keep running until it has used up
>>>>> all of its allowed time: ten milliseconds. If all four threads end up
>>>>> running GC goroutines at the same time, your goroutine will need to wait
>>>>> until one of them has run for about 10ms before it can be scheduled
>>>>> again. This can lead to individual goroutines being paused for up to
>>>>> 10ms.
>>>>>
>>>>> You might be able to see this with the execution tracer, but it's not
>>>>> an easy tool to use. See package "runtime/trace" and the command "go
>>>>> tool trace" for some hints. Issue 16528 includes some screenshots of the
>>>>> tool.
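For reference, a minimal way to capture such a trace with the "runtime/trace" package might look like the sketch below; the file name is arbitrary, and the resulting file is viewed with "go tool trace trace.out":

    package main

    import (
        "log"
        "os"
        "runtime/trace"
    )

    func main() {
        // Write the execution trace to trace.out, for later inspection
        // with `go tool trace trace.out`.
        f, err := os.Create("trace.out")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        if err := trace.Start(f); err != nil {
            log.Fatal(err)
        }
        defer trace.Stop()

        // ... run the workload under investigation here ...
    }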
>>>>> On Thursday, October 20, 2016 at 6:36:27 AM UTC-7, Will Sewell wrote:
>>>>>>
>>>>>> Interesting, that does sound like it could be the cause.
>>>>>>
>>>>>> I just tried running the same benchmark on master
>>>>>> (692df217ca21b6df8e4dc65538fcc90733e8900e), and I get the following
>>>>>> results:
>>>>>>
>>>>>> gc 1 @0.004s 3%: 0.009+0.41+0.049 ms clock, 0.036+0.11/0.36/0.12+0.19 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
>>>>>> gc 2 @0.008s 4%: 0.008+0.80+0.035 ms clock, 0.034+0.097/0.67/0.16+0.14 ms cpu, 7->7->7 MB, 8 MB goal, 4 P
>>>>>> gc 3 @0.016s 3%: 0.010+0.91+0.044 ms clock, 0.041+0/0.31/0.79+0.17 ms cpu, 13->15->14 MB, 15 MB goal, 4 P
>>>>>> gc 4 @0.032s 3%: 0.009+2.3+0.10 ms clock, 0.037+0.60/2.0/0.12+0.40 ms cpu, 27->28->27 MB, 29 MB goal, 4 P
>>>>>> gc 5 @0.070s 3%: 0.010+7.6+0.068 ms clock, 0.043+0.79/5.4/8.5+0.27 ms cpu, 51->53->51 MB, 54 MB goal, 4 P
>>>>>> gc 6 @0.149s 3%: 0.020+8.2+0.12 ms clock, 0.081+0.56/7.2/9.7+0.48 ms cpu, 98->102->99 MB, 103 MB goal, 4 P
>>>>>> gc 7 @0.282s 4%: 0.028+21+0.082 ms clock, 0.11+10/20/1.9+0.32 ms cpu, 190->195->190 MB, 198 MB goal, 4 P
>>>>>> gc 8 @0.568s 3%: 0.024+24+0.080 ms clock, 0.098+0/23/41+0.32 ms cpu, 364->376->214 MB, 381 MB goal, 4 P
>>>>>> gc 9 @0.816s 3%: 0.008+27+0.072 ms clock, 0.035+0/25/34+0.29 ms cpu, 412->420->213 MB, 428 MB goal, 4 P
>>>>>> gc 10 @1.064s 3%: 0.009+31+0.10 ms clock, 0.039+6.1/26/33+0.41 ms cpu, 415->427->216 MB, 427 MB goal, 4 P
>>>>>>
>>>>>> My manually calculated worst time for a call to mkMessage is
>>>>>> 7.73812ms, which is much better than before. It's significantly faster
>>>>>> than the worst wall-clock time for the concurrent mark/scan phase, but
>>>>>> it's also much slower than the worst STW phase. Do you know why this
>>>>>> might be?
>>>>>>
>>>>>> Best,
>>>>>> Will
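To decode these gctrace lines (the format is documented in the runtime package; simplified here), take "gc 9 @0.816s 3%: 0.008+27+0.072 ms clock, 0.035+0/25/34+0.29 ms cpu, 412->420->213 MB, 428 MB goal, 4 P" as an example:

    0.008+27+0.072 ms clock    STW sweep termination + concurrent mark/scan + STW mark termination
    0.035+0/25/34+0.29 ms cpu  the same phases in CPU time; mark/scan is split into assist/background/idle GC time
    412->420->213 MB           heap size at GC start -> heap size at GC end -> live heap
    428 MB goal                heap size target for this cycle
    4 P                        number of processors used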
>>>>>> On Wednesday, 19 October 2016 17:29:23 UTC+1, rhys.h...@gmail.com wrote:
>>>>>>>
>>>>>>> Yes, this sounds a lot like https://golang.org/issue/16293, where
>>>>>>> goroutines that allocate memory while the garbage collector is running
>>>>>>> can end up stalled for nearly the entire GC cycle, in programs where a
>>>>>>> large amount of the memory is in a single allocation. For the program
>>>>>>> you've shared, that would be the "channel" map. The bug is present in
>>>>>>> Go 1.5–1.7, and is fixed in tip (via CL 23540).
>>>>>>>
>>>>>>> Do you still see the problem if you run the program with the current
>>>>>>> development version of Go?
>>>>>>>
>>>>>>> On Wednesday, October 19, 2016 at 6:10:23 AM UTC-7, r...@golang.org wrote:
>>>>>>>>
>>>>>>>> This is likely 23540 <https://go-review.googlesource.com/#/c/23540/>.
>>>>>>>>
>>>>>>>> On Wednesday, October 19, 2016 at 8:32:18 AM UTC-4, Will Sewell wrote:
>>>>>>>>>
>>>>>>>>> Hey, I previously posted this on StackOverflow, but I was told this
>>>>>>>>> mailing list would be a better forum for discussion.
>>>>>>>>>
>>>>>>>>> I am attempting to benchmark the maximum STW GC pause time for
>>>>>>>>> different numbers of heap objects. To do this I have written a
>>>>>>>>> simple benchmark that pushes and pops messages from a map:
>>>>>>>>>
>>>>>>>>> package main
>>>>>>>>>
>>>>>>>>> type message []byte
>>>>>>>>>
>>>>>>>>> type channel map[int]message
>>>>>>>>>
>>>>>>>>> const (
>>>>>>>>>     windowSize = 200000
>>>>>>>>>     msgCount   = 1000000
>>>>>>>>> )
>>>>>>>>>
>>>>>>>>> // mkMessage allocates a 1KB message filled with the low byte of n.
>>>>>>>>> func mkMessage(n int) message {
>>>>>>>>>     m := make(message, 1024)
>>>>>>>>>     for i := range m {
>>>>>>>>>         m[i] = byte(n)
>>>>>>>>>     }
>>>>>>>>>     return m
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> // pushMsg inserts a new message and deletes the one that has
>>>>>>>>> // fallen out of the window, keeping at most windowSize messages live.
>>>>>>>>> func pushMsg(c *channel, highID int) {
>>>>>>>>>     lowID := highID - windowSize
>>>>>>>>>     m := mkMessage(highID)
>>>>>>>>>     (*c)[highID] = m
>>>>>>>>>     if lowID >= 0 {
>>>>>>>>>         delete(*c, lowID)
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> func main() {
>>>>>>>>>     c := make(channel)
>>>>>>>>>     for i := 0; i < msgCount; i++ {
>>>>>>>>>         pushMsg(&c, i)
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I ran this with GODEBUG=gctrace=1 <https://golang.org/pkg/runtime/>,
>>>>>>>>> and on my machine the output is:
>>>>>>>>>
>>>>>>>>> gc 1 @0.004s 2%: 0.007+0.44+0.032 ms clock, 0.029+0.22/0.20/0.28+0.12 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
>>>>>>>>> gc 2 @0.009s 3%: 0.007+0.64+0.042 ms clock, 0.030+0/0.53/0.18+0.17 ms cpu, 7->7->7 MB, 8 MB goal, 4 P
>>>>>>>>> gc 3 @0.019s 1%: 0.007+0.99+0.037 ms clock, 0.031+0/0.13/1.0+0.14 ms cpu, 13->13->13 MB, 14 MB goal, 4 P
>>>>>>>>> gc 4 @0.044s 2%: 0.009+2.3+0.032 ms clock, 0.039+0/2.3/0.30+0.13 ms cpu, 25->25->25 MB, 26 MB goal, 4 P
>>>>>>>>> gc 5 @0.081s 1%: 0.009+9.2+0.082 ms clock, 0.039+0/0.32/9.7+0.32 ms cpu, 49->49->48 MB, 50 MB goal, 4 P
>>>>>>>>> gc 6 @0.162s 0%: 0.020+10+0.078 ms clock, 0.082+0/0.28/11+0.31 ms cpu, 93->93->91 MB, 96 MB goal, 4 P
>>>>>>>>> gc 7 @0.289s 0%: 0.020+27+0.092 ms clock, 0.080+0/0.95/28+0.37 ms cpu, 178->178->173 MB, 182 MB goal, 4 P
>>>>>>>>> gc 8 @0.557s 1%: 0.023+38+0.086 ms clock, 0.092+0/38/10+0.34 ms cpu, 337->339->209 MB, 346 MB goal, 4 P
>>>>>>>>> gc 9 @0.844s 1%: 0.008+40+0.077 ms clock, 0.032+0/5.6/46+0.30 ms cpu, 407->409->211 MB, 418 MB goal, 4 P
>>>>>>>>> gc 10 @1.100s 1%: 0.009+43+0.047 ms clock, 0.036+0/6.6/50+0.19 ms cpu, 411->414->212 MB, 422 MB goal, 4 P
>>>>>>>>> gc 11 @1.378s 1%: 0.008+45+0.093 ms clock, 0.033+0/6.5/52+0.37 ms cpu, 414->417->213 MB, 425 MB goal, 4 P
>>>>>>>>>
>>>>>>>>> My version of Go is:
>>>>>>>>>
>>>>>>>>> $ go version
>>>>>>>>> go version go1.7.1 darwin/amd64
>>>>>>>>>
>>>>>>>>> From the above results, the longest wall-clock STW pause time is
>>>>>>>>> 0.093ms. Great!
>>>>>>>>>
>>>>>>>>> However, as a sanity check I also manually timed how long it took to
>>>>>>>>> create a new message, by wrapping mkMessage with:
>>>>>>>>>
>>>>>>>>> start := time.Now()
>>>>>>>>> m := mkMessage(highID)
>>>>>>>>> elapsed := time.Since(start)
>>>>>>>>>
>>>>>>>>> and printing the slowest elapsed time. The time I got for this was
>>>>>>>>> 38.573036ms!
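In full, the instrumented pushMsg might look like the sketch below. The max-tracking variable and the final print are assumptions (the post only shows the timing wrapper), and "fmt" and "time" would need adding to the benchmark's imports:

    // worstElapsed records the slowest single call to mkMessage seen so far.
    // The benchmark runs on one goroutine, so no synchronization is needed.
    var worstElapsed time.Duration

    func pushMsg(c *channel, highID int) {
        lowID := highID - windowSize
        start := time.Now()
        m := mkMessage(highID)
        if elapsed := time.Since(start); elapsed > worstElapsed {
            worstElapsed = elapsed
        }
        (*c)[highID] = m
        if lowID >= 0 {
            delete(*c, lowID)
        }
    }

    // ...and at the end of main:
    // fmt.Printf("worst mkMessage call: %v\n", worstElapsed)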
>>>>>>>>> I was instantly suspicious, because this correlated strongly with
>>>>>>>>> the wall-clock times of the supposedly concurrent mark/scan phase,
>>>>>>>>> and in particular with "idle GC time".
>>>>>>>>>
>>>>>>>>> *My question is: why does this supposedly concurrent phase of the GC
>>>>>>>>> appear to block the mutator?*
>>>>>>>>>
>>>>>>>>> If I force the GC to run at regular intervals, my manually
>>>>>>>>> calculated pause times go way down to <1ms, so the program appears
>>>>>>>>> to be hitting some kind of limit on the number of non-live heap
>>>>>>>>> objects. If so, I'm not sure what that limit is, or why it would
>>>>>>>>> cause a concurrent phase of the GC to appear to block the mutator.
>>>>>>>>>
>>>>>>>>> Thanks!
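For completeness, the "force the GC to run at regular intervals" experiment Will describes can be as simple as the variant of main below. The interval of one forced collection per 10,000 messages is a guess; the post does not say what was actually used:

    func main() {
        c := make(channel)
        for i := 0; i < msgCount; i++ {
            pushMsg(&c, i)
            // runtime.GC() runs a full collection and blocks the caller
            // until it completes. Requires importing "runtime".
            if i%10000 == 0 {
                runtime.GC()
            }
        }
    }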