Nothing is obvious, but there is something going on that we don't understand. If 
you have the time to create a simple reproducer, the next step would be to 
open an issue that contains the reproducer. Once the problem can be 
reproduced locally, it can be diagnosed.
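
To make the ask concrete, below is a minimal sketch of the shape such a 
reproducer might take; it is not the original server, and the heap size, 
per-request work, and reporting interval are placeholder values. The idea is to 
hold a large live heap, run CPU-bound goroutines that also allocate, and print 
rough latency percentiles once a second so GC-correlated spikes stand out (run 
it with GODEBUG=gctrace=1 to line the spikes up against GC cycles).

    package main

    import (
        "fmt"
        "runtime"
        "sort"
        "sync"
        "time"
    )

    // live keeps a large heap reachable so every GC cycle has real scan work.
    var live [][]byte

    const liveHeapMB = 2048 // placeholder; the real server holds ~20G

    func main() {
        live = make([][]byte, liveHeapMB)
        for i := range live {
            live[i] = make([]byte, 1<<20)
        }

        var mu sync.Mutex
        var samples []time.Duration

        // Simulated request handlers: CPU-bound work plus some allocation.
        for i := 0; i < runtime.NumCPU(); i++ {
            go func() {
                for {
                    start := time.Now()
                    buf := make([]byte, 64<<10) // per-request garbage
                    sum := 0
                    for j := range buf {
                        buf[j] = byte(j)
                        sum += int(buf[j])
                    }
                    _ = sum
                    mu.Lock()
                    samples = append(samples, time.Since(start))
                    mu.Unlock()
                }
            }()
        }

        // Print rough percentiles once a second; spikes should line up with
        // the gc lines printed by GODEBUG=gctrace=1.
        for range time.Tick(time.Second) {
            mu.Lock()
            s := samples
            samples = nil
            mu.Unlock()
            if len(s) == 0 {
                continue
            }
            sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
            fmt.Printf("n=%d p50=%v p90=%v p99=%v\n",
                len(s), s[len(s)/2], s[9*len(s)/10], s[99*len(s)/100])
        }
    }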


On Wednesday, June 7, 2017 at 11:37:28 PM UTC-4, Xun Liu wrote:
>
> https://drive.google.com/open?id=0B4AgU_xkh4SZVUthM0VJZ3V1U0U
>
> gc is in the first slice
>
> On Wed, Jun 7, 2017 at 7:14 AM, Rick Hudson <r...@golang.org> wrote:
>
>> Could you send us the trace file that you used? From the image it is hard 
>> to distinguish goroutines doing GC assists from application goroutines.
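
(For anyone following along: a trace file like the one requested above can be 
captured with the runtime/trace package. A minimal sketch, where the output 
path is a placeholder and the sleep stands in for the real workload:)

    package main

    import (
        "log"
        "os"
        "runtime/trace"
        "time"
    )

    func main() {
        f, err := os.Create("trace.out") // placeholder path
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        if err := trace.Start(f); err != nil {
            log.Fatal(err)
        }
        // Run the workload long enough to span at least one GC cycle;
        // the sleep is a placeholder for the real work.
        time.Sleep(10 * time.Second)
        trace.Stop()
    }

The resulting file opens with go tool trace trace.out. For a long-running 
server, importing net/http/pprof and fetching /debug/pprof/trace?seconds=5 
captures a similar trace without a restart.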
>>
>> On Tue, Jun 6, 2017 at 4:11 PM, Xun Liu <pas...@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Jun 6, 2017, 20:38 Rick Hudson <r...@golang.org> wrote:
>>>
>>>> >> 1. GOMAXPROCS=24: no latency change (they are the same as 
>>>> GOMAXPROCS=32, both when GC is running and not running)
>>>> This is an interesting clue. 
>>>>
>>>> My guess:
>>>> The application is memory bandwidth limited. If the GC's concurrent 
>>>> mark phase is memory bandwidth bound then the application would also be 
>>>> memory bandwidth bound. If one reduces the amount of bandwidth the GC is 
>>>> using then the application would get the bandwidth it needs. One easy way 
>>>> to reduce the GC's bandwidth requirement would be to reduce the Ps the GC 
>>>> is given from 8 to 6, which is exactly what reducing GOMAXPROCS from 32 to 
>>>> 24 does. The remaining 16 Ps would no longer be memory bandwidth bound so 
>>>> could meet the latency SLO. The graph of latency vs. GOMAXPROCS might have 
>>>> an interesting shape.
>>>>
>>>> Alternatively, if the GC flushes the cache at a high rate, it would cause 
>>>> the application to experience capacity misses, and this would also slow the 
>>>> application down. It could be both, since capacity misses tend to increase 
>>>> memory bandwidth requirements.
>>>>
>>>> I'll discuss this at lunch and see if we can come up with a simple 
>>>> experiment to confirm/refute the guess.
>>>>
>>>> The good news is that the application has a workaround: just reduce 
>>>> GOMAXPROCS to 24. 
>>>>
>>> Sorry, maybe I didn't make myself clear. By "no latency change" I meant 
>>> the application latency was the same as when GOMAXPROCS is set to 32: e.g. 
>>> p90 was the same 100-120ms during non-gc and jumped to 160-250ms during GC.
>>>
>>> In fact, among all the things I tried, nothing seemed to change the 
>>> application latency (compared to GOMAXPROCS=32) except when GOMAXPROCS was 
>>> set to 8, in which case the application starved for CPU and p90 jumped to 
>>> 120-150ms during non-gc and 200-300ms during GC.
>>>
>>>
>>>
>>>> >> The latency spikes only 80-90% of the time during gc; 10-20% of the 
>>>> time the latency is not affected by gc.  
>>>> Don't know what to make of this clue. Is the application the only thing 
>>>> running on the machine?
>>>>
>>>> On Mon, Jun 5, 2017 at 10:04 PM, Xun Liu <pas...@gmail.com> wrote:
>>>>
>>>>> Here are some updates on the tests I did:
>>>>>
>>>>> 1. GOMAXPROCS=24: no latency change (they are the same as 
>>>>> GOMAXPROCS=32, both when gc is running and not running)
>>>>> 2. GOMAXPROCS=16: no latency change
>>>>> 3. GOMAXPROCS=8: latency increases by 20-30% across p50-p99 -- the 
>>>>> process starts to starve for CPU.
>>>>>
>>>>> I also did the following to directly measure the impact of assist and 
>>>>> idle scan (by hacking the Go runtime package):
>>>>> 1. disable idle scan : no latency change
>>>>> 2. disable assist scan : no latency change
>>>>> 3. disable both : no latency change
>>>>>
>>>>> So something else in gc is at play here -- if you'd like to have a 
>>>>> guess, here is one more data point:
>>>>> The latency spikes only 80-90% of the time during gc; 10-20% of the 
>>>>> time the latency is not affected by gc.  
>>>>>
>>>>>
>>>>> On Thu, Jun 1, 2017 at 11:45 AM, Rick Hudson <r...@golang.org> wrote:
>>>>>
>>>>>> As a baseline, the GC's SLO allows it to use 25% of the CPU plus some 
>>>>>> assists from high-allocating goroutines that are at most proportional to 
>>>>>> each goroutine's allocations. Beyond that, the intent is that the GC only 
>>>>>> enlists otherwise idle Ps. If it is the first 25% that is backing up the 
>>>>>> goroutines then things are working as expected. If, on the other hand, it 
>>>>>> is the enlistment of idle Ps that is backing up the goroutines then we 
>>>>>> need to understand what is happening and adjust the scheduling based on 
>>>>>> what we learn. I am hoping that the experiment I proposed above will shed 
>>>>>> some light on what is causing the backup.
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 1, 2017 at 12:48 PM, Xun Liu <pas...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 1, 2017 at 6:06 AM, Rick Hudson <r...@golang.org> wrote:
>>>>>>>
>>>>>>>> Yes, the GC seems to finish early but that is the result of having 
>>>>>>>> to be very conservative about the amount of work left to do. Add to 
>>>>>>>> this a lot of Ps with nothing scheduled that are enlisted to do GC 
>>>>>>>> work. It seems counterintuitive to leave Ps idle, so the approach will 
>>>>>>>> need numbers to support it. There is no easy way to change the assist 
>>>>>>>> and idle settings.
>>>>>>>>
>>>>>>>
>>>>>>> I would argue that the fact that we saw runnable goroutines piling 
>>>>>>> up from time to time during gc (which almost never happens outside gc) 
>>>>>>> suggests go gc is too aggressive in enlisting help, to the point that it 
>>>>>>> can starve user goroutines -- unless you have another explanation for 
>>>>>>> the goroutine queue-up.
>>>>>>>
>>>>>>>
>>>>>>>  
>>>>>>>
>>>>>>>>
>>>>>>>> In an attempt to get some sort of upper bound on the potential 
>>>>>>>> latency improvements that might result from making the GC less 
>>>>>>>> aggressive, could you run the following experiments? 
>>>>>>>>
>>>>>>>> GOMAXPROCS=24 yourApp
>>>>>>>> and report what, if any, change you see on your P50, P90 and P99 
>>>>>>>> numbers when GC is not running.
>>>>>>>> This should give us an upper bound on what can be expected if the 
>>>>>>>> GC runs with only 25% of the CPU. The experiment is not interested in 
>>>>>>>> the 
>>>>>>>> latency when the GC is running. This is trying to simulate what 
>>>>>>>> happens if 
>>>>>>>> we have a minimally aggressive GC.
>>>>>>>>
>>>>>>>> GOMAXPROCS=16 yourApp
>>>>>>>> This should give us an idea of what to expect if the GC can only ever 
>>>>>>>> schedule half the processors, regardless of whether the rest are idle.
>>>>>>>>
>>>>>>>> GOMAXPROCS=8 yourApp
>>>>>>>> This should give us an idea of what to expect if the GC can only ever 
>>>>>>>> schedule 25% of the processors, regardless of whether the rest are idle.
>>>>>>>>
>>>>>>>> Thanks, looking forward to seeing the percentile numbers and 
>>>>>>>> graphs. 
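
(A side note on running these experiments: the limit can be set either from 
the environment, as in the commands above, or programmatically at startup. A 
minimal sketch, where 24 is just the value from the first experiment:)

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // runtime.GOMAXPROCS(n) sets the limit and returns the previous value;
        // GOMAXPROCS(0) just queries the current setting.
        prev := runtime.GOMAXPROCS(24)
        fmt.Printf("GOMAXPROCS: %d -> %d\n", prev, runtime.GOMAXPROCS(0))
        // ... start the server / workload here ...
    }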
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 31, 2017 at 3:01 PM, Xun Liu <pas...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 31, 2017 at 11:35 AM, Rick Hudson <r...@golang.org> wrote:
>>>>>>>>>
>>>>>>>>>> gc 347 @6564.164s 0%: 0.89+518+1.0 ms clock, 28+3839/4091/3959+33 
>>>>>>>>>> ms cpu, 23813->23979->12265 MB, 24423 MB goal, 32 P
>>>>>>>>>>
>>>>>>>>>> What I'm seeing here is that you have 32 HW threads and you spend 
>>>>>>>>>> 0.89+518+1.0 or 520 ms wall clock in the GC. You also spend 
>>>>>>>>>> 28+3839+4091+3959+33 or 11950 ms of CPU time out of a total of 
>>>>>>>>>> 520*32 or 16640 ms of available CPU time while the GC is running. 
>>>>>>>>>> The GC will reserve 25% of the CPU to do its work. That's 16640*.25 
>>>>>>>>>> or 4160 ms. If the GC finds any of the remaining 24 threads idle it 
>>>>>>>>>> will aggressively enlist them to do GC work.
>>>>>>>>>>
>>>>>>>>>> The graph only shows 8 of the HW threads, not the other 24, so it 
>>>>>>>>>> is hard to tell what is going on with them.
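
(For anyone reproducing this: lines like the one above come from running the 
server with GODEBUG=gctrace=1. If it is easier to correlate latency samples 
with GC cycles programmatically than by eyeballing that log, here is a hedged 
sketch that polls runtime.MemStats and logs each completed cycle; the polling 
interval is arbitrary, and ReadMemStats itself briefly stops the world, so 
keep it coarse:)

    package main

    import (
        "log"
        "runtime"
        "time"
    )

    func main() {
        // In the real server this loop would run in a background goroutine.
        var lastGC uint32
        for range time.Tick(250 * time.Millisecond) {
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            if m.NumGC != lastGC {
                lastGC = m.NumGC
                // PauseNs is a circular buffer; the most recent pause is
                // at index (NumGC+255)%256.
                log.Printf("gc %d: heap=%d MB, last pause=%v, gc cpu fraction=%.3f",
                    m.NumGC, m.HeapAlloc>>20,
                    time.Duration(m.PauseNs[(m.NumGC+255)%256]), m.GCCPUFraction)
            }
        }
    }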
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The other CPUs are doing user-level work. I pasted the full trace 
>>>>>>>>> pictures at the end. 
>>>>>>>>>  
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This may well be related to issue 14812 
>>>>>>>>>> <https://github.com/golang/go/issues/14812>, so you might want to 
>>>>>>>>>> read that thread for more insight.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> I read the issue before and it seems related. Something special 
>>>>>>>>> about my trace is that many goroutines are GCWaiting; what triggers gc 
>>>>>>>>> waiting? Also, is there an easy way to turn off idle gc or assist gc? 
>>>>>>>>> Not knowing much about the go gc internals, I wonder if it's worth 
>>>>>>>>> testing how big a role they play in the latency increase -- it feels 
>>>>>>>>> to me they are too aggressive, e.g. the gc finishes after using only 
>>>>>>>>> ~27% of the heap growth allowed before the goal: 27% 
>>>>>>>>> = (23979 - 23813) / (24423 - 23813)
>>>>>>>>>
>>>>>>>>> gc 347 @6564.164s 0%: 0.89+518+1.0 ms clock, 28+3839/4091/3959+33 
>>>>>>>>> ms cpu, 23813->23979->12265 MB, 24423 MB goal, 32 P
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [inline trace screenshots]
>>>>>>>>>
>>>>>>>>>> On Wed, May 31, 2017 at 10:25 AM, Xun Liu <pas...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> $ go version
>>>>>>>>>>>
>>>>>>>>>>> go version go1.8 linux/amd64
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, May 31, 2017 at 7:13:38 AM UTC-7, Ian Lance Taylor 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> [ +rlh, austin ]
>>>>>>>>>>>>
>>>>>>>>>>>> Which version of Go are you running?
>>>>>>>>>>>>
>>>>>>>>>>>> Ian
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 30, 2017 at 10:01 PM, Xun Liu <pas...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, we see a clear correlation between GC and latency spikes in 
>>>>>>>>>>>>> our Go server. The server uses a fairly large amount of memory 
>>>>>>>>>>>>> (20G) and does mostly CPU work. The server runs on a beefy box 
>>>>>>>>>>>>> with 32 cores and the load is pretty light (average CPU 20-30%). 
>>>>>>>>>>>>> GC kicks in once every 10-20 seconds and whenever it runs we 
>>>>>>>>>>>>> observe pretty big latency spikes ranging from 30% to 100% across 
>>>>>>>>>>>>> p50-p90 percentiles (e.g. p90 can jump from 100-120ms to 
>>>>>>>>>>>>> 160-250ms). 
>>>>>>>>>>>>>
>>>>>>>>>>>>> I captured a trace of a gc and noticed the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. user goroutines seem to run longer during gc. This is based 
>>>>>>>>>>>>> on an ad-hoc check; I don't really know how to get stats to 
>>>>>>>>>>>>> confirm this. The gc log is as follows (tiny pauses, but very 
>>>>>>>>>>>>> aggressive in assist and idle time): 
>>>>>>>>>>>>> gc 347 @6564.164s 0%: 0.89+518+1.0 ms clock, 
>>>>>>>>>>>>> 28+3839/4091/3959+33 ms cpu, 23813->23979->12265 MB, 24423 MB 
>>>>>>>>>>>>> goal, 32 P
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. during gc, goroutines can queue up. In this particular case 
>>>>>>>>>>>>> there is a stretch of time (~20ms) where we see many goroutines 
>>>>>>>>>>>>> in GCWaiting. See below -- the second row shows goroutines, with 
>>>>>>>>>>>>> light grey indicating the GCWaiting count and light green the 
>>>>>>>>>>>>> Runnable count.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <https://lh3.googleusercontent.com/-5NWj8HzrzyI/WS5Hg0c6WiI/AAAAAAAAqL8/_P-S9vXuGU0FVZV1tG_dHX-14tDHhTsnwCLcB/s1600/Screen%2BShot%2B2017-05-30%2Bat%2B9.32.27%2BPM.png>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <https://lh3.googleusercontent.com/-EdVRp4FjhPw/WS5MjjiwdrI/AAAAAAAAqMM/vE5Srmu9e9A5MV0d50NpDiqFsFrkn-E-QCLcB/s1600/Screen%2BShot%2B2017-05-30%2Bat%2B9.54.06%2BPM.png>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea what's going on here? What can I do to reduce the 
>>>>>>>>>>>>> spikes?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
