On Wed, Aug 8, 2018 at 11:29 AM, brian.armstrong via golang-nuts <golang-nuts@googlegroups.com> wrote: > Hi golang-nuts, > > I've got a service that is seeing some confusing GC behavior. The behavior > manifests as long request latencies when the GC runs. At first I thought > this would just be stop-the-world behavior from the GC, but as I dig into it > more, it's clear that the STW phase is actually quite short. It's actually > that the concurrent phase is completely using all cores at 100% utilization > while it runs. Because there's no CPU left to run the service itself, the > service is unavailable during this window, which seems to be a few hundred > ms. I'm not especially excited about the service being unavailable for this > length of time, to say the least. It's worth noting that > 100ms is the > worst case, maybe one out of every 10 GC runs, but all of them spike the CPU > pretty hard for at least 10ms. > > I've tried adjusting GOGC to a smaller number, but this just makes the cpu > spikes come more frequently. I've been poking around pprof and it looks like > the service has roughly 20M long-lived objects and ~80M garbage objects on > each GC run. If it matters, this program has a pretty large LRU cache > implemented with a map and a container/list. It's also got quite a few > mutexes, though I'm not sure these meaningfully interact with Go's GC in any > way. > > 1) Is this common behavior for Go's GC? > > 2) Is there possibly something about how the service allocates objects > that's causing undue work, say perhaps an unexpected behavior during the > mark phase? > > 3) Is it possible to convince Go to spread its concurrent phase out over > more time? From the gc trace, it appears to be doing about 2s of work (wall > clock), but it runs every 120s, so it feels like this work could be spread > out in a way that doesn't interfere with the service. > > I'm compiled with Go 1.10.3, 4 cores. Here's a sample trace from gctrace > > GC forced > pacer: assist ratio=+8.611156e-002 (scan 960 MB in 15690->21268 MB) > workers=1++0.000000e+000 > pacer: H_m_prev=11151043456 h_t=+9.500000e-001 H_T=21744534739 > h_a=+4.765627e-001 H_a=16465215016 h_g=+1.000000e+000 H_g=22302086912 > u_a=+9.186874e-001 u_g=+3.000000e-001 W_a=347113784 goalΔ=+5.000000e-002 > actualΔ=-4.734373e-001 u_a/u_g=+3.062291e+000 > gc 36 @1197.776s 1%: 0.59+2210+0.29 ms clock, 2.3+5915/2192/0+1.1 ms cpu, > 15690->15702->10671 MB, 21268 MB goal, 4 P > scvg7: inuse: 15834, idle: 107, sys: 15942, released: 0, consumed: 15942 > (MB) > pacer: sweep done at heap size 10851MB; allocated 179MB during sweep; swept > 2026864 pages at +1.906868e-004 pages/byte
Thanks for the report. I suggest that you read through https://golang.org/issue/14812 and add details about the case you are encountering. Thanks. Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.