Hi,
We are experiencing a problem that I believe may be related to issue 14812 <https://github.com/golang/go/issues/14812>, but I wanted to ask here before adding to that case or filing a new issue. Of course, we'd also greatly appreciate any advice about how to make our program performant.

Here is what we observe: at Laserlike one of our core user-facing services (the "leaf") typically responds to a particular RPC in <400ms. During GC we see latency spikes to >5s on some simple requests. The stop-the-world periods are short, so the GC spikes appear to happen at other times.

We have been unable to replicate this in a simple program, but we did run our code in a test mode that reproduces it. In our test environment the server loads ~10 GB of persistent data (never deleted, so it doesn't really need to be GCed), and we ask for 8 processors. We are running Go 1.8.3 on Kubernetes on a GCP machine of type n1-highmem-64. To create the problem we send the server a single request with >500 tasks.

This Google Drive folder has leaf-logs-full.redacted.txt as well as other performance tooling files <https://drive.google.com/drive/folders/0BypqYg6r4ebqb0lQNnM5eWtISms?usp=sharing>. A snippet from that log shows normal query responses and timing:

I0719 22:50:22.467367 leaf.go:363] Worker #5 done search for '[redacted]', took 0.013 seconds
I0719 22:50:22.467406 leaf.go:225] Worker #5 starting search for '[redacted]'
I0719 22:50:22.467486 leaf.go:363] Worker #6 done search for '[redacted]', took 0.001 seconds
I0719 22:50:22.467520 leaf.go:225] Worker #6 starting search for '[redacted]'
I0719 22:50:22.468050 leaf.go:363] Worker #9 done search for '[redacted]', took 0.005 seconds

We have observed that if a GC cycle happens to start while serving traffic (which is often) AND there is a large amount of time spent in assist, then our serving latency skyrockets by 10x. In the log, the slowdown commences roughly when pacer assist starts at I0719 22:50:31.079283 and reverts to normal latencies shortly after the GC cycle completes at I0719 22:50:36.806085. At the bottom of this message I copy parts of the log where we see latencies of up to 729ms on tasks; note in particular the gc line showing 32929ms of CPU spent in alloc GC assist. We have captured an attached CPU profile during this time which seems to confirm a large amount of time spent in runtime.gcAssistAlloc.func1.

Pardon our ignorance about GC in Go, but our hypothesis about what may be going wrong is that our large in-memory data structures cause the GC to go into assist mode often, and that for reasons we don't understand malloc becomes expensive in that mode. Since we also create ~100k new data objects when processing user requests, we are guessing those allocations become very slow. Another piece of evidence for this hypothesis is that we have another (prototype) implementation of this kind of service that makes more use of object pools and doesn't seem to have as much of a slowdown during GC.

Note on the large in-memory data structures. The principal data structures can be thought of as:

map[uint64][]byte -- about 10M map entries, slice lengths between 5K and 50K, around 10 GB total memory usage
map[uint64][]uint64 -- about 50M map entries, slice lengths between 10 and 100K in a zipfian distribution, about 3 GB total memory usage

These data structures mostly stay as-is over the life of the program.
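For concreteness, here is a minimal sketch of roughly those shapes (the names and toy sizes below are invented for illustration; this is not our actual code). As we understand it, the byte and uint64 contents themselves are pointer-free, but every map bucket and every per-entry slice header still holds a pointer the collector has to trace, so with ~60M entries there is a lot of scan work even though nothing in these structures changes after load:

package main

import "fmt"

// docBytes stands in for the ~10M-entry map[uint64][]byte (~10 GB total);
// each value is a slice header whose data pointer the GC has to follow.
var docBytes = map[uint64][]byte{}

// postings stands in for the ~50M-entry map[uint64][]uint64 (~3 GB total)
// with zipfian slice lengths.
var postings = map[uint64][]uint64{}

func main() {
	// Toy population; in the real server roughly 13 GB is loaded at startup.
	docBytes[1] = make([]byte, 5000)
	postings[1] = make([]uint64, 100)
	fmt.Println(len(docBytes), len(postings))
}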
We are trying to figure out how to solve this, so we would appreciate any advice. An engineer on our team wrote up the following ideas, none of which are that great:

1. Figure out a simple way to prevent our goroutines slowing down during GC. I had some hopes LockOSThread() could be made to work, but it didn't seem to help in my experiments. I'm not ruling this solution out entirely, but if it's the write barriers that are the main problem, I don't have much hope.

2. Run at least 2 replicas of all our servers. Manage their GC cycles ourselves, synchronized so that at most one replica is in GC at any given time. The clients should either send all requests to both replicas (and cancel when one replies), or use some more complicated Kubernetes and client logic so a GCing replica is never sent requests. This is the simplest solution that is likely to work, and doesn't require us to change our habits too much. It just costs more. :) (A sketch of the GC-pacing piece follows this list.)

3. Refactor our current servers, and write future servers, with the explicit goal of reducing GC burden. Give our servers more memory and increase SetGCPercent() so that garbage collection happens less frequently. Use simpler data structures with fewer pointers to reduce how long the GC cycle lasts when it does run. This isn't a perfect solution, because the GC will still hit us on occasion, but it reduces the harm.

4. Throw out much of the convenience of Go and write our own allocators/shared pointers that index into a flat byte buffer, making no use of the GC. (Either do this for particular objects, or write a general solution.) Make GC cycles so infrequent that we can either turn GC off completely without going OOM, or at least run it only once per day, during off hours. This seems very hard, and we'll be fighting with Go a lot. (A sketch of the flat-buffer idea also follows this list.)

5. Implement the core serving data structures of our service in C++, paired off with another server written in Go that manages populating that server's data -- the Go server would manage the data in the C++ server via RPC, and query it over RPC.

6. The nuclear option: decide that Go is simply not a suitable language for our serving stack. Rewrite major servers in C++.
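To make idea 2 concrete, here is a minimal sketch of the "manage the GC cycles ourselves" piece, assuming the replicas coordinate out of band (the coordination and serving code are not shown, and the interval is a placeholder, not a worked-out design):

package main

import (
	"runtime"
	"runtime/debug"
	"time"
)

func main() {
	// Disable the automatic pacer; collections now happen only when forced.
	// This risks OOM if the heap grows faster than expected between cycles.
	debug.SetGCPercent(-1)

	go func() {
		for range time.Tick(5 * time.Minute) { // placeholder interval
			// In the real design this would run only while this replica is
			// out of rotation (or while its twin is known to be serving).
			runtime.GC() // blocks until the forced collection completes
		}
	}()

	// ... start the RPC server here ...
	select {}
}

The same package covers the "collect less often" half of idea 3: keeping automatic collection on but raising debug.SetGCPercent (or GOGC) to, say, 400 lets the heap grow to roughly 5x the previous live set before the next cycle, instead of 2x at the default of 100, trading memory for fewer cycles.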
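And here is a rough sketch of the direction ideas 3 and 4 point in for the big read-only structure: append the values into one large backing []byte at load time and index them with offset/length pairs. Since neither the key nor the value type of the index map contains a pointer, and the backing array is itself pointer-free, the GC should have very little left to scan here. The names (flatStore, span) are invented for illustration:

package main

// span locates one value inside the shared backing buffer; it contains no
// pointers.
type span struct {
	off uint64 // byte offset into data
	n   uint32 // length in bytes
}

type flatStore struct {
	data  []byte          // one large, append-only backing buffer
	index map[uint64]span // pointer-free keys and values
}

func newFlatStore() *flatStore {
	return &flatStore{index: make(map[uint64]span)}
}

func (s *flatStore) add(key uint64, val []byte) {
	s.index[key] = span{off: uint64(len(s.data)), n: uint32(len(val))}
	s.data = append(s.data, val...)
}

func (s *flatStore) get(key uint64) ([]byte, bool) {
	sp, ok := s.index[key]
	if !ok {
		return nil, false
	}
	return s.data[sp.off : sp.off+uint64(sp.n)], true
}

func main() {
	s := newFlatStore()
	s.add(42, []byte("some payload"))
	if v, ok := s.get(42); ok {
		println(string(v))
	}
}

Entries can't be deleted or resized individually this way, but since these structures are essentially immutable after load that restriction may be acceptable; the map[uint64][]uint64 structure could get the same treatment with a []uint64 backing slice.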
Right now #5 is our (terrible but only feasible-seeming) choice. Appreciate any advice!

Steve Baker

Logs with gc log lines:

pacer: assist ratio=+3.155887e+001 (scan 2669 MB in 10072->10157 MB) workers=2+0
I0719 22:50:31.097707 leaf.go:363] Worker #3 done search for '[redacted]', took 0.017 seconds
I0719 22:50:31.098036 leaf.go:225] Worker #3 starting search for '[redacted]'
I0719 22:50:31.133959 leaf.go:363] Worker #6 done search for '[redacted]', took 0.069 seconds
I0719 22:50:31.134300 leaf.go:225] Worker #6 starting search for '[redacted]'
I0719 22:50:31.142191 leaf.go:363] Worker #9 done search for '[redacted]', took 0.066 seconds
I0719 22:50:31.142275 leaf.go:225] Worker #9 starting search for '[redacted]'
I0719 22:50:31.173921 leaf.go:363] Worker #10 done search for '[redacted]', took 0.098 seconds
I0719 22:50:31.174009 leaf.go:225] Worker #10 starting search for '[redacted]'
I0719 22:50:31.328796 leaf.go:363] Worker #15 done search for '[redacted]', took 0.260 seconds
I0719 22:50:31.330194 leaf.go:225] Worker #15 starting search for '[redacted]'
I0719 22:50:31.395579 leaf.go:363] Worker #11 done search for '[redacted]', took 0.374 seconds
<SNIP>
I0719 22:50:36.317004 leaf.go:363] Worker #5 done search for '[redacted]', took 0.729 seconds
I0719 22:50:36.317190 leaf.go:225] Worker #5 starting search for '[redacted]'
I0719 22:50:36.370191 leaf.go:363] Worker #15 done search for '[redacted]', took 0.192 seconds
I0719 22:50:36.371446 leaf.go:225] Worker #15 starting search for '[redacted]'
I0719 22:50:36.421953 leaf.go:363] Worker #7 done search for '[redacted]', took 0.116 seconds
I0719 22:50:36.422092 leaf.go:225] Worker #7 starting search for '[redacted]'
I0719 22:50:36.570008 leaf.go:363] Worker #12 done search for '[redacted]', took 0.778 seconds
I0719 22:50:36.572970 leaf.go:225] Worker #12 starting search for '[redacted]'
I0719 22:50:36.573571 leaf.go:363] Worker #6 done search for '[redacted]', took 0.710 seconds
I0719 22:50:36.573721 leaf.go:225] Worker #6 starting search for '[redacted]'
pacer: H_m_prev=8875421544 h_t=+1.900000e-001 H_T=10561751637 h_a=+1.983366e-001 H_a=10635742360 h_g=+2.000000e-001 H_g=10650505852 u_a=+9.735779e-001 u_g=+2.500000e-001 W_a=2354135120 goalΔ=+1.000000e-002 actualΔ=+8.336587e-003 u_a/u_g=+3.894312e+000
gc 1072 @4337.737s 39%: 0.37+5688+0.32 ms clock, 2.2+32929/11310/21+1.9 ms cpu, 10072->10143->8661 MB, 10157 MB goal, 8 P