You're storing millions of slices, and each slice header is implemented as 
(ptr, length, capacity), so every map value is a pointer the GC has to scan. 
If your maps are mostly static, maybe you can use one giant backing array per 
value type and store just (offset, length) or (base, limit) in your maps, in 
effect doing your own slicing. To access a value you would then replace e.g. 
myoldmap[key] with mynewmap.Get(key), which converts the offset and length 
back into a slice. If values do get added or deleted once in a while, you can 
run an occasional background compaction of your own: build a new map indexing 
into a new array, copy the live contents from the old array into the new one, 
and atomically switch to the new map. If values change, things get a bit 
trickier, but you can turn a "change" into adding a new slice plus deleting 
the old one.

> On Jul 24, 2017, at 4:45 PM, stba...@laserlike.com wrote:
> 
> Hi,
> We are experiencing a problem that I believe may be related to issue 14812 
> but I wanted to ask here before adding to that case or filing a new issue. Of 
> course, we’d also greatly appreciate any advice about how to make our program 
> performant.
> 
> Here is what we observe: at Laserlike one of our core user-facing services 
> (the “leaf”) typically responds to a particular RPC in <400ms. During GC we 
> see spikes in latency to >5s on some simple requests. The stop-the-world 
> pauses are short, so the GC spikes appear to happen at other times.
> 
> We have been unable to replicate this in a simple program, but we did run our 
> code in a test mode that reproduces it. In our test environment the server 
> loads ~10 GB of persistent data (never deleted, so it doesn’t really need to 
> be GCed), and we ask for 8 processors. We are running Go 1.8.3 on Kubernetes 
> on a GCP machine of type n1-highmem-64. To create the problem we send the 
> server a single request with >500 tasks.
> 
> 
> This Google Drive folder has leaf-logs-full.redacted.txt as well as other 
> performance tooling files. A snippet from that log shows normal query 
> responses and timing:
> 
> I0719 22:50:22.467367 leaf.go:363] Worker #5 done search for '[redacted]', 
> took 0.013 seconds
> I0719 22:50:22.467406 leaf.go:225] Worker #5 starting search for '[redacted]'
> I0719 22:50:22.467486 leaf.go:363] Worker #6 done search for '[redacted]', 
> took 0.001 seconds
> I0719 22:50:22.467520 leaf.go:225] Worker #6 starting search for '[redacted]'
> I0719 22:50:22.468050 leaf.go:363] Worker #9 done search for '[redacted]', 
> took 0.005 seconds
> 
> 
> We have observed that if a GC cycle happens to start while serving traffic 
> (which is often) AND there is a large amount of time spent in assist, then 
> our serving latency skyrockets by 10x. In the log the slowdown commences 
> roughly when the pacer assist line appears at I0719 22:50:31.079283, and 
> latencies revert to normal shortly after the GC cycle completes at 
> I0719 22:50:36.806085.
> 
> Below I copy parts of the log where we see latencies of up to 729ms on tasks. 
> I also call out the line that shows 32929ms spent on alloc GC assist.
> 
> We have captured an attached CPU profile during this time, which seems to 
> confirm a large amount of time spent in runtime.gcAssistAlloc.func1.
> 
> 
> Pardon our ignorance about GC in Go, but our hypothesis about what may be 
> going wrong is that our large in-memory data structures are causing the GC to 
> go into assist mode often, and that for reasons we don’t understand malloc 
> becomes expensive in that mode. Since we also create ~100k new data objects 
> when processing user requests, we are guessing those allocations become very 
> slow. Another piece of evidence for this hypothesis is that we have another 
> (prototype) implementation of this kind of service that makes more use of 
> object pools and doesn’t seem to have as much of a slowdown during GC.
> 
> Note on large in-memory data structures:
> The principal data structures can be thought of as:
> - map[uint64][]byte: about 10M map entries, slice lengths between 5K and 50K, 
>   around 10 GB total memory usage
> - map[uint64][]uint64: about 50M map entries, slice lengths varying between 
>   10 and 100K in a zipfian distribution, about 3 GB total memory usage
> These data structures mostly stay as-is over the life of the program.
> 
> We are trying to figure out how to solve this, so we would appreciate any 
> advice. An engineer on our team wrote up the following ideas, none of which 
> are that great:
> 1. Figure out a simple way to prevent our goroutines slowing down during GC. 
>    I had some hopes LockOSThread() could be made to work, but it didn't seem 
>    to help in my experiments. I'm not ruling this solution out entirely, but 
>    if it's the write barriers that are the main problem, I don't have much 
>    hope.
> 2. Run at least 2 replicas of all our servers. Manage their GC cycles 
>    ourselves, synchronized so that at most one replica is in GC at any given 
>    time. The clients should either send all requests to both replicas (and 
>    cancel when one replies), or use some more complicated Kubernetes and 
>    client logic so a GCing replica is never sent requests. This is the 
>    simplest solution that is likely to work, and doesn't require us to change 
>    our habits too much. It just costs more. :)
> 3. Refactor our current servers and program future servers with the explicit 
>    goal of reducing GC burden. Give our servers more memory and increase 
>    SetGCPercent() so that garbage collection happens less frequently. Use 
>    simpler data structures with fewer pointers to reduce the length of time 
>    the GC cycle lasts when it does run. This isn't a perfect solution, 
>    because the GC will still hit us on occasion, but it reduces the harm.
> 4. Throw out much of the convenience of Go and write our own allocators and 
>    shared pointers that index into a flat byte buffer, making no use of the 
>    GC. (Either do this for particular objects, or just write a general 
>    solution.) Make GC cycles so infrequent that we can either turn GC off 
>    completely without going OOM, or at least run it only once per day, during 
>    off hours. This seems very hard, and we'll be fighting with Go a lot.
> 5. Implement the core serving data structures of our service in C++, paired 
>    with another server written in Go that manages populating that server's 
>    data -- the Go server would manage the data in the C++ server via RPC, and 
>    query it over RPC.
> 6. The nuclear option: decide that Go is simply not a suitable language for 
>    our serving stack. Rewrite major servers in C++.
> 
> 
> Right now #5 is our (terrible but only feasible-seeming) choice.
> 
> Appreciate any advice!
> Steve Baker
> 
> 
> 
> 
> Logs with gc log lines:
> 
> 
> pacer: assist ratio=+3.155887e+001 (scan 2669 MB in 10072->10157 MB) 
> workers=2+0
> I0719 22:50:31.097707 leaf.go:363] Worker #3 done search for '[redacted]', 
> took 0.017 seconds
> I0719 22:50:31.098036 leaf.go:225] Worker #3 starting search for '[redacted]'
> I0719 22:50:31.133959 leaf.go:363] Worker #6 done search for '[redacted]', 
> took 0.069 seconds
> I0719 22:50:31.134300 leaf.go:225] Worker #6 starting search for '[redacted]'
> I0719 22:50:31.142191 leaf.go:363] Worker #9 done search for '[redacted]', 
> took 0.066 seconds
> I0719 22:50:31.142275 leaf.go:225] Worker #9 starting search for '[redacted]'
> I0719 22:50:31.173921 leaf.go:363] Worker #10 done search for '[redacted]', 
> took 0.098 seconds
> I0719 22:50:31.174009 leaf.go:225] Worker #10 starting search for '[redacted]'
> I0719 22:50:31.328796 leaf.go:363] Worker #15 done search for '[redacted]', 
> took 0.260 seconds
> I0719 22:50:31.330194 leaf.go:225] Worker #15 starting search for '[redacted]'
> I0719 22:50:31.395579 leaf.go:363] Worker #11 done search for '[redacted]', 
> took 0.374 seconds
> 
> 
> <SNIP>
> 
> I0719 22:50:36.317004 leaf.go:363] Worker #5 done search for '[redacted]', 
> took 0.729 seconds
> I0719 22:50:36.317190 leaf.go:225] Worker #5 starting search for '[redacted]'
> I0719 22:50:36.370191 leaf.go:363] Worker #15 done search for '[redacted]', 
> took 0.192 seconds
> I0719 22:50:36.371446 leaf.go:225] Worker #15 starting search for '[redacted]'
> I0719 22:50:36.421953 leaf.go:363] Worker #7 done search for '[redacted]', 
> took 0.116 seconds
> I0719 22:50:36.422092 leaf.go:225] Worker #7 starting search for '[redacted]'
> I0719 22:50:36.570008 leaf.go:363] Worker #12 done search for '[redacted]', 
> took 0.778 seconds
> I0719 22:50:36.572970 leaf.go:225] Worker #12 starting search for '[redacted]'
> I0719 22:50:36.573571 leaf.go:363] Worker #6 done search for '[redacted]', 
> took 0.710 seconds
> I0719 22:50:36.573721 leaf.go:225] Worker #6 starting search for '[redacted]'
> pacer: H_m_prev=8875421544 h_t=+1.900000e-001 H_T=10561751637 
> h_a=+1.983366e-001 H_a=10635742360 h_g=+2.000000e-001 H_g=10650505852 
> u_a=+9.735779e-001 u_g=+2.500000e-001 W_a=2354135120 goalΔ=+1.000000e-002 
> actualΔ=+8.336587e-003 u_a/u_g=+3.894312e+000
> gc 1072 @4337.737s 39%: 0.37+5688+0.32 ms clock, 2.2+32929/11310/21+1.9 ms 
> cpu, 10072->10143->8661 MB, 10157 MB goal, 8 P
> 
> 