The GC would still need to examine all 50 million slice headers, even if they 
all point into the same backing array.

> On Jul 27, 2017, at 10:20 AM, Joe Taber <infogu...@gmail.com> wrote:
> 
> Bakul,
> 
> You wouldn't even need to change the slicing semantics like that.
> 
> Just concatenate all the slices in the map into one master slice, then take 
> capacity-slices[1] of it with the right offsets to rebuild the whole map. 
> This wouldn't change the semantics of accessing or mutating the map at all: 
> trimming a slice in one map entry would just skip some elements in the master 
> slice, and appending elements would use up that unused space until it ran 
> into the next slice boundary, at which point a new allocation would be 
> created for that map entry. Then, as you said, run a custom "gc" step every 
> once in a while to rebuild the map into a single big allocation.
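> 
> A rough sketch of that rebuild step (hypothetical names; []uint64 values 
> assumed):
> 
>     // rebuildMap packs every value slice into one master allocation and
>     // re-points each map entry at a three-index ("capacity") slice of it.
>     // With cap == len, the first append that outgrows an entry allocates
>     // a fresh slice instead of clobbering its neighbour in the master.
>     func rebuildMap(m map[uint64][]uint64) {
>         total := 0
>         for _, s := range m {
>             total += len(s)
>         }
>         master := make([]uint64, 0, total)
>         for k, s := range m {
>             off := len(master)
>             master = append(master, s...) // never grows: cap is total
>             m[k] = master[off : off+len(s) : off+len(s)]
>         }
>     }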
> 
> Again, this could only work if the map and map entries grow infrequently, 
> which sounds plausible based on Steve's description of their use.
> 
> [1]: 
> https://docs.google.com/document/d/1GKKdiGYAghXRxC2BFrSEbHBZZgAGKQ-yXK-hRKBo0Kk/pub
> 
> Joe
> 
>> On Thursday, July 27, 2017 at 11:39:41 AM UTC-5, Bakul Shah wrote:
>> You're storing millions of slices. A slice is implemented as (ptr, length, 
>> capacity). If your maps are mostly static, maybe you can use one giant 
>> array per value type and just store (offset, length) or (base, limit) in 
>> your maps, in effect doing your own slicing. To access a slice you'd then 
>> replace e.g. myoldmap[key] with mynewmap.Get(key), which converts the 
>> offset and length into a slice. If values do get added or deleted once in a 
>> while, you can do an occasional background GC: create a new map indexing 
>> into a new array, copy the active contents from the old array into the new 
>> one, and then atomically switch the map. If values change, things get a bit 
>> trickier, but you can turn a "change" into add-new-slice + delete-old-slice.
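>> 
>> A minimal sketch of that Get wrapper (hypothetical names; []uint64 values 
>> assumed):
>> 
>>     // flatMap keeps one giant backing array plus an index of
>>     // (offset, length) pairs, so the GC sees a single large object
>>     // instead of millions of slice headers it has to trace.
>>     type flatMap struct {
>>         index map[uint64][2]int // key -> {offset, length} into data
>>         data  []uint64          // one giant backing array
>>     }
>> 
>>     // Get rebuilds a slice on demand from the stored offset and length.
>>     func (f *flatMap) Get(key uint64) ([]uint64, bool) {
>>         ol, ok := f.index[key]
>>         if !ok {
>>             return nil, false
>>         }
>>         return f.data[ol[0] : ol[0]+ol[1]], true
>>     }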
>> 
>>> On Jul 24, 2017, at 4:45 PM, stb...@laserlike.com wrote:
>>> 
>>> Hi,
>>> We are experiencing a problem that I believe may be related to issue 14812 
>>> but I wanted to ask here before adding to that case or filing a new issue. 
>>> Of course, we’d also greatly appreciate any advice about how to make our 
>>> program performant.
>>> 
>>> Here is what we observe: at Laserlike one of our core user-facing services 
>>> (the “leaf”) typically responds to a particular RPC in <400ms.  During GC 
>>> we see latency spike to >5s on some simple requests.  The stop-the-world 
>>> periods are short, so the GC spikes appear to happen at other times.
>>> 
>>> We have been unable to replicate this in a simple program, but we did run 
>>> our code in a test mode that reproduces it.  In our test environment the 
>>> server loads ~10 GB of persistent data (never deleted, so it doesn’t 
>>> really need to be GCed), and we ask for 8 processors.  We are running Go 
>>> 1.8.3 on Kubernetes on a GCP machine of type n1-highmem-64.  To trigger 
>>> the problem we send the server a single request with >500 tasks.
>>> 
>>> 
>>> This Google Drive folder has leaf-logs-full.redacted.txt as well as other 
>>> performance tooling files.  A snippet from that log here shows normal 
>>> query responses and timing:
>>> 
>>> I0719 22:50:22.467367 leaf.go:363] Worker #5 done search for '[redacted]', 
>>> took 0.013 seconds
>>> I0719 22:50:22.467406 leaf.go:225] Worker #5 starting search for 
>>> '[redacted]'
>>> I0719 22:50:22.467486 leaf.go:363] Worker #6 done search for '[redacted]', 
>>> took 0.001 seconds
>>> I0719 22:50:22.467520 leaf.go:225] Worker #6 starting search for 
>>> '[redacted]'
>>> I0719 22:50:22.468050 leaf.go:363] Worker #9 done search for '[redacted]', 
>>> took 0.005 seconds
>>> 
>>> 
>>> We have observed that if a GC cycle happens to start while serving traffic 
>>> (which is often) AND there is a large amount of time spent in assist, then 
>>> our serving latency skyrockets by 10x.  In the log the slowdown commences 
>>> roughly when pacer assist starts at I0719 22:50:31.079283 and then reverts 
>>> to normal latencies shortly after the gc cycle completes at I0719 
>>> 22:50:36.806085.
>>> 
>>> Below I copy parts of the log where we see latencies of up to 729ms on 
>>> tasks.  Note in particular the gc line showing 32929ms of CPU time spent 
>>> in alloc GC assist.
>>> 
>>> We also captured a CPU profile (attached) during this window, which seems 
>>> to confirm a large amount of time spent in runtime.gcAssistAlloc.func1.
>>> 
>>> 
>>> Pardon our ignorance about GC in Go, but our hypothesis about what may be 
>>> going wrong is that our large in-memory data structures cause the GC to go 
>>> into assist mode often, and that for reasons we don’t understand malloc 
>>> becomes expensive in that mode.  Since we also create ~100k new data 
>>> objects when processing user requests, we are guessing those allocations 
>>> become very slow.  Another piece of evidence for this hypothesis is that 
>>> another (prototype) implementation of this kind of service, which makes 
>>> more use of object pools, doesn’t seem to have as much of a slowdown 
>>> during GC.
>>> 
>>> Note on our large in-memory data structures:
>>> The principal data structures can be thought of as:
>>> - Map[uint64][]byte (about 10M map entries; slice lengths range from 5K to 
>>>   50K; around ~10G total memory usage)
>>> - Map[uint64][]uint64 (about 50M map entries; slice lengths vary between 
>>>   10 and 100K in a zipfian distribution; about 3G total memory usage)
>>> These data structures mostly stay as-is over the life of the program.
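>>> 
>>> In Go terms, roughly (the variable names here are hypothetical):
>>> 
>>>     var (
>>>         // ~10M entries, value lengths 5K-50K bytes, ~10G total
>>>         blobs    map[uint64][]byte
>>>         // ~50M entries, value lengths 10-100K (zipfian), ~3G total
>>>         postings map[uint64][]uint64
>>>     )
>>>     // Each entry's slice header holds a pointer the GC must trace, so
>>>     // these two maps alone contribute on the order of 60M pointers to
>>>     // every mark phase.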
>>> 
>>> We are trying to figure out how to solve this, so we would appreciate any 
>>> advice.  An engineer on our team wrote up the following ideas, none of 
>>> which are that great:
>>> 1. Figure out a simple way to prevent our goroutines from slowing down 
>>>    during GC.  I had some hope that LockOSThread() could be made to work, 
>>>    but it didn't seem to help in my experiments.  I'm not ruling this 
>>>    solution out entirely, but if the write barriers are the main problem, 
>>>    I don't have much hope.
>>> 2. Run at least 2 replicas of all our servers and manage their GC cycles 
>>>    ourselves, synchronized so that at most one replica is in GC at any 
>>>    given time.  The clients would either send all requests to both 
>>>    replicas (and cancel when one replies), or use some more complicated 
>>>    Kubernetes and client logic so that a GCing replica is never sent 
>>>    requests.  This is the simplest solution that is likely to work and 
>>>    doesn't require us to change our habits too much.  It just costs 
>>>    more. :)
>>> 3. Refactor our current servers, and write future servers, with the 
>>>    explicit goal of reducing GC burden.  Give our servers more memory and 
>>>    increase SetGCPercent() so that garbage collection happens less 
>>>    frequently (see the sketch after this list), and use simpler data 
>>>    structures with fewer pointers to shorten the GC cycle when it does 
>>>    run.  This isn't a perfect solution, because the GC will still hit us 
>>>    on occasion, but it reduces the harm.
>>> 4. Throw out much of the convenience of Go and write our own 
>>>    allocators/shared pointers that index into a flat byte buffer, making 
>>>    no use of the GC (either for particular objects, or as a general 
>>>    solution).  Make GC cycles so infrequent that we can either turn GC 
>>>    off completely without going OOM, or at least run it only once per 
>>>    day, during off hours.  This seems very hard, and we'll be fighting 
>>>    with Go a lot.
>>> 5. Implement the core serving data structures of our service in C++, 
>>>    paired with a Go server that manages populating the C++ server's data 
>>>    and queries it, both over RPC.
>>> 6. The nuclear option: decide that Go is simply not a suitable language 
>>>    for our serving stack and rewrite the major servers in C++.
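>>> 
>>> For idea 3, the knob is runtime/debug.SetGCPercent (the same setting as 
>>> the GOGC environment variable); a sketch, with the value picked purely 
>>> for illustration:
>>> 
>>>     import "runtime/debug"
>>> 
>>>     func init() {
>>>         // The default is 100: a new GC cycle starts once the heap has
>>>         // grown 100% over the live heap of the previous cycle. A larger
>>>         // value trades memory headroom for fewer cycles; 400 is arbitrary.
>>>         debug.SetGCPercent(400)
>>>     }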
>>> 
>>> 
>>> Right now #5 is our (terrible but only feasible-seeming) choice.
>>> 
>>> Appreciate any advice!
>>> Steve Baker
>>> 
>>> 
>>> 
>>> 
>>> Logs with gc log lines:
>>> 
>>> 
>>> pacer: assist ratio=+3.155887e+001 (scan 2669 MB in 10072->10157 MB) 
>>> workers=2+0
>>> I0719 22:50:31.097707 leaf.go:363] Worker #3 done search for '[redacted]', 
>>> took 0.017 seconds
>>> I0719 22:50:31.098036 leaf.go:225] Worker #3 starting search for 
>>> '[redacted]'
>>> I0719 22:50:31.133959 leaf.go:363] Worker #6 done search for '[redacted]', 
>>> took 0.069 seconds
>>> I0719 22:50:31.134300 leaf.go:225] Worker #6 starting search for 
>>> '[redacted]'
>>> I0719 22:50:31.142191 leaf.go:363] Worker #9 done search for '[redacted]', 
>>> took 0.066 seconds
>>> I0719 22:50:31.142275 leaf.go:225] Worker #9 starting search for 
>>> '[redacted]'
>>> I0719 22:50:31.173921 leaf.go:363] Worker #10 done search for '[redacted]', 
>>> took 0.098 seconds
>>> I0719 22:50:31.174009 leaf.go:225] Worker #10 starting search for 
>>> '[redacted]'
>>> I0719 22:50:31.328796 leaf.go:363] Worker #15 done search for '[redacted]', 
>>> took 0.260 seconds
>>> I0719 22:50:31.330194 leaf.go:225] Worker #15 starting search for 
>>> '[redacted]'
>>> I0719 22:50:31.395579 leaf.go:363] Worker #11 done search for '[redacted]', 
>>> took 0.374 seconds
>>> 
>>> 
>>> <SNIP>
>>> 
>>> I0719 22:50:36.317004 leaf.go:363] Worker #5 done search for '[redacted]', 
>>> took 0.729 seconds
>>> I0719 22:50:36.317190 leaf.go:225] Worker #5 starting search for 
>>> '[redacted]'
>>> I0719 22:50:36.370191 leaf.go:363] Worker #15 done search for '[redacted]', 
>>> took 0.192 seconds
>>> I0719 22:50:36.371446 leaf.go:225] Worker #15 starting search for 
>>> '[redacted]'
>>> I0719 22:50:36.421953 leaf.go:363] Worker #7 done search for '[redacted]', 
>>> took 0.116 seconds
>>> I0719 22:50:36.422092 leaf.go:225] Worker #7 starting search for 
>>> '[redacted]'
>>> I0719 22:50:36.570008 leaf.go:363] Worker #12 done search for '[redacted]', 
>>> took 0.778 seconds
>>> I0719 22:50:36.572970 leaf.go:225] Worker #12 starting search for 
>>> '[redacted]'
>>> I0719 22:50:36.573571 leaf.go:363] Worker #6 done search for '[redacted]', 
>>> took 0.710 seconds
>>> I0719 22:50:36.573721 leaf.go:225] Worker #6 starting search for 
>>> '[redacted]'
>>> pacer: H_m_prev=8875421544 h_t=+1.900000e-001 H_T=10561751637 
>>> h_a=+1.983366e-001 H_a=10635742360 h_g=+2.000000e-001 H_g=10650505852 
>>> u_a=+9.735779e-001 u_g=+2.500000e-001 W_a=2354135120 goalΔ=+1.000000e-002 
>>> actualΔ=+8.336587e-003 u_a/u_g=+3.894312e+000
>>> gc 1072 @4337.737s 39%: 0.37+5688+0.32 ms clock, 2.2+32929/11310/21+1.9 ms 
>>> cpu, 10072->10143->8661 MB, 10157 MB goal, 8 P
>>> 
>>> 
