Hey everybody,

I am looking at a Go process where the total number of bytes managed by 
Go's heap in the long run is around 20-30 megs, but the RssAnon reported in 
/proc/self/status is extremely high (800+ megs). I am trying to figure out 
what is going on, and have a few questions about the proper way to debug 
such situations.

Background:
The process does a fair number of allocations & computations on startup, 
and after it finishes, settles into a quiet life of sporadic allocation and 
occasionally reacting to events. This means there is a pronounced spike of 
memory consumption at the beginning, followed by a long period (months) of 
low memory consumption. Unfortunately, what I am seeing is a failure of the 
garbage collector to release memory back to the OS, even after hours of 
waiting.
I am periodically logging Alloc & Sys from runtime.MemStats, 
along with RssAnon and RssFile from /proc/self/status, and RssAnon spikes 
up to 800+ megs and never goes down thereafter. I have plotted the 
evolution of these values over time here: 
https://twitter.com/halvarflake/status/1271538641878290432/photo/1

My theory is that we are somehow hitting a pessimal situation where we 
leave just enough allocations alive to ensure the garbage collector cannot 
release anything. To confirm this hunch, there are two things I'd like to 
do, but I am quite unclear how to achieve them:

1. If I could get a list of all live objects, I should be able to see how 
"scattered" they are through memory, and how this leads to a failure to 
release memory. I added some code to call debug.WriteHeapDump() at regular 
intervals, and also when our Go-reported memory usage spikes. Once I had 
done so, I tried to find a library or tool to parse the heap dumps, and 
failed to find one - is there anything out there that I can use, or does 
that still need to be written? 

2. To analyze heap fragmentation and heap layout issues in C/C++, I have 
lightweight infrastructure that logs *all* allocations, sizes, and 
deallocations in a compact binary format into a buffer; when that buffer is 
full, the process forks and the child writes the data. I then have tooling 
to draw (large) diagrams from this where the x-axis is time and the y-axis 
address space, and free/live allocations are drawn as rectangles. An 
example of such a diagram is here: 
https://twitter.com/halvarflake/status/1075156510555168769/photo/1
I have found such diagrams to be immensely helpful when diagnosing 
allocation pathologies and interacting with complex heap layouts, and would 
love to gather similar data for my Go processes. The easiest for me would 
be if I had the ability to add a call to my (C-based) shared library into 
mallocgc; since that code eats almost no stack and is very much under my 
control, this should (at least theoretically) be doable without all the 
bells and whistles of an FFI. Is there a (not-production-safe, hackish 
etc.) way of doing that that is not quite as bad as patching a hook into 
the binary? Or is there even a way to get a callback from mallocgc and the 
freeing functions to build such logging, provided one does not perform any 
heap operations in the callback?

Does anyone have any advice?

Cheers,
Thomas
