I actually just managed to trace down the root cause of the bug, and it's
quite surprising. It's not a heap overflow, it's a stack overflow, due to a
bug in the stdlib! Specifically, adding the same httptrace.ClientTrace twice
onto a request context causes a stack overflow.
https://github.com/mightygua
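For anyone trying to reproduce it, here is a minimal sketch of the pattern
(the hook body and URL are illustrative, not from our actual service):
WithClientTrace composes a new trace with whatever trace is already stored in
the context, so registering the same *httptrace.ClientTrace value a second
time makes its hooks end up wrapping themselves, and the first hook that fires
recurses until the stack overflows.

package main

import (
	"context"
	"net/http"
	"net/http/httptrace"
)

func main() {
	trace := &httptrace.ClientTrace{
		GotConn: func(httptrace.GotConnInfo) {},
	}

	ctx := context.Background()
	ctx = httptrace.WithClientTrace(ctx, trace) // first registration is fine
	ctx = httptrace.WithClientTrace(ctx, trace) // same value again: its hooks now wrap themselves

	req, _ := http.NewRequest("GET", "http://example.com", nil)
	req = req.WithContext(ctx)
	_, _ = http.DefaultClient.Do(req) // GotConn fires and recurses until the stack blows
}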
On Mon, Jul 1, 2019 at 12:42 PM 'Yunchi Luo' via golang-nuts
<golang-nuts@googlegroups.com> wrote:
> Hello, I'd like to solicit some help with a weird GC issue we are seeing.
>
> I'm trying to debug an OOM on a service we are running in k8s. The service is
> just a CRUD server hitting a database (Dy
Yeah, I've been looking at the goroutine profiles and there are some
strange stacks like the one below.
1 reflect.flag.mustBeExported /go/src/reflect/value.go:213
reflect.Value.call /go/src/reflect/value.go:424
reflect.Value.Call /go/src/reflect/value.go:308
ok, this is interesting:
reflect.MakeFunc: I've never done this before. What are the allocation
patterns for creating functions with reflect? I see a few crashes
related to these functions but no mention of severe memory
consumption.
In my opinion, trying to capture MakeFunc patterns from your
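For anyone else who hasn't used it, a tiny standalone sketch of
reflect.MakeFunc (unrelated to the service's code): it builds a new function
value at runtime whose body is an adapter over []reflect.Value, so each
composition that goes through it allocates a fresh function object, and every
invocation goes through reflect's calling machinery.

package main

import (
	"fmt"
	"reflect"
)

func main() {
	// Build a func(int) int at runtime that doubles its argument.
	fnType := reflect.TypeOf(func(int) int { return 0 })
	double := reflect.MakeFunc(fnType, func(args []reflect.Value) []reflect.Value {
		return []reflect.Value{reflect.ValueOf(int(args[0].Int()) * 2)}
	})

	f := double.Interface().(func(int) int)
	fmt.Println(f(21)) // prints 42
}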
Switching Go version seems like a stab in the dark. If the OOM symptom does
show up, you have simply wasted time. If it doesn't show up, you still don't
know if the bug exists and is simply hiding. Even if you think the bug is in
Go code generation (or GC) and not in your code, there is nothing the
I removed the httptrace call yesterday and there have been no OOMs yet.
Going to let it bake for another day. If OOMs show up again, I'll try
reverting to an older Go version tomorrow. Otherwise I'll point my finger
at httptrace I guess.
On Tue, Jul 2, 2019 at 2:15 PM Yunchi Luo wrote:
I did try to do that! I have 3 heap profiles captured from the ~3 seconds
before crash. The only thing particularly suspicious is the httptrace call
I mentioned earlier in the thread.
Diffing 1 to 2
(pprof) cum
(pprof) top 50
Showing nodes accounting for 4604.15kB, 81.69% of 5636.17kB total
Did you try running on an older release of Go, like 1.10?
What I have found useful in the past is pprof's ability to diff profiles.
That means that if you capture heap profiles at regular intervals you can
see a much smaller subset of changes and compare allocation patterns.
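A rough sketch of one way to capture those regular snapshots (the directory,
file names, and interval here are arbitrary, not anything from this thread).
Each snapshot can then be compared against an earlier one with something like
go tool pprof -base heap-0001.pb.gz heap-0005.pb.gz, which shows only the
delta between the two.

package main

import (
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// dumpHeapProfiles writes heap-0000.pb.gz, heap-0001.pb.gz, ... at a fixed
// interval so successive profiles can be diffed later.
func dumpHeapProfiles(dir string, every time.Duration) {
	for i := 0; ; i++ {
		time.Sleep(every)
		f, err := os.Create(fmt.Sprintf("%s/heap-%04d.pb.gz", dir, i))
		if err != nil {
			log.Printf("heap profile: %v", err)
			continue
		}
		runtime.GC() // collect first so the snapshot reflects live objects only
		if err := pprof.WriteHeapProfile(f); err != nil {
			log.Printf("heap profile: %v", err)
		}
		f.Close()
	}
}

func main() {
	go dumpHeapProfiles(os.TempDir(), 30*time.Second) // background sampler
	time.Sleep(5 * time.Minute)                       // stand-in for the real server's lifetime
}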
On Tue, Jul 2, 2019, 10:53 AM 'Yunchi Luo' via golang-nuts
<golang-nuts@googlegroups.com> wrote:
I'm not so much pointing my finger at GC as I am hoping GC logs could help
tell the story, and that someone with a strong understanding of GC in Go
could weigh in here. In the last 4 seconds before OOM, "TotalAlloc"
increased by only 80M, yet "HeapIdle" increased to 240M from 50M, RSS
increased by
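The numbers quoted above come from runtime.MemStats; for reference, a small
sketch (my own guess at the logger, not the service's actual code) of sampling
them so they can be lined up against the RSS the kernel reports:

package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	for range time.Tick(2 * time.Second) { // sampling interval is arbitrary
		runtime.ReadMemStats(&m)
		log.Printf("[Memory] TotalAlloc=%dMB HeapInuse=%dMB HeapIdle=%dMB HeapReleased=%dMB Sys=%dMB NumGC=%d",
			m.TotalAlloc>>20, m.HeapInuse>>20, m.HeapIdle>>20,
			m.HeapReleased>>20, m.Sys>>20, m.NumGC)
	}
}

HeapIdle minus HeapReleased is memory the runtime is still holding from the
OS, which is usually the first thing to check when RSS grows faster than the
heap profile suggests.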
and 'statm' - if my theory is
> correct you will see growth here long before the process is
> killed. Since you are running under k8s and cgroups, you will
> n
Before assuming it is the GC or something system related, you may wish to
verify it is *not your own logic*. Larger RSS could also be due to your own
logic touching more and more memory due to some runaway effect. The probability
this has to do with GC is very low given the very widespread use o
this alongside the Go process (unless you
> have root access to the server).
>
> I 'think' depending on kernel version, that kernel memory used goes
> against the process for OOM purposes, so this is a likely candidate if
> pprof is showing nothing.
>
> Do you by chance do any of your own memory management (via malloc/CGO)? If
> so, this is not going to show in pprof either.
reng...@ix.netcom.com> wrote:
> I don't think you are going to find it in the 'heap', rather it
> would be in native memory.
>
> I would monitor the /proc/[pid] for the
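To make that /proc suggestion concrete, here is a sketch (mine, not from this
thread) of reading the resident set size out of /proc/self/statm; the file
reports sizes in pages, so multiply by the page size.

package main

import (
	"fmt"
	"os"
)

// residentSetBytes parses /proc/self/statm (Linux), whose fields per proc(5)
// are: size resident shared text lib data dt, all counted in pages.
func residentSetBytes() (int64, error) {
	b, err := os.ReadFile("/proc/self/statm")
	if err != nil {
		return 0, err
	}
	var size, resident int64
	if _, err := fmt.Sscanf(string(b), "%d %d", &size, &resident); err != nil {
		return 0, err
	}
	return resident * int64(os.Getpagesize()), nil
}

func main() {
	rss, err := residentSetBytes()
	if err != nil {
		panic(err)
	}
	fmt.Printf("RSS: %d MB\n", rss>>20)
}

Watching that value over time, alongside the MemStats numbers, should show
whether the growth is in the Go heap or somewhere outside it.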
Subject: Re: [go-nuts] OOM occurring with a small heap
I actually have a heap profile (pasted at the bottom) from about 1 second
before the service died (the goroutine that is logging "[Memory]" triggers
heap profiles once RSS > 100MB). I don't see TCP connections there. Maybe
it's too few to be sampled. How would I verify your theory? That the
service
A leak of the TCP connections (maybe not properly closed)? Each TCP connection
will use kernel memory and process memory (local buffers), which won't be on
the heap (the reference to the TCP connection will be in the Go heap, but is
probably much smaller than the buffer allocation). That would be my
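If that theory is right, the usual Go-side culprit is response bodies that are
never drained and closed; a hedged illustration of the pattern (not
necessarily what this service does):

package main

import (
	"io"
	"net/http"
)

// fetchLeaky never closes the response body, so the underlying TCP connection
// is neither reused nor shut down. The kernel-side socket buffers never show
// up in a Go heap profile, and the Go-side bookkeeping is comparatively small.
func fetchLeaky(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	_ = resp.Status // BUG: missing resp.Body.Close()
	return nil
}

// fetchFixed drains and closes the body so the connection can go back to the
// idle pool instead of leaking.
func fetchFixed(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	_ = fetchLeaky("http://example.com")
	_ = fetchFixed("http://example.com")
}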