Usually, if it is hung doing an allocation, that means the STW cannot
complete, which means some goroutine is in a tight loop that is not yielding to
the scheduler.
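Here is a minimal sketch of what "yielding to the scheduler" looks like in practice. It assumes a CPU-bound loop whose body makes no function calls; `sumRange` and its `yieldEvery` parameter are hypothetical names used only for illustration. The explicit `runtime.Gosched()` gives a pending stop-the-world a regular point at which it can proceed. As far as I know, Go 1.14 later added asynchronous preemption, so on newer toolchains a loop like this should no longer stall STW indefinitely. The explicit yield matters mainly for older releases.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumRange adds 0..n-1 in a tight loop (hypothetical example workload).
// When yieldEvery > 0 it calls runtime.Gosched every yieldEvery iterations,
// giving the scheduler, and any goroutine waiting to stop the world, a
// chance to run. Before Go 1.14's asynchronous preemption, a loop like this
// without the yield could not be preempted until it finished.
func sumRange(n, yieldEvery int) uint64 {
	var sum uint64
	for i := 0; i < n; i++ {
		sum += uint64(i)
		if yieldEvery > 0 && i%yieldEvery == 0 {
			runtime.Gosched()
		}
	}
	return sum
}

func main() {
	runtime.GOMAXPROCS(2)
	done := make(chan uint64)
	go func() { done <- sumRange(50_000_000, 1<<16) }()
	// A forced GC has to stop the world, which requires the spinning
	// goroutine to reach a safe point.
	runtime.GC()
	fmt.Println(<-done)
}
```

The yield interval trades a little throughput for bounded latency. Calling `Gosched` on every iteration would be needlessly expensive.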
The only other possibility (I would think) is if you placed a cap on the memory
size of the process, and it is trying to allocate but
We do have a convenient sampler that tells us the state of all of the
goroutines, goprocs, and threads: schedtrace+scheddetail. The traces in my
first post were one example, but I think I can find/repro a simpler case where
we are stuck harder with nothing changing until the SIGTERM comes in.
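The following sketch shows one way to turn that sampler on. It assumes you launch the process yourself rather than through a shell or deployment spec; the wrapper and `schedTraceCmd` are hypothetical. `GODEBUG=schedtrace=N,scheddetail=1` makes the runtime print scheduler, P, M (thread) and goroutine state to stderr every N milliseconds. I believe this output comes from the runtime's background monitor thread. That would explain why it keeps printing even when the rest of the process appears stuck.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// schedTraceCmd (hypothetical helper) prepares a command that runs the
// program at path with the runtime's scheduler tracer enabled:
// schedtrace=N prints a summary every N ms, and scheddetail=1 adds the
// state of every P, M and goroutine. The runtime writes this to stderr.
func schedTraceCmd(periodMS int, path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	// os/exec uses the last value for a duplicated key, so this GODEBUG
	// replaces (not merges with) any GODEBUG already in the environment.
	cmd.Env = append(os.Environ(),
		fmt.Sprintf("GODEBUG=schedtrace=%d,scheddetail=1", periodMS))
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: schedwrap <program> [args...]")
		return
	}
	cmd := schedTraceCmd(1000, os.Args[1], os.Args[2:]...)
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run:", err)
		os.Exit(1)
	}
}
```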
I think the allocation triggers the GC, but it is stuck at the start because it
is waiting for all other goroutines to reach a safe point. I would grab a
trace (but I’m not sure you can capture the trace if things are “stuck”) and
review that. Or maybe use an external sampler like ‘Instruments’.
Thanks for the various possibilities. It’s helpful to know things to look out
for.
I don’t think there’s a tight polling loop in the code, at least none that I’ve
found yet.
We’re using some open source packages, notably Kanister
(https://github.com/kanisterio), which Kasten contributes, Stow
(
You might want to review this issue: https://github.com/golang/go/issues/10958
> On Dec 11, 2018, at 10:16 PM, robert engels wrote:
Reviewing the code, the 5s of CPU time is technically the stop-the-world (STW)
sweep termination.
So, I think the cause of your problem is that you have a tight / “infinite”
loop that is not calling runtime.Gosched(), so the sweep termination is taking
a very long time to complete.
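A small parser for `GODEBUG=gctrace=1` lines can help check which phase a long pause falls in. Per the `runtime` package documentation, the `#+#+# ms clock` field gives wall-clock time for three phases, in order: STW sweep termination, concurrent mark and scan, and STW mark termination. `parseGCClock` and the sample line are hypothetical. The sample is made up for illustration and is not taken from this thread's repro. The sketch needs Go 1.18+ for `strings.Cut`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gcClock holds the three wall-clock phase times, in milliseconds, from the
// "#+#+# ms clock" field of a gctrace line.
type gcClock struct {
	SweepTermMS float64 // STW sweep termination
	MarkMS      float64 // concurrent mark and scan
	MarkTermMS  float64 // STW mark termination
}

// parseGCClock (hypothetical helper) extracts the clock phases from one
// gctrace line such as "gc 17 @35.261s 0%: 5600+2.1+0.050 ms clock, ...".
func parseGCClock(line string) (gcClock, error) {
	_, rest, ok := strings.Cut(line, ": ")
	if !ok {
		return gcClock{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	clock, _, ok := strings.Cut(rest, " ms clock")
	if !ok {
		return gcClock{}, fmt.Errorf("no clock field: %q", line)
	}
	parts := strings.Split(clock, "+")
	if len(parts) != 3 {
		return gcClock{}, fmt.Errorf("want 3 clock phases, got %d: %q", len(parts), clock)
	}
	var v [3]float64
	for i, p := range parts {
		f, err := strconv.ParseFloat(p, 64)
		if err != nil {
			return gcClock{}, err
		}
		v[i] = f
	}
	return gcClock{SweepTermMS: v[0], MarkMS: v[1], MarkTermMS: v[2]}, nil
}

func main() {
	// Made-up line for illustration only.
	line := "gc 17 @35.261s 0%: 5600+2.1+0.050 ms clock, 22400+0.50/3.2/1.0+0.20 ms cpu, 4->4->2 MB, 5 MB goal, 4 P"
	c, err := parseGCClock(line)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("sweep termination %.1f ms, mark %.1f ms, mark termination %.2f ms\n",
		c.SweepTermMS, c.MarkMS, c.MarkTermMS)
}
```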
Btw, I am only guessing large stacks because the memory in use is not that
large or growing, and the CPU time is massive, which suggests walking lots of
stacks or root objects.
> On Dec 11, 2018, at 9:43 PM, robert engels wrote:
Well, your pause is clearly related to the GC: the first phase, the mark, is
5s in #17. Are you certain you don’t have an incorrect, highly recursive loop
that is causing the stack marking to take a really long time?
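One way to test the deep-recursion hypothesis is to watch `runtime.MemStats.StackInuse`. Goroutine stacks are GC roots that have to be scanned, so their size matters here. The sketch below is illustrative only. `deep`, `stackInuse`, and `measureDeepStack` are hypothetical helpers. The 256-byte array argument exists only to make each frame large enough that the stack visibly grows.

```go
package main

import (
	"fmt"
	"runtime"
)

// stackInuse reports the bytes currently held in goroutine stack spans.
func stackInuse() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.StackInuse
}

// deep (hypothetical) recurses depth times, passing a 256-byte array by value
// so every frame is large, and reports StackInuse at the bottom.
//
//go:noinline
func deep(depth int, pad [32]uint64) uint64 {
	if depth == 0 {
		return stackInuse()
	}
	pad[depth%len(pad)]++
	return deep(depth-1, pad)
}

// measureDeepStack runs deep in a fresh goroutine, so the growth starts from
// a new, small stack, and returns StackInuse before and at full depth.
func measureDeepStack(depth int) (before, during uint64) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		before = stackInuse()
		during = deep(depth, [32]uint64{})
	}()
	<-done
	return before, during
}

func main() {
	b, d := measureDeepStack(50_000)
	fmt.Printf("StackInuse: %d KiB before, %d KiB at depth 50000\n", b>>10, d>>10)
}
```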
> On Dec 11, 2018, at 8:45 PM, Eric Hamilton wrote:
Of course. (I forgot about that option, and I'd collected those traces at a
time when I thought I'd ruled out GC; I probably misread the MemStats values.)
Here's the gctrace output for a repro with a 5.6 second delay ending with the
finish of gc17 and a 30+ second delay ending in SIGTERM (which coincides
I think it would be more helpful if you used GODEBUG=gctrace=1 to report on
the GC activity.
> On Dec 11, 2018, at 3:37 PM, e...@kasten.io wrote:
I am observing pause failures in an application, and it appears to me that
it takes from 5 to 28 seconds or more for the StopTheWorld to take effect
for GC.
In all likelihood the problem lies in the application, but before I start
changing it or attempting to tune GC, I’d like to under