Any updates? I am also facing this issue; in my case the Golang server just gets
stuck and the open FD count spikes.
-Rahul
On Thursday, August 27, 2020 at 8:40:19 PM UTC+5:30 Siddhesh Divekar wrote:
> Ok, so your data collectors never complete. A simple change to make this
> easier to diagnose is to not spin up another collector controller at the 2
> min mark if the previous has not completed.
>
> I would determine if the stuck collector is BQ or Elastic and check the
> server side logs.
>
Could be a bug in the http/2 implementation. I would disable http/2 and see
whether you still encounter the problem.
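For anyone trying that, here is a minimal sketch of disabling HTTP/2 in net/http
(the address and handler are placeholders); setting GODEBUG=http2client=0,http2server=0
achieves the same without code changes:

package main

import (
	"crypto/tls"
	"net/http"
)

// A minimal sketch: disable HTTP/2 so everything falls back to HTTP/1.1.
func newHTTP1OnlyServerAndClient(handler http.Handler) (*http.Server, *http.Client) {
	// Server: a non-nil, empty TLSNextProto map prevents the automatic HTTP/2
	// upgrade on TLS connections.
	srv := &http.Server{
		Addr:         ":8443", // placeholder
		Handler:      handler,
		TLSNextProto: make(map[string]func(*http.Server, *tls.Conn, http.Handler)),
	}

	// Client: same idea on the Transport; with a non-nil empty map the client
	// will not negotiate h2 via ALPN.
	client := &http.Client{
		Transport: &http.Transport{
			TLSNextProto: make(map[string]func(string, *tls.Conn) http.RoundTripper),
		},
	}
	return srv, client
}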
On Aug 27, 2020, at 6:02 AM, Robert Engels wrote:
Ok, so your data collectors never complete. A simple change to make this easier
to diagnose is to not spin up another collector controller at the 2-minute mark
if the previous one has not completed.
I would determine whether the stuck collector is BQ or Elastic and check the
server-side logs.
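For illustration, a minimal sketch of that change, with collectAll standing in
for the real BQ/Elastic work:

package main

import (
	"log"
	"time"
)

// The ticker still fires every 2 minutes, but a new run only starts if the
// previous one has finished; a skipped tick is logged, which immediately shows
// which interval got stuck.
func runCollectorLoop(collectAll func()) {
	ticker := time.NewTicker(2 * time.Minute)
	defer ticker.Stop()

	busy := make(chan struct{}, 1) // holds a token while a run is in flight

	for range ticker.C {
		select {
		case busy <- struct{}{}: // previous run done; start a new one
			go func() {
				defer func() { <-busy }()
				collectAll()
			}()
		default:
			log.Println("previous collection still running; skipping this tick")
		}
	}
}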
That's right, Kurtis, we don't wait or check whether the prior goroutine which
fired the BigQuery queries has completed or not.
The rationale there was that the data which comes out of BigQuery is not much
and should not take more than 2 minutes.
To be precise, every 2 minutes we will fire 6 independent BigQuery queries &
Typo: from the backtrace we have from the SIGABRT, we see only *2* such data
collector goroutines blocked, and the other 2 (the 2-minute goroutines) waiting
on the waitgroup.
On Wed, Aug 26, 2020 at 8:50 PM Siddhesh Divekar wrote:
Right, then it looks less likely that we are blocked on a mutex.
Every 2 minutes we spin up a goroutine which in turn spins up a bunch of
goroutines to collect data from BigQuery & Elastic (the data collector
routines).
The 2-minute goroutine then waits on a waitgroup for the data collector
routines to finish.
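For reference, a minimal sketch of that controller pattern; the collector
functions are placeholders and the 2-minute context deadline is an added
assumption, not part of the original code:

package main

import (
	"context"
	"log"
	"sync"
	"time"
)

// One controller run: fan out one goroutine per data source and wait on a
// sync.WaitGroup. The context deadline bounds how long a stuck BigQuery or
// Elastic call can pin the controller goroutine.
func collectOnce(parent context.Context, collectors []func(context.Context) error) {
	ctx, cancel := context.WithTimeout(parent, 2*time.Minute)
	defer cancel()

	var wg sync.WaitGroup
	for _, c := range collectors {
		wg.Add(1)
		go func(collect func(context.Context) error) {
			defer wg.Done()
			if err := collect(ctx); err != nil {
				log.Printf("collector failed: %v", err)
			}
		}(c)
	}
	wg.Wait() // returns only when every collector returns (or honors ctx cancellation)
}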
If you look at the stack trace, the futex is there because the runtime is trying
to shut down the entire process - killing all of the M's - probably because you
sent the SIGABRT.
On Aug 26, 2020, at 9:41 PM, Robert Engels wrote:
The BigQuery client may be “single threaded”, meaning you may need a pool of
connections. Not sure - I haven't used the BQ library from Go.
If that's the case, a single slow query will block all of your connections (if
they are all accessing BQ). Plus there are BQ rate limits - when you hit those
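Independent of how the client behaves internally, capping concurrency explicitly
keeps one slow query from tying everything up; a minimal sketch, with runQuery
standing in for the real BigQuery call:

package main

import "context"

// A buffered channel used as a semaphore caps how many queries run at once, so
// one slow query cannot occupy every worker.
func queryWithLimit(ctx context.Context, queries []string, maxConcurrent int,
	runQuery func(context.Context, string) error) []error {

	sem := make(chan struct{}, maxConcurrent)
	errs := make([]error, len(queries))
	done := make(chan struct{})

	for i, q := range queries {
		go func(i int, q string) {
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			errs[i] = runQuery(ctx, q)
			done <- struct{}{}
		}(i, q)
	}
	for range queries {
		<-done
	}
	return errs
}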
Robert,
That's where the confusion is.
From the backtrace we see that two goroutines are blocked on the waitgroup.
The goroutine on which they are blocked is waiting on BigQuery to return.
On every user request we create a new goroutine, so they shouldn't get blocked
because of these two.
That should allow your server to clean up “dead” clients. Typically you use
this in conjunction with a ‘keep alive’ in the protocol.
I am doubtful that a bunch of dead clients hanging around would cause a CPU
spike. You really don’t have too many goroutines/connections involved (I’ve
worked wi
Robert,
I assume we can safely add these timeouts based on what we expect to be a
reasonable timeout for our clients?
s.ReadTimeout = expTimeOut * time.Second
s.WriteTimeout = expTimeOut * time.Second
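For reference, a minimal sketch of those server timeouts with assumed values;
IdleTimeout and ReadHeaderTimeout are extra fields beyond the two above:

package main

import (
	"net/http"
	"time"
)

// The http.Server timeouts that bound how long a slow or dead client can hold a
// connection.
func newServer(handler http.Handler) *http.Server {
	const expTimeOut = 30 // seconds; placeholder value

	return &http.Server{
		Addr:              ":8080", // placeholder
		Handler:           handler,
		ReadHeaderTimeout: 10 * time.Second,         // time allowed to read request headers
		ReadTimeout:       expTimeOut * time.Second, // whole request, including the body
		WriteTimeout:      expTimeOut * time.Second, // writing the response
		IdleTimeout:       120 * time.Second,        // idle keep-alive connections
	}
}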
On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar wrote:
Both the servers and the data sources are in the cloud.
I would not say a lot of data; it's precomputed data, which shouldn't take
that long.
On Tue, Aug 25, 2020 at 11:25 AM Robert Engels wrote:
Are you transferring a lot of data? Are the servers non-cloud hosted?
You could be encountering “tcp stalls”.
On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar wrote:
Clients are over the internet.
On Tue, Aug 25, 2020 at 3:25 AM Robert Engels wrote:
The TCP protocol allows the connection to wait for hours. Goroutines stuck in
wait do not burn CPU. Are the clients local or remote (over the internet)?
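On the outbound side, the usual way to bound a stalled TCP connection is to set
timeouts on the http.Client and Transport; a minimal sketch with assumed values:

package main

import (
	"net"
	"net/http"
	"time"
)

// An http.Client whose Transport bounds every phase of an outbound request, so
// a stalled TCP connection cannot park a goroutine for hours.
func newHTTPClient() *http.Client {
	return &http.Client{
		Timeout: 60 * time.Second, // hard cap on the whole request, body included
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:   10 * time.Second, // TCP connect
				KeepAlive: 30 * time.Second,
			}).DialContext,
			TLSHandshakeTimeout:   10 * time.Second,
			ResponseHeaderTimeout: 30 * time.Second, // waiting for the first response bytes
			IdleConnTimeout:       90 * time.Second,
		},
	}
}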
On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar wrote:
Robert,
We will do the profiling next time we hit the issue again & see what is
happening.
This was the first time we saw the issue & we don't want to give up the http2
advantages without making sure it's the actual culprit.
Do you think in the meanwhile we should do what the discussion below
suggests?
I think it is too hard to tell with the limited information. It could be
exhausted connections or it could be thrashing (given the claim of high CPU).
I think you want to run a profiling capture prior to hitting the stuck state -
you should be able to detect what is happening.
If the referenced issue i
Hi Robert,
Sorry I missed your earlier response.
From what we saw, our UI was blocked, and since everything was unresponsive
we had to recover the system by sending SIGABRT.
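For the next occurrence, a goroutine dump can be taken without aborting the
process; a minimal sketch, with the choice of SIGUSR1 as the trigger being an
assumption (it works on Linux):

package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// Dump every goroutine's stack to stderr on demand instead of killing the whole
// process with SIGABRT to get a traceback.
func dumpGoroutinesOnSignal() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	go func() {
		for range sigs {
			// debug=2 prints full stack traces for all goroutines.
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}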
On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar wrote:
Looking at the number of goroutines we have, does this apply to this issue?
https://github.com/golang/go/issues/27044
On Mon, Aug 24, 2020 at 12:54 PM Robert Engels wrote:
Goroutines in a waiting state should not be consuming CPU. Are you certain
they are not in constant transition from waiting to processing - this could
show up as high CPU usage while everything looks blocked.
I would use pprof - github.com/robaho/goanalyzer might be of assistance here to
see t
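A minimal sketch of wiring pprof into the process so a profile can be captured
while it is wedged (the side port is an assumption):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on DefaultServeMux
)

// Expose the pprof endpoints on a side port so profiles and goroutine dumps can
// be pulled from the live process, e.g.
//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
//   curl http://localhost:6060/debug/pprof/goroutine?debug=2
func init() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}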
On Sat, Aug 22, 2020 at 12:06 PM Siddhesh Divekar wrote:
Hi All,
We saw an issue with our process running in k8s on Ubuntu 18.04.
Most of the goroutines are stuck for several minutes in http/http2 net
code.
Have you seen similar issues?
goroutine 2800143 [select, 324 minutes]:
net/http.(*persistConn).readLoop(0xc00187d440)
    /usr/local/go/src/net/h