That should allow your server to clean up “dead” clients. Typically you use this in conjunction with a ‘keep alive’ in the protocol.
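Something along these lines is the usual shape (just a sketch, not your code - expTimeOut, the address and the idle limit are placeholder values); ReadTimeout/WriteTimeout bound each request, and IdleTimeout is what actually reaps idle keep-alive connections:

package main

import (
    "log"
    "net/http"
    "time"
)

func main() {
    // expTimeOut is a placeholder for whatever limit is reasonable for your clients.
    const expTimeOut = 30

    s := &http.Server{
        Addr:         ":8443",                  // illustrative address
        ReadTimeout:  expTimeOut * time.Second, // max time to read a request, including the body
        WriteTimeout: expTimeOut * time.Second, // max time to write the response
        IdleTimeout:  120 * time.Second,        // how long an idle keep-alive connection is held open
    }
    log.Fatal(s.ListenAndServe())
}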
I am doubtful that a bunch of dead clients hanging around would cause a CPU spike. You really don't have too many Go routines/connections involved (I've worked with 1000s of live connections). I would look at something else... I am guessing your synchronization is incorrect and your threads are blocking, and you have a few that are spinning expecting something to happen that never will (because the source of the event is blocked on the mutex/lock).

> On Aug 26, 2020, at 10:09 AM, Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
>
> Robert,
>
> I assume we can safely add these timeouts based on what we expect should be a reasonable timeout for our clients?
>
> s.ReadTimeout = expTimeOut * time.Second
> s.WriteTimeout = expTimeOut * time.Second
>
> On Tue, Aug 25, 2020 at 1:14 PM Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
> Both servers and data sources are in the cloud.
> I would not say a lot of data; it's precomputed data which shouldn't take that long.
>
> On Tue, Aug 25, 2020 at 11:25 AM Robert Engels <reng...@ix.netcom.com> wrote:
> Are you transferring a lot of data? Are the servers non-cloud hosted?
>
> You could be encountering "tcp stalls".
>
>> On Aug 25, 2020, at 9:24 AM, Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
>>
>> Clients are over the internet.
>>
>> On Tue, Aug 25, 2020 at 3:25 AM Robert Engels <reng...@ix.netcom.com> wrote:
>> The tcp protocol allows the connection to wait for hours. Go routines stuck in wait do not burn CPU. Are the clients local or remote (over the internet)?
>>
>>> On Aug 24, 2020, at 10:29 PM, Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
>>>
>>> Robert,
>>>
>>> We will do the profiling next time we hit the issue again & see what is happening.
>>> This was the first time we saw the issue & we don't want to give up the http2 advantages without making sure it's the actual culprit.
>>>
>>> Do you think in the meantime we should do what the discussion below suggests anyway?
>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>>
>>> On Mon, Aug 24, 2020 at 5:37 PM Robert Engels <reng...@ix.netcom.com> wrote:
>>> I think it is too hard to tell with the limited information. It could be exhausted connections or it could be thrashing (the claim of high cpu).
>>>
>>> I think you want to run a profiling capture prior to hitting the stuck state - you should be able to detect what is happening.
>>>
>>> If the referenced issue is related, I would assume you should be able to connect by forcing http/1.
>>>
>>> You can also try disabling http/2 and see if your issue goes away.
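To make the "forcing http/1" experiment above concrete, a minimal sketch (the address is made up and the wiring is illustrative only): a non-nil, empty TLSNextProto map on the server and on the client transport turns off the automatic HTTP/2 upgrade in net/http. Setting GODEBUG=http2server=0,http2client=0 in the environment does the same thing without a code change.

package main

import (
    "crypto/tls"
    "net/http"
)

func main() {
    // Server side: a non-nil but empty TLSNextProto map disables HTTP/2 on this server.
    srv := &http.Server{
        Addr:         ":8443", // illustrative
        TLSNextProto: make(map[string]func(*http.Server, *tls.Conn, http.Handler)),
    }

    // Client side: the same idea on the transport keeps outgoing requests on HTTP/1.1.
    client := &http.Client{
        Transport: &http.Transport{
            TLSNextProto: make(map[string]func(string, *tls.Conn) http.RoundTripper),
        },
    }

    _, _ = srv, client // wire these into the real server/client setup
}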
>>> On Aug 24, 2020, at 6:15 PM, Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> Sorry I missed your earlier response.
>>>>
>>>> From what we saw, our UI was blocked, and since everything was unresponsive we had to recover the system by sending sig abort.
>>>>
>>>> On Mon, Aug 24, 2020 at 4:11 PM Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
>>>> Looking at the number of go routines we have, does this apply to this issue?
>>>> https://github.com/golang/go/issues/27044
>>>>
>>>> On Mon, Aug 24, 2020 at 12:54 PM Robert Engels <reng...@ix.netcom.com> wrote:
>>>> Go routines in a waiting state should not be consuming CPU. Are you certain they are not in constant transition from waiting to processing - this could show up as high CPU usage while everything looks blocked.
>>>>
>>>> I would use pprof - github.com/robaho/goanalyzer might be of assistance here to see the actual work being done.
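On the profiling point above: one low-friction way to have that data ready the next time the process wedges (a sketch; the port is arbitrary and should stay internal-only) is to expose the net/http/pprof handlers on their own listener, separate from the possibly-stuck application server:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Run the debug listener on its own goroutine and port so it keeps answering
    // even if the main application server is stuck.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... start the real application here ...
    select {} // placeholder so the sketch keeps running
}

While it is hung, curl 'http://localhost:6060/debug/pprof/goroutine?debug=2' dumps full goroutine stacks, and go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30' shows where the CPU time is actually going.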
>>>>> On Aug 24, 2020, at 9:10 AM, Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
>>>>>
>>>>> Hi Ian,
>>>>>
>>>>> Thanks for replying.
>>>>>
>>>>> We have a go server running which handles user requests & collects data from various sources like gcp's cloud sql and big query.
>>>>> We are using shopify's sarama library to do kafka operations.
>>>>>
>>>>> We are seeing lots of go routines in a waiting state for several minutes.
>>>>> Over a period of time around 587 goroutines have been spun up.
>>>>>
>>>>> We see that two go routines are stuck on gcp big query, and we are using wait groups there.
>>>>> However, it's not clear why that would cause all the other go routines to get hung & make the cpu go high.
>>>>>
>>>>> goroutine 3332131 [semacquire, 79 minutes]:
>>>>> sync.runtime_Semacquire(0xc001c4fcf8)
>>>>>         /usr/local/go/src/runtime/sema.go:56 +0x42
>>>>> sync.(*WaitGroup).Wait(0xc001c4fcf0)
>>>>>         /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40, 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3)
>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40, 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17)
>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780, 0xc002356810, 0x24, 0x32f7b78, 0x5)
>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
>>>>> created by git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>>>>>
>>>>> goroutine 3332149 [semacquire, 79 minutes]:
>>>>> sync.runtime_Semacquire(0xc0015ede48)
>>>>>         /usr/local/go/src/runtime/sema.go:56 +0x42
>>>>> sync.(*WaitGroup).Wait(0xc0015ede40)
>>>>>         /usr/local/go/src/sync/waitgroup.go:130 +0x64
>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40, 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3)
>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7
>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetMainUi(0xc001b54d40, 0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97)
>>>>>         /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d
>>>>> git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).fetchMainUiTeamInterval(0xc001b56780, 0xc002356ba0, 0x24, 0x32f7b78, 0x5)
>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2
>>>>> created by git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats
>>>>>         /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8
>>>>>
>>>>> I found the link below, which seems to correlate with our scenario.
>>>>> https://stackoverflow.com/questions/42238695/goroutine-in-io-wait-state-for-long-time
>>>>>
>>>>> Most of the go routines in the backtrace are in the net/http package, so our suspicion is that the above bug in our code might be causing that.
>>>>> Even the bigquery call is getting hung in net/http.
>>>>>
>>>>> We are using go version 1.13.8 & are running on a gcp kubernetes cluster on an ubuntu 18.04 docker image.
>>>>>
>>>>> go env
>>>>> GO111MODULE=""
>>>>> GOARCH="amd64"
>>>>> GOBIN=""
>>>>> GOCACHE="/root/.cache/go-build"
>>>>> GOENV="/root/.config/go/env"
>>>>> GOEXE=""
>>>>> GOFLAGS=""
>>>>> GOHOSTARCH="amd64"
>>>>> GOHOSTOS="linux"
>>>>> GONOPROXY=""
>>>>> GONOSUMDB=""
>>>>> GOOS="linux"
>>>>> GOPATH="/root/go"
>>>>> GOPRIVATE=""
>>>>> GOPROXY="https://proxy.golang.org,direct"
>>>>> GOROOT="/usr/local/go"
>>>>> GOSUMDB="sum.golang.org"
>>>>> GOTMPDIR=""
>>>>> GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
>>>>> GCCGO="gccgo"
>>>>> AR="ar"
>>>>> CC="gcc"
>>>>> CXX="g++"
>>>>> CGO_ENABLED="1"
>>>>> GOMOD="/builds/prosimoio/prosimo/pdash/go.mod"
>>>>> CGO_CFLAGS="-g -O2"
>>>>> CGO_CPPFLAGS=""
>>>>> CGO_CXXFLAGS="-g -O2"
>>>>> CGO_FFLAGS="-g -O2"
>>>>> CGO_LDFLAGS="-g -O2"
>>>>> PKG_CONFIG="pkg-config"
>>>>> GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build048009048=/tmp/go-build -gno-record-gcc-switches"
>>>>>
>>>>> Let me know if any other information is needed.
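Those two stacks blocked in sync.WaitGroup.Wait inside runParallelQuery suggest at least one worker never returned from its BigQuery call. One pattern worth considering (purely a sketch - the package name, function and timeout are invented, not the actual controller code) is to run the fan-out under a context deadline, so a hung backend call errors out instead of holding wg.Wait() forever. This only helps if each query actually passes the context down to the BigQuery client.

package stats

import (
    "context"
    "sync"
    "time"
)

// runParallelQueriesWithDeadline is a hypothetical stand-in for a runParallelQuery-style
// fan-out: every query shares a bounded context, so a hung backend call returns
// context.DeadlineExceeded instead of blocking wg.Wait() indefinitely.
func runParallelQueriesWithDeadline(parent context.Context, queries []func(context.Context) error) []error {
    ctx, cancel := context.WithTimeout(parent, 2*time.Minute) // pick a limit that fits the data volume
    defer cancel()

    errs := make([]error, len(queries))
    var wg sync.WaitGroup
    for i, q := range queries {
        wg.Add(1)
        go func(i int, q func(context.Context) error) {
            defer wg.Done()
            errs[i] = q(ctx) // the query must honor ctx (the BigQuery client APIs accept one)
        }(i, q)
    }
    wg.Wait()
    return errs
}

With something like this, a wedged query surfaces as an error in the logs rather than as a goroutine parked in semacquire for 79 minutes.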
>>>>> On Sat, Aug 22, 2020 at 12:30 PM Ian Lance Taylor <i...@golang.org> wrote:
>>>>> On Sat, Aug 22, 2020 at 12:06 PM Siddhesh Divekar <siddhesh.dive...@gmail.com> wrote:
>>>>> >
>>>>> > We saw an issue with our process running in k8s on ubuntu 18.04.
>>>>> > Most of the go routines are stuck for several minutes in http/http2 net code.
>>>>> >
>>>>> > Have you seen similar issues?
>>>>> >
>>>>> > goroutine 2800143 [select, 324 minutes]:
>>>>> > net/http.(*persistConn).readLoop(0xc00187d440)
>>>>> >         /usr/local/go/src/net/http/transport.go:2032 +0x999
>>>>> > created by net/http.(*Transport).dialConn
>>>>> >         /usr/local/go/src/net/http/transport.go:1580 +0xb0d
>>>>> >
>>>>> > goroutine 2738894 [IO wait, 352 minutes]:
>>>>> > internal/poll.runtime_pollWait(0x7f5b61b280c0, 0x72, 0xffffffffffffffff)
>>>>> >         /usr/local/go/src/runtime/netpoll.go:184 +0x55
>>>>> > internal/poll.(*pollDesc).wait(0xc0017e7e18, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
>>>>> >         /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
>>>>> > internal/poll.(*pollDesc).waitRead(...)
>>>>> >         /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
>>>>> > internal/poll.(*FD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
>>>>> >         /usr/local/go/src/internal/poll/fd_unix.go:169 +0x1cf
>>>>> > net.(*netFD).Read(0xc0017e7e00, 0xc0044a9000, 0x1000, 0x1000, 0xc0026359e8, 0x49d7fd, 0xc0017e7e00)
>>>>> >         /usr/local/go/src/net/fd_unix.go:202 +0x4f
>>>>> > net.(*conn).Read(0xc0000db8b8, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
>>>>> >         /usr/local/go/src/net/net.go:184 +0x68
>>>>> > net/http.(*connReader).Read(0xc004a4fec0, 0xc0044a9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
>>>>> >         /usr/local/go/src/net/http/server.go:785 +0xf4
>>>>> > bufio.(*Reader).fill(0xc003f1a360)
>>>>> >         /usr/local/go/src/bufio/bufio.go:100 +0x103
>>>>> > bufio.(*Reader).Peek(0xc003f1a360, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc002635ad0)
>>>>> >         /usr/local/go/src/bufio/bufio.go:138 +0x4f
>>>>> > net/http.(*conn).readRequest(0xc0028e1d60, 0x393ed20, 0xc0024e9780, 0x0, 0x0, 0x0)
>>>>> >         /usr/local/go/src/net/http/server.go:962 +0xb3b
>>>>> > net/http.(*conn).serve(0xc0028e1d60, 0x393ed20, 0xc0024e9780)
>>>>> >         /usr/local/go/src/net/http/server.go:1817 +0x6d4
>>>>> > created by net/http.(*Server).Serve
>>>>> >         /usr/local/go/src/net/http/server.go:2928 +0x384
>>>>> >
>>>>> > Is there a known issue or something obvious from the backtrace here?
>>>>>
>>>>> It's entirely normal for goroutines to sit in pollWait if they are waiting for network I/O. There may be reasons why this is incorrect for your program, but you'll have to tell us those reasons.
>>>>>
>>>>> Also, along with those reasons, please tell us the version of Go and the exact environment that you are running. Thanks.
>>>>>
>>>>> Ian