That's right Kurtis, we don't wait or check if the prior go routine which fired the big query has completed or not. The rationale there was the data which comes out from big query is not much and should not take more than 2 mins.
To be precise every 2 mins we will fire 6 independent Big queries & 24 independent elastic queries each in its own go routine. The data coming out is cached locally and when the user requests that data we serve it out of cache. What we are unable to understand is why would this cause http server to stop responding to user requests which are not coming for this data. Regular requests were also not getting served. Since the system was responding we had to send sig abort and collect the back trace. goroutine 3332366 [select, *79 minutes*]: net/http.(*http2ClientConn).*roundTrip*(0xc003759b00, 0xc00050b700, 0x0, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/h2_bundle.go:7573 +0x9a0 net/http.(*http2Transport).*RoundTripOpt*(0xc0000a0480, 0xc00050b700, 0xc0037a8600, 0x74aba5, 0x0, 0xc0001c8000) /usr/local/go/src/net/http/h2_bundle.go:6936 +0x172 net/http.(*http2Transport).*RoundTrip*(...) /usr/local/go/src/net/http/h2_bundle.go:6897 net/http.http2noDialH2RoundTripper.*RoundTrip*(0xc0000a0480, 0xc00050b700, 0xc0047e97a0, 0x5, 0xc0001c8088) /usr/local/go/src/net/http/h2_bundle.go:9032 +0x3e net/http.(*Transport).*roundTrip*(0x5b48c00, 0xc00050b700, 0x2b3ce20, 0x5b48801, 0xc0048deb00) /usr/local/go/src/net/http/transport.go:485 +0xdc8 net/http.(*Transport).*RoundTrip*(0x5b48c00, 0xc00050b700, 0x3301658, 0xa, 0xc0048debd0) /usr/local/go/src/net/http/roundtrip.go:17 +0x35 google.golang.org/api/transport/http.(*parameterTransport).*RoundTrip*(0xc0028d9e40, 0xc00050b600, 0x5, 0x5, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/transport/http/dial.go:145 +0x224 go.opencensus.io/plugin/ochttp.(*traceTransport).*RoundTrip*(0xc00284d6c0, 0xc00050b600, 0xc002080980, 0x1, 0x1) /builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/trace.go:84 +0x456 go.opencensus.io/plugin/ochttp.statsTransport.*RoundTrip*(0x38d4e60, 0xc00284d6c0, 0xc00050b400, 0xc0025e27e0, 0xc0025e27e0, 0x2ecb720) /builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client_stats.go:57 +0x5cc go.opencensus.io/plugin/ochttp.(*Transport).*RoundTrip*(0xc003731d60, 0xc00050b400, 0x0, 0x0, 0x150) /builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client.go:99 +0x20e golang.org/x/oauth2.(*Transport).*RoundTrip*(0xc00376bc20, 0xc00050b300, 0x0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ golang.org/x/oauth2@v0.0.0-20200107190931-bf48bf16ab8d/transport.go:55 +0xf4 net/http.send(0xc00050b300, 0x38d5060, 0xc00376bc20, 0x0, 0x0, 0x0, 0xc0036a64c8, 0x1d6cede, 0x1, 0x0) /usr/local/go/src/net/http/client.go:250 +0x443 net/http.(*Client).send(0xc003afdb60, 0xc00050b300, 0x0, 0x0, 0x0, 0xc0036a64c8, 0x0, 0x1, 0x0) /usr/local/go/src/net/http/client.go:174 +0xfa net/http.(*Client).do(0xc003afdb60, 0xc00050b300, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/client.go:641 +0x3ce net/http.(*Client).Do(...) /usr/local/go/src/net/http/client.go:509 google.golang.org/api/internal/gensupport.send(0x393ede0, 0xc00346b8c0, 0xc003afdb60, 0xc00050b200, 0x5b7fee0, 0xc0037a9300, 0x1d6f876) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/internal/gensupport/send.go:67 +0xef google.golang.org/api/internal/gensupport.SendRequest(0x393ede0, 0xc00346b8c0, 0xc003afdb60, 0xc00050b200, 0xc0037a95e8, 0x84, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/internal/gensupport/send.go:52 +0x193 google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).doRequest(0xc0037a98f8, 0x32f6d9c, 0x4, 0x0, 0x0, 0x203001) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/bigquery/v2/bigquery-gen.go:7768 +0x765 google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).Do(0xc0037a98f8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/bigquery/v2/bigquery-gen.go:7780 +0xa8 cloud.google.com/go/bigquery.(*Job).waitForQuery.func1(0x6d183a, 0xc0037a97c0, 0xc0037a97e0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/bigquery@v1.4.0/job.go:320 +0x48 cloud.google.com/go/internal.retry(0x393ede0, 0xc00346b8c0, 0x3b9aca00, 0xdf8475800, 0x4000000000000000, 0x0, 0xc0037a98e0, 0x3433818, 0x0, 0x203000) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go@v0.53.0/internal/retry.go:38 +0x48 cloud.google.com/go/internal.Retry(...) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go@v0.53.0/internal/retry.go:31 cloud.google.com/go/bigquery.(*Job).waitForQuery(0xc004080660, 0x393ede0, 0xc00346b8c0, 0xc001b74ca0, 0x7, 0x393ed60, 0xc0000d8018, 0x2d62240, 0x5b7e118, 0x3242840, ...) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/*bigquery*@v1.4.0/job.go:319 +0x4df cloud.google.com/go/bigquery.(*Job).read(0xc004080660, 0x393ede0, 0xc00346b8c0, 0xc0037a9b38, 0x3430808, 0x0, 0x0, 0x393ede0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/*bigquery*@v1.4.0/job.go:293 +0xbd cloud.google.com/go/bigquery.(*Job).Read(0xc004080660, 0x393ed60, 0xc0000d8018, 0x0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/*bigquery*@v1.4.0/job.go:280 +0x14c cloud.google.com/go/bigquery.(*Query).Read(0xc003615ca0, 0x393ed60, 0xc0000d8018, 0xc003743a00, 0x1, 0x1) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/*bigquery*@v1.4.0/query.go:333 +0x91 git.fusion.io/fusionio/fusion/utils.git/bigquery.(*BigQueryWrapper).GetTemplateSQL(0xc003afdbc0, 0x331199e, 0x11, 0xc003afda70, 0x393f0e0, 0xc00010e1c0, 0xc003afdbc0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ git.fusion.io/fusionio/fusion/utils.git@v0.0.0-20200813012118-9675cdb4e965/bigquery/bigquery.go:105 +0x25d git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).executeQuery(0xc000010418, 0xc0028c3000, 0x331199e, 0x11, 0x24, 0x0, 0x0, 0x0) /builds/fusionio/fusion/controller/stats/statsdao.go:401 +0xe3 git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).GetTeamAppExpScore(.. .) /builds/fusionio/fusion/controller/stats/statsdao.go:545 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetTeamAppExpScore(0xc001b54d40, 0xc0028c3000, 0x0, 0x42fd7a, 0x0, 0x0) /builds/fusionio/fusion/controller/stats/ins_mgr.go:459 +0x4e git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runQuery(0xc001b54d40, 0x330e1b0, 0xf, 0xc003da9870, 0xc0024ef080, 0xc001c4fcf0) /builds/fusionio/fusion/controller/stats/ins_mgr.go:464 +0x7e created by git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery /builds/fusionio/fusion/controller/stats/ins_mgr.go:485 +0x1a4 goroutine 3332154 [select, *79 minutes*]: net/http.(*http2ClientConn).*roundTrip*(0xc003759b00, 0xc00050a500, 0x0, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/h2_bundle.go:7573 +0x9a0 net/http.(*http2Transport).*RoundTripOpt*(0xc0000a0480, 0xc00050a500, 0xc002de4600, 0x74aba5, 0x0, 0xc0001c8000) /usr/local/go/src/net/http/h2_bundle.go:6936 +0x172 net/http.(*http2Transport).*RoundTrip*(...) /usr/local/go/src/net/http/h2_bundle.go:6897 net/http.http2noDialH2RoundTripper.*RoundTrip*(0xc0000a0480, 0xc00050a500, 0xc0047e95f0, 0x5, 0xc0001c8088) /usr/local/go/src/net/http/h2_bundle.go:9032 +0x3e net/http.(*Transport).*roundTrip*(0x5b48c00, 0xc00050a500, 0x2b3ce20, 0x5b48801, 0xc00170db80) /usr/local/go/src/net/http/transport.go:485 +0xdc8 net/http.(*Transport).*RoundTrip*(0x5b48c00, 0xc00050a500, 0x3301658, 0xa, 0xc00170dc20) /usr/local/go/src/net/http/roundtrip.go:17 +0x35 google.golang.org/api/transport/http.(*parameterTransport).*RoundTrip*(0xc0020a7e00, 0xc00050a400, 0x5, 0x5, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/transport/http/dial.go:145 +0x224 go.opencensus.io/plugin/ochttp.(*traceTransport).*RoundTrip*(0xc00284c540, 0xc00050a400, 0xc000eb78a0, 0x1, 0x1) /builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/trace.go:84 +0x456 go.opencensus.io/plugin/ochttp.statsTransport.*RoundTrip*(0x38d4e60, 0xc00284c540, 0xc00050a200, 0xc00420b910, 0xc00420b910, 0x2ecb720) /builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client_stats.go:57 +0x5cc go.opencensus.io/plugin/ochttp.(*Transport).*RoundTrip*(0xc00400d360, 0xc00050a200, 0x0, 0x0, 0x150) /builds/fusionio/fusion/controller/gopath/pkg/mod/go.opencensus.io@v0.22.3/plugin/ochttp/client.go:99 +0x20e golang.org/x/oauth2.(*Transport).RoundTrip(0xc0022aaee0, 0xc00050a100, 0x0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ golang.org/x/oauth2@v0.0.0-20200107190931-bf48bf16ab8d/transport.go:55 +0xf4 net/http.send(0xc00050a100, 0x38d5060, 0xc0022aaee0, 0x0, 0x0, 0x0, 0xc0036a6310, 0x1d6cede, 0x1, 0x0) /usr/local/go/src/net/http/client.go:250 +0x443 net/http.(*Client).send(0xc003b8e540, 0xc00050a100, 0x0, 0x0, 0x0, 0xc0036a6310, 0x0, 0x1, 0x0) /usr/local/go/src/net/http/client.go:174 +0xfa net/http.(*Client).do(0xc003b8e540, 0xc00050a100, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/client.go:641 +0x3ce net/http.(*Client).Do(...) /usr/local/go/src/net/http/client.go:509 google.golang.org/api/internal/gensupport.send(0x393ede0, 0xc0031bb710, 0xc003b8e540, 0xc00050a000, 0x5b7fee0, 0xc002de5300, 0x1d6f876) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/internal/gensupport/send.go:67 +0xef google.golang.org/api/internal/gensupport.SendRequest(0x393ede0, 0xc0031bb710, 0xc003b8e540, 0xc00050a000, 0xc002de55e8, 0x84, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/internal/gensupport/send.go:52 +0x193 google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).doRequest(0xc002de58f8, 0x32f6d9c, 0x4, 0x0, 0x0, 0x203000) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/*bigquery*/v2/bigquery-gen.go:7768 +0x765 google.golang.org/api/bigquery/v2.(*JobsGetQueryResultsCall).Do(0xc002de58f8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ google.golang.org/api@v0.19.0/*bigquery*/v2/bigquery-gen.go:7780 +0xa8 cloud.google.com/go/bigquery.(*Job).waitForQuery.func1(0x6d183a, 0xc002de57c0, 0xc002de57e0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/*bigquery*@v1.4.0/job.go:320 +0x48 cloud.google.com/go/internal.retry(0x393ede0, 0xc0031bb710, 0x3b9aca00, 0xdf8475800, 0x4000000000000000, 0x0, 0xc002de58e0, 0x3433818, 0x0, 0x203000) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go@v0.53.0/internal/retry.go:38 +0x48 cloud.google.com/go/internal.Retry(...) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go@v0.53.0/internal/retry.go:31 cloud.google.com/go/bigquery.(*Job).waitForQuery(0xc003491980, 0x393ede0, 0xc0031bb710, 0xc00233d700, 0x7, 0x393ed60, 0xc0000d8018, 0x2d62240, 0x5b7e118, 0x3242840, ...) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/bigquery@v1.4.0/job.go:319 +0x4df cloud.google.com/go/bigquery.(*Job).read(0xc003491980, 0x393ede0, 0xc0031bb710, 0xc002de5b38, 0x3430808, 0x0, 0x0, 0x393ede0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/bigquery@v1.4.0/job.go:293 +0xbd cloud.google.com/go/bigquery.(*Job).Read(0xc003491980, 0x393ed60, 0xc0000d8018, 0x0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/bigquery@v1.4.0/job.go:280 +0x14c cloud.google.com/go/bigquery.(*Query).Read(0xc003c79ca0, 0x393ed60, 0xc0000d8018, 0xc001e36200, 0x1, 0x1) /builds/fusionio/fusion/controller/gopath/pkg/mod/ cloud.google.com/go/bigquery@v1.4.0/query.go:333 +0x91 git.fusion.io/fusionio/fusion/utils.git/bigquery.(*BigQueryWrapper).GetTemplateSQL(0xc003b8e5a0, 0x331199e, 0x11, 0xc003b8e450, 0x393f0e0, 0xc00010e1c0, 0xc003b8e5a0, 0x0, 0x0) /builds/fusionio/fusion/controller/gopath/pkg/mod/ git.fusion.io/fusionio/fusion/utils.git@v0.0.0-20200813012118-9675cdb4e965/bigquery/bigquery.go:105 +0x25d git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).executeQuery(0xc000010418, 0xc0020cb600, 0x331199e, 0x11, 0xc000eb0000, 0xbd, 0x400, 0x88) /builds/fusionio/fusion/controller/stats/statsdao.go:401 +0xe3 git.fusion.io/fusionio/fusion/controller.git/stats.(*StatsDao).GetTeamAppExpScore(.. .) /builds/fusionio/fusion/controller/stats/statsdao.go:545 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).GetTeamAppExpScore(0xc001b54d40, 0xc0020cb600, 0x44ea57, 0x42fd7a, 0xc001470558, 0x101) /builds/fusionio/fusion/controller/stats/ins_mgr.go:459 +0x4e git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runQuery(0xc001b54d40, 0x330e1b0, 0xf, 0xc0030d2a80, 0xc0035995c0, 0xc0015ede40) /builds/fusionio/fusion/controller/stats/ins_mgr.go:464 +0x7e created by git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery /builds/fusionio/fusion/controller/stats/ins_mgr.go:485 +0x1a4 Corresponding go routines waiting on wait groups. goroutine 3332149 [semacquire, *79 minutes*]: sync.runtime_Semacquire(0xc0015ede48) /usr/local/go/src/runtime/sema.go:56 +0x42 sync.(*WaitGroup).Wait(0xc0015ede40) /usr/local/go/src/sync/waitgroup.go:130 +0x64 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40, 0xc00249dc00, 0x330e1b0, 0xf, 0xc00249dcf0, 0x3) /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).*GetMainUi*(0xc001b54d40, 0xc00249ddb8, 0xc003200668, 0xc00407a520, 0xc003200590, 0x46ee97) /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher). *fetchMainUiTeamInterval*(0xc001b56780, 0xc002356ba0, 0x24, 0x32f7b78, 0x5) /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2 created by git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8 goroutine 3332131 [semacquire, *79 minutes*]: sync.runtime_Semacquire(0xc001c4fcf8) /usr/local/go/src/runtime/sema.go:56 +0x42 sync.(*WaitGroup).Wait(0xc001c4fcf0) /usr/local/go/src/sync/waitgroup.go:130 +0x64 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).runParallelQuery(0xc001b54d40, 0xc002912c00, 0x330e1b0, 0xf, 0xc002912cf0, 0x3) /builds/fusionio/fusion/controller/stats/ins_mgr.go:488 +0x1d7 git.fusion.io/fusionio/fusion/controller.git/stats.(*InsMgr).*GetMainUi*(0xc001b54d40, 0xc002912db8, 0xc001870e68, 0x746121, 0xc0010fcaf8, 0x17) /builds/fusionio/fusion/controller/stats/ins_mgr.go:567 +0xa0d git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher). *fetchMainUiTeamInterval*(0xc001b56780, 0xc002356810, 0x24, 0x32f7b78, 0x5) /builds/fusionio/fusion/controller/stats/prefetcher.go:77 +0xf2 created by git.fusion.io/fusionio/fusion/controller.git/stats.(*Prefetcher).prefetchStats /builds/fusionio/fusion/controller/stats/prefetcher.go:100 +0xd8 On Wed, Aug 26, 2020 at 9:28 PM Kurtis Rader <kra...@skepticism.us> wrote: > On Wed, Aug 26, 2020 at 8:51 PM Siddhesh Divekar < > siddhesh.dive...@gmail.com> wrote: > >> Right, then it looks less likely that we are blocked on a mutex. >> >> Every 2 minutes we spin up a go routine which then in turn spins up a >> bunch of go routines to collect data from big query & elastic (data >> collector routines). >> The 2 minutes go routine then in turn waits on waitgroup for data >> collector routines to come back. So its not single go routine from our side >> at least. >> From backtrace we have from sigabort we see only one such data collector >> go routine blocked and other 2 2 mins thread waiting on waitgroup. >> > > Are you spinning up a Go routine every 2 minutes regardless of whether the > previous instance had completed? Your comment is not clear whether you have > one, or more than one, goroutine issuing a BigQuery query at the time you > saw the problem and captured the backtraces. If you are, in fact, starting > those Go routines without regard to whether the prior instance had > completed that seems like a "yellow flag", "here be dragons", situation. > > -- > Kurtis Rader > Caretaker of the exceptional canines Junior and Hank > -- -Siddhesh. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAMjfk%2Bhd%2Bu7XeG%3Ddg9Pe73S%2BOdQ9rRu68HGDDG3xp7a4Jrh8HA%40mail.gmail.com.