Thank you for the very detailed explanation. 

What I am struggling with is that even if the goroutines do a little work and go back to sleep (which is probably the case here), as I increase the number of external clients I would expect there to always be a goroutine ready to run. 

What I am observing is that the throughput is capped. E.g. with 10 clients each processes 3k requests per second, but with 20 clients each processes only 1.5k requests per second. Yet the CPU usage is about the same and the I/O rates are about the same, which points to scheduling delays. I'll confess, though, that after reading your response I am still not sure how to diagnose this (what other tools can I look at? the execution trace seems too noisy). 

Is there some way to get high level scheduling statistics?

On Jul 18, 2024, at 6:31 PM, Rhys Hiltner <rhys.hilt...@gmail.com> wrote:


Hi Robert,

First, note that the contention profile for runtime-internal locks doesn't correctly blame the part of the code that _caused_ delay: with Go 1.22 and 1.23, the call stacks are of (runtime-internal) lock users that _experienced_ delay. That's https://go.dev/issue/66999, and those odd semantics are the reason that those stacks aren't shown by default (and instead require the GODEBUG). That was briefly fixed in late May, but caused some slowdowns and so couldn't be part of the Go 1.23 release.
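
For reference, here is a minimal sketch of how that profile can be collected; the port and program structure are just placeholders, but runtime.SetMutexProfileFraction, net/http/pprof, and the GODEBUG setting are the real pieces:

// Run the program with: GODEBUG=runtimecontentionstacks=1
// Then inspect with:    go tool pprof http://localhost:6060/debug/pprof/mutex
package main

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers
	"runtime"
)

func main() {
	// Record every contention event; a larger fraction reduces overhead in production.
	runtime.SetMutexProfileFraction(1)

	go func() { _ = http.ListenAndServe("localhost:6060", nil) }()

	// ... application work would go here ...
	select {}
}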

Contention within findRunnable is probably on sched.lock, which protects the global portion of the scheduler. (Often, scheduling only needs to involve each P's local run queue.)

> Is this a side effect of the scheduler, and it is really blocked in the unlock call waiting for another routine to be runnable?

There really is a thread (an M with a P) that is blocked. It's not waiting for a G to be runnable; depending on the exact part of findRunnable, it might be waiting to see whether a goroutine is runnable or not, or the M might be trying to go to sleep because it's determined that there's no work. The profile also shows runtime.goschedImpl, which means there's a G calling runtime.Gosched, and its M (and P) are blocked trying to add the G to the global run queue.
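
To make that concrete, here's a hypothetical pattern (not necessarily your code; the worker function and counters are made up) that would produce the goschedImpl contention seen in the profile: many goroutines each doing a tiny amount of work and then yielding. Every Gosched pushes the calling G onto the global run queue, which requires sched.lock:

package main

import "runtime"

// worker does a trivial amount of work and then yields. Under load, each
// runtime.Gosched call contends on sched.lock when the G is moved to the
// global run queue.
func worker(counter *uint64) {
	for {
		*counter++ // stand-in for a few microseconds of real work
		runtime.Gosched()
	}
}

func main() {
	counters := make([]uint64, 64)
	for i := range counters {
		go worker(&counters[i])
	}
	select {} // run forever; the mutex profile will show the contention
}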

As for what's causing it ... consider how much work the program's goroutines do when they run (maybe "only tens of microseconds"), how and how often the program uses Gosched (maybe "a bit too much", though I can't quantify that), the setting for GOMAXPROCS (maybe "lots", though GOOS=darwin implies it won't be huge). Summing the runtime/metrics buckets for "/sched/latencies:seconds" and multiplying by 8 (runtime.gTrackingPeriod) will give an estimate of how many times the scheduler started running a goroutine ... or a runtime/trace can show all of those details at once.
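
For example, a minimal sketch of that estimate (the program structure is just an illustration; the metric name and runtime/metrics API are as documented):

package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	samples := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		panic("unexpected kind for /sched/latencies:seconds")
	}

	hist := samples[0].Value.Float64Histogram()
	var total uint64
	for _, c := range hist.Counts {
		total += c
	}
	// Only one in runtime.gTrackingPeriod (8) scheduling events is sampled,
	// so multiply to approximate the total number of goroutine starts.
	fmt.Printf("approx. scheduler starts so far: %d\n", total*8)
}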

Rhys

On Thursday, July 18, 2024 at 1:00:15 PM UTC-7 robert engels wrote:
Does anyone have any idea about the high internal lock time in findRunnable?

On Jul 18, 2024, at 10:36 AM, robert engels <ren...@ix.netcom.com> wrote:

Looks like the name was changed to runtimecontentionstacks.

Much better now…

(pprof) tree
Showing nodes accounting for 481.75ms, 100% of 481.75ms total
Dropped 39 nodes (cum <= 2.41ms)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context   
----------------------------------------------------------+-------------
                                          401.25ms 83.51% |   runtime.findRunnable (inline)
                                           35.26ms  7.34% |   runtime.sysmon (inline)
                                           17.37ms  3.62% |   runtime.stopm (inline)
                                           14.01ms  2.92% |   runtime.saveBlockEventStack (inline)
                                            6.07ms  1.26% |   runtime.goschedImpl (inline)
                                            1.52ms  0.32% |   runtime.schedule (inline)
                                            1.30ms  0.27% |   sync.(*Mutex).Unlock (inline)
                                            1.30ms  0.27% |   github.com/robaho/go-trader/internal/exchange.sendMarketData (inline)
                                            1.17ms  0.24% |   sync.(*Map).Swap (inline)
                                            1.16ms  0.24% |   github.com/robaho/go-trader/internal/exchange.(*grpcServer).Connection (inline)
  480.46ms 99.73% 99.73%   480.46ms 99.73%                | runtime.unlock

But it seems suspect that finding a runnable goroutine should consume so much time. Is this a side effect of the scheduler, and is it really blocked in the unlock call waiting for another goroutine to become runnable?



On Jul 18, 2024, at 10:14 AM, 'Robert Engels' via golang-nuts <golan...@googlegroups.com> wrote:

I found this issue, https://github.com/golang/go/issues/57071, and tried running with

GODEBUG=profileruntimelocks=1

but it made no difference in the output.

I am using Go 1.22.4 on macOS.

Any ideas on how I can determine what is causing the high contention shown below:

(pprof) tree
Showing nodes accounting for 4440.35ms, 100% of 4440.35ms total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context     
----------------------------------------------------------+-------------
 4324.17ms 97.38% 97.38%  4324.17ms 97.38%                | runtime._LostContendedRuntimeLock
----------------------------------------------------------+-------------
                                          116.17ms   100% |   sync.(*Map).Swap (inline)
  116.17ms  2.62%   100%   116.17ms  2.62%                | sync.(*Mutex).Unlock
----------------------------------------------------------+-------------
                                          116.17ms   100% |   github.com/robaho/go-trader/internal/exchange.(*grpcServer).massquote
         0     0%   100%   116.17ms  2.62%                | github.com/robaho/go-trader/internal/exchange.(*exchange).Quote
                                          116.17ms   100% |   github.com/robaho/go-trader/internal/exchange.sendMarketData

