And just as an aside, I think you would be interested in the talk "Rethinking Classical Concurrency Patterns" by Bryan C. Mills https://www.youtube.com/watch?v=5zXAHh5tJqQ
On Sunday, 12 June 2022 at 05:06:47 UTC+1 ke...@burke.dev wrote: > That's a very clear explanation, it's obvious what the problem is now. > Thank you! > > On Saturday, June 11, 2022 at 9:36:59 AM UTC-7 se...@liao.dev wrote: > >> sync.Cond does not affect goroutine scheduling priority, and Signal only >> makes the waiting goroutine available to be scheduled, but not force it to >> be. >> After a Signal() (and Unlock()), every other waiting worker and the >> flusher then contends (fairly) for the lock. >> What you want appears a better fit for either channels (send everything >> to the flusher) or just inlining the check+flush logic into writeEvent, >> essentially a proper serialization of events >> >> See also: https://github.com/golang/go/issues/21165 >> > On top of that, condition variables are fiendishly difficult to use: >> they are prone to either missed or spurious signals [citation needed] >> >> - sean >> >> >> On Sat, Jun 11, 2022 at 3:27 PM Kevin Burke <ke...@burke.dev> wrote: >> >>> Hi, >>> Recently I inherited some code in production that was hitting an error >>> case that I didn't think should be possible to hit. I reduced it down to >>> this test case. To be clear, there are several different ways to improve >>> the code here, but I'd like to understand why it's behaving the way it does >>> first. >>> >>> You should be able to just do "go test ." here to reproduce the error: >>> https://github.com/kevinburke/sync-cond-experiment >>> >>> What the code is doing: >>> >>> - Multiple different goroutines are taking a sync.Cond lock and then >>> appending data to a shared buffer. >>> - A "flush" goroutine calls sync.Cond.Wait() to wait for an incoming >>> signal that data has been appended >>> - Each goroutine that appends to the buffer calls Signal() after the >>> write, to try to wake up the "flush" goroutine >>> >>> I *expect* that the flush goroutine will wake up after each call to >>> Signal(), check whether the batch is ready to be flushed, and if not go >>> back to sleep. >>> >>> What I see instead is that lots of other goroutines are taking out the >>> lock before the flush goroutine can get to it, and as a result we're >>> dropping data. >>> >>> I didn't expect that to happen based on my reading of the docs for >>> sync.Cond, which (to me) indicate that Signal() will prioritize a goroutine >>> that calls Wait() (instead of any other goroutines that are waiting on >>> sync.Cond.L). Instead it looks like it's just unlocking any goroutine? >>> Maybe this is because the thread that is calling Signal() holds the lock? >>> >>> Thanks for your help, >>> Kevin >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "golang-nuts" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to golang-nuts...@googlegroups.com. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/golang-nuts/d7e17b2f-159c-4124-a023-eb2cdb8ba423n%40googlegroups.com >>> >>> <https://groups.google.com/d/msgid/golang-nuts/d7e17b2f-159c-4124-a023-eb2cdb8ba423n%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/8d7ff74a-a623-4cce-b5fc-870ea8c1495en%40googlegroups.com.