Ahh, that makes a lot of sense. Indeed, I'm guilty of doing a blocking >!! inside a go block. I was so careful to avoid other kinds of blocking calls (like IO) that I forgot that the blocking variants of the core.async calls were themselves forbidden.
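As a minimal sketch of the kind of change this implies (not code from our system; `forward-all` and its arguments are hypothetical names used only for illustration), a put made from inside a go block can use the parking `>!` rather than the blocking `>!!`, so the dispatch thread is released while the put waits:

```clojure
(require '[clojure.core.async :as async :refer [go chan >! <!! close!]])

;; Hypothetical stand-in for code that previously called >!! inside a go block.
(defn forward-all
  "Puts each item of `items` onto `out` with the parking put >!, then
  closes `out`. Returns the channel produced by the go block."
  [items out]
  (go
    (doseq [x items]
      ;; >! parks the go block instead of blocking its async-dispatch thread.
      (>! out x))
    (close! out)))

;; Called from an ordinary (non-go) thread such as the REPL, so <!! is fine here.
(let [out (chan)]
  (forward-all [1 2 3] out)
  (<!! (async/into [] out)))
;; => [1 2 3]
```

Note that `>!` only works when it appears lexically inside the go block. It cannot be moved into a function passed to map or mapcat, which is presumably how a blocking put ends up there in the first place. In that situation the loop has to be restructured (as with the doseq above), or the blocking work moved to a real thread, for example with async/thread.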
Thank you for pointing this out! I will rewire things to not do this. Per Gary's suggestion, I also think it'd be useful if core.async blocking ops checked a dynamic var (or a property of the thread itself) and at least warned if they were being called from a forbidden context.

To resolve my original issue, I'm considering doing this in my dev environment:

;; Dev-only guard: make <!! and >!! throw when called on a go-block
;; dispatch thread (those threads are named "async-dispatch-<n>").
(doseq [v '[<!! >!!]]
  (alter-var-root (ns-resolve 'clojure.core.async v)
    (fn [f]
      (fn [& args]
        (if (.startsWith (.getName (Thread/currentThread)) "async-dispatch-")
          (throw (Exception. (str v " called inside async-dispatch")))
          (apply f args))))))

On Tuesday, August 29, 2017 at 1:43:53 PM UTC-4, Gary Trakhman wrote:
>
> Hm, I came across a similar ordering invariant (No code called by a go
> block should ever call the blocking variants of core.async functions) while
> wrapping an imperative API, and I thought it might be useful to use
> vars/binding to enforce it. Has this or other approaches been considered
> in core.async? I could see a *fixed-thread-pool* var being set and >!!
> checking for false.
>
> An analogy in existing clojure.core would be the STM commute's 'must be
> running in a transaction' check that uses a threadlocal.
> https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LockingTransaction.java#L205
>
> On Tue, Aug 29, 2017 at 1:30 PM Timothy Baldridge <tbald...@gmail.com> wrote:
>
>> To add to what Alex said, look at this trace:
>> https://gist.github.com/anonymous/65049ffdd37d43df8f23630928e8fed0#file-thread-dump-out-L1337-L1372
>>
>> Here we see a go block calling mapcat, and inside the inner map something
>> is calling >!!. As Alex mentioned this can be a source of deadlocks. No
>> code called by a go block should ever call the blocking variants of
>> core.async functions (<!!, >!!, alts!!, etc.). So I'd start at the code
>> redacted in those lines and go from there.
>>
>> On Tue, Aug 29, 2017 at 11:09 AM, Alex Miller <al...@puredanger.com> wrote:
>>
>>> go blocks are multiplexed over a thread pool which has (by default) 8
>>> threads. You should never perform any kind of blocking activity inside a go
>>> block, because if every go block in work happens to end up blocked, you
>>> will prevent all go blocks from making any further progress. It sounds to
>>> me like that's what has happened here. The go block threads are named
>>> "async-dispatch-<n>" and it looks like there are 8 blocked ones in your
>>> thread dump.
>>>
>>> It also looks like they are all blocking on a >!!, which is a blocking
>>> call. So I would look for a go block that contains a >!! and convert that
>>> to a >! or do something else to avoid blocking there.
>>>
>>> On Tuesday, August 29, 2017 at 11:48:25 AM UTC-5, Aaron Iba wrote:
>>>>
>>>> My company has a production system that uses core.async extensively.
>>>> We've been running it 24/7 for over a year with occasional restarts to
>>>> update things and add features, and so far core.async has been working
>>>> great.
>>>>
>>>> The other day, during a particularly high workload, the whole system
>>>> got locked up. All the channels seemed blocked at once. I was able to
>>>> connect with a REPL and poke around, and noticed strange behavior of
>>>> core.async. Specifically, the following code, when evaluated in the REPL,
>>>> blocked on the put (third expression):
>>>>
>>>> (def c (async/chan))
>>>> (go-loop []
>>>>   (when-some [x (<! c)]
>>>>     (println x)
>>>>     (recur)))
>>>> (>!! c true)
>>>>
>>>> Whereas on any fresh system, the above expressions obviously succeed.
>>>>
>>>> Puts succeeded if they went onto the channel's buffer, but not when
>>>> they should go through to a consumer.
>>>> For example with the following
>>>> expressions, evaluated in the REPL, the first put succeeded (presumably
>>>> because it went on the buffer), but subsequent puts blocked:
>>>>
>>>> (def c (async/chan 1))
>>>> (def m (async/mult c))
>>>> (def out (async/chan (async/sliding-buffer 3)))
>>>> (async/tap m out)
>>>> (>!! c true) ;; succeeds
>>>> (>!! c true) ;; blocks forever
>>>>
>>>> This leads me to wonder if core.async itself somehow got into a bad
>>>> state. It's entirely possible I caused this by misusing the API somewhere
>>>> in the codebase, but we use core.async so extensively that I wouldn't know
>>>> where to begin looking.
>>>>
>>>> I'm wondering if someone more familiar with core.async internals has an
>>>> idea about what could cause the above situation. Or if we notice it
>>>> happening again, what could I do to gather more helpful information.
>>>>
>>>> I also have a redacted thread dump, in case it's useful:
>>>>
>>>> https://gist.github.com/anonymous/65049ffdd37d43df8f23630928e8fed0
>>>>
>>>> Any help would be much appreciated,
>>>>
>>>> Aaron
>>>>
>>>> P.S. core.async has been a godsend in terms of helping us structure and
>>>> modularize our large system. Thank you to all those who contributed to
>>>> this wonderful library!
>>>>
>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to clojure+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>> --
>> “One of the main causes of the fall of the Roman Empire was that–lacking
>> zero–they had no way to indicate successful termination of their C
>> programs.”
>> (Robert Firth)