To add to what Alex said, look at this trace: https://gist.github.com/anonymous/65049ffdd37d43df8f23630928e8fed0#file-thread-dump-out-L1337-L1372
Here we see a go block calling mapcat, and inside the inner map something is calling >!!. As Alex mentioned this can be a source of deadlocks. No code called by a go block should ever call the blocking variants of core.async functions (<!!, >!!, alts!!, etc.). So I'd start at the code redacted in those lines and go from there.

On Tue, Aug 29, 2017 at 11:09 AM, Alex Miller <a...@puredanger.com> wrote:

> go blocks are multiplexed over a thread pool which has (by default) 8
> threads. You should never perform any kind of blocking activity inside a go
> block, because if every go block in work happens to end up blocked, you
> will prevent all go blocks from making any further progress. It sounds to
> me like that's what has happened here. The go block threads are named
> "async-dispatch-<n>" and it looks like there are 8 blocked ones in your
> thread dump.
>
> It also looks like they are all blocking on a >!!, which is a blocking
> call. So I would look for a go block that contains a >!! and convert that
> to a >! or do something else to avoid blocking there.
>
>
> On Tuesday, August 29, 2017 at 11:48:25 AM UTC-5, Aaron Iba wrote:
>>
>> My company has a production system that uses core.async extensively.
>> We've been running it 24/7 for over a year with occasional restarts to
>> update things and add features, and so far core.async has been working
>> great.
>>
>> The other day, during a particularly high workload, the whole system got
>> locked up. All the channels seemed blocked at once. I was able to connect
>> with a REPL and poke around, and noticed strange behavior of core.async.
>> Specifically, the following code, when evaluated in the REPL, blocked on
>> the put (third expression):
>>
>> (def c (async/chan))
>> (go-loop []
>>   (when-some [x (<! c)]
>>     (println x)
>>     (recur)))
>> (>!! c true)
>>
>> Whereas on any fresh system, the above expressions obviously succeed.
>>
>> Puts succeeded if they went onto the channel's buffer, but not when they
>> should go through to a consumer. For example with the following
>> expressions, evaluated in the REPL, the first put succeeded (presumably
>> because it went on the buffer), but subsequent puts blocked:
>>
>> (def c (async/chan 1))
>> (def m (async/mult c))
>> (def out (async/chan (async/sliding-buffer 3)))
>> (async/tap m out)
>> (>!! c true) ;; succeeds
>> (>!! c true) ;; blocks forever
>>
>> This leads me to wonder if core.async itself somehow got into a bad
>> state. It's entirely possible I caused this by misusing the API somewhere
>> in the codebase, but we use core.async so extensively that I wouldn't know
>> where to begin looking.
>>
>> I'm wondering if someone more familiar with core.async internals has an
>> idea about what could cause the above situation. Or if we notice it
>> happening again, what could I do to gather more helpful information.
>>
>> I also have a redacted thread dump, in case it's useful:
>>
>> https://gist.github.com/anonymous/65049ffdd37d43df8f23630928e8fed0
>>
>> Any help would be much appreciated,
>>
>> Aaron
>>
>> P.S. core.async has been a godsend in terms of helping us structure and
>> modularize our large system. Thank you to all those who contributed to
>> this wonderful library!
>>
>> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
“One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
(Robert Firth)
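
Following up on the mapcat / >!! pattern from the trace at the top of this message: the code in the gist is redacted, so I can only guess at its shape, but here is a minimal sketch of one way such a block might be restructured. The names expand and emit-all! and the sample input are made up for illustration. The idea is to keep the nested functions pure and do the puts with >! directly in the body of the go block, since the go macro can only park on operations it can see in its own body, not inside a nested fn.

```clojure
(require '[clojure.core.async :as async :refer [go chan >! <!! close!]])

;; Hypothetical stand-in for the redacted code: turns one item into several values.
(defn expand [x]
  [x (* 10 x)])

;; Problematic shape (left commented out on purpose): >!! is hidden inside a fn
;; that a go block calls, so it blocks an async-dispatch thread instead of parking.
;; (go (doall (mapcat (fn [x] (map #(>!! out %) (expand x))) items)))

;; One possible alternative: compute the values with plain functions, then put
;; them with the parking >! in the go block body itself.
(defn emit-all! [out items]
  (go
    (doseq [v (mapcat expand items)]
      (>! out v))
    (close! out)))

(def out (chan))
(emit-all! out [1 2 3])

;; <!! is fine here because this runs on the REPL/main thread, not in a go block.
(def result (<!! (async/into [] out)))
(println result) ;; prints [1 10 2 20 3 30]
```

If the blocking call really can't be avoided (for example it lives in a library function), wrapping that work in async/thread instead of go runs it on a separate thread rather than on the fixed go block dispatch pool.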