go blocks are multiplexed over a fixed thread pool which has (by default) 8 
threads. You should never perform any kind of blocking activity inside a go 
block, because if every thread in that pool ends up blocked, no go block 
anywhere in the process can make further progress. It sounds to me like 
that's what has happened here. The go block threads are named 
"async-dispatch-<n>" and it looks like there are 8 blocked ones in your 
thread dump. That would also explain what you saw at the REPL: a new 
go-loop never gets a thread to run on, so nothing ever takes from c.
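
If it happens again, something like the following might help confirm the 
diagnosis from a connected REPL without taking a full thread dump. This is 
only a rough sketch: dispatch-thread-states is a name I made up, and it 
assumes the pool threads still carry the "async-dispatch-" name prefix.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: summarize the state of every core.async dispatch
;; thread, identified here purely by its "async-dispatch-" name prefix.
(defn dispatch-thread-states []
  (->> (Thread/getAllStackTraces)          ; java.util.Map of Thread -> stack frames
       (filter (fn [[^Thread t _]]
                 (str/starts-with? (.getName t) "async-dispatch-")))
       (map (fn [[^Thread t frames]]
              {:name  (.getName t)
               :state (str (.getState t))
               ;; topmost frame, or nil if the thread has no stack
               :top   (some-> (first frames) str)}))
       (sort-by :name)))
```

If every entry shows a waiting state with the same frame on top, the pool 
is most likely exhausted in the way described above.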

It also looks like they are all blocking on a >!!, which is a blocking 
call. So I would look for a go block that contains a >!! and convert that 
to a >! or do something else to avoid blocking there.
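
To make the difference concrete, here is a minimal sketch. The channel and 
function names are invented for illustration and are not taken from your 
codebase.

```clojure
(require '[clojure.core.async :as async :refer [go chan >! >!! <!!]])

;; Hypothetical unbuffered channel: a put has to wait for a taker.
(def out (chan))

;; Problematic: >!! blocks the async-dispatch thread running this go block
;; until something takes the value. Enough of these at once can tie up
;; the whole pool.
(defn publish-blocking [v]
  (go (>!! out v)))

;; Preferred: >! parks the go block instead, so the thread is released
;; to run other go blocks while this one waits.
(defn publish-parking [v]
  (go (>! out v)))

;; If the blocking call can't be avoided (blocking I/O, or a >!! buried
;; inside a function that the go block calls), run it on its own thread.
(defn publish-on-thread [v]
  (async/thread (>!! out v)))
```

Keep in mind that >! only works when it appears directly in the body of 
the go block, not inside a function called from it, which is one way a >!! 
can end up running on a dispatch thread without being obvious in the code.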


On Tuesday, August 29, 2017 at 11:48:25 AM UTC-5, Aaron Iba wrote:
>
> My company has a production system that uses core.async extensively. We've 
> been running it 24/7 for over a year with occasional restarts to update 
> things and add features, and so far core.async has been working great.
>
> The other day, during a particularly high workload, the whole system got 
> locked up. All the channels seemed blocked at once.  I was able to connect 
> with a REPL and poke around, and noticed strange behavior of core.async. 
> Specifically, the following code, when evaluated in the REPL, blocked on 
> the put (third expression):
>
> (def c (async/chan))
> (go-loop []
>   (when-some [x (<! c)]
>     (println x)
>     (recur)))
> (>!! c true)
>
> Whereas on any fresh system, the above expressions obviously succeed.
>
> Puts succeeded if they went onto the channel's buffer, but not when they 
> should go through to a consumer. For example with the following 
> expressions, evaluated in the REPL, the first put succeeded (presumably 
> because it went on the buffer), but subsequent puts blocked:
>
> (def c (async/chan 1))
> (def m (async/mult c))
> (def out (async/chan (async/sliding-buffer 3)))
> (async/tap m out)
> (>!! c true) ;; succeeds
> (>!! c true) ;; blocks forever
>
> This leads me to wonder if core.async itself somehow got into a bad state. 
> It's entirely possible I caused this by misusing the API somewhere in the 
> codebase, but we use core.async so extensively that I wouldn't know where 
> to begin looking.
>
> I'm wondering if someone more familiar with core.async internals has an 
> idea about what could cause the above situation. Or if we notice it 
> happening again, what could I do to gather more helpful information.
>
> I also have a redacted thread dump, in case it's useful:
>
> https://gist.github.com/anonymous/65049ffdd37d43df8f23630928e8fed0
>
> Any help would be much appreciated,
>
> Aaron
>
> P.S. core.async has been a godsend in terms of helping us structure and 
> modularize our large system.  Thank you to all those who contributed to 
> this wonderful library!
>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en