Ahh, that makes a lot of sense.  Indeed, I'm guilty of doing a blocking >!! 
inside a go block.  I was so careful to avoid other kinds of blocking calls 
(like IO) that I forgot that the blocking variants of the core.async calls 
themselves are forbidden there.

Thank you for pointing this out!  I will rewire things to not do this.
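
A minimal sketch of what such a rewiring could look like, assuming the 
blocking put sits in a helper that a go block calls.  The names 
blocking-publish!, forward-parking and forward-via-thread are hypothetical 
and only illustrate the two usual options: a parking put written lexically 
inside the go block, or handing the blocking call to async/thread and 
parking on the channel it returns.

```clojure
(require '[clojure.core.async :as async
           :refer [go chan thread <! >! <!! >!!]])

;; Hypothetical helper standing in for code that calls >!! somewhere
;; down the call stack.  It is fine on an ordinary thread, but must not
;; run on an async-dispatch thread.
(defn blocking-publish! [out x]
  (>!! out x))

;; Problematic shape (do not do this): the helper blocks a pool thread.
;; (go (blocking-publish! out (<! in)))

;; Option 1: use the parking put directly inside the go block.
(defn forward-parking [in out]
  (go (when-some [x (<! in)]
        (>! out x))))

;; Option 2: run the blocking helper on its own thread via async/thread
;; and park on the result channel, so no dispatch thread is blocked.
(defn forward-via-thread [in out]
  (go (when-some [x (<! in)]
        (<! (thread (blocking-publish! out x))))))
```

Option 1 only works when the put can be written lexically inside the go 
block; option 2 is the fallback when the blocking call is buried in code 
that cannot easily be restructured.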

Per Gary's suggestion, I also think it'd be useful if the core.async blocking 
ops checked a dynamic var (or a property of the thread itself) and at least 
warned when called from a forbidden context.  To help track down my 
original issue, I'm considering doing this in my dev environment:

(doseq [v '[<!! >!!]]
  (alter-var-root (ns-resolve 'clojure.core.async v)
                  (fn [f]
                    (fn [& args]
                      (if (.startsWith (.getName (Thread/currentThread))
                                       "async-dispatch-")
                        (throw (Exception.
                                (str v " called inside async-dispatch")))
                        (apply f args))))))




On Tuesday, August 29, 2017 at 1:43:53 PM UTC-4, Gary Trakhman wrote:
>
> Hm, I came across a similar ordering invariant (No code called by a go 
> block should ever call the blocking variants of core.async functions) while 
> wrapping an imperative API, and I thought it might be useful to use 
> vars/binding to enforce it.  Has this or other approaches been considered 
> in core.async?  I could see a *fixed-thread-pool* var being set and >!! 
> checking for false.
>
> An analogy in existing clojure.core would be the STM commute's 'must be 
> running in a transaction' check that uses a threadlocal. 
> https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LockingTransaction.java#L205
>
> On Tue, Aug 29, 2017 at 1:30 PM Timothy Baldridge <tbald...@gmail.com>
> wrote:
>
>> To add to what Alex said, look at this trace: 
>> https://gist.github.com/anonymous/65049ffdd37d43df8f23630928e8fed0#file-thread-dump-out-L1337-L1372
>>
>> Here we see a go block calling mapcat, and inside the inner map something 
>> is calling >!!. As Alex mentioned this can be a source of deadlocks. No 
>> code called by a go block should ever call the blocking variants of 
>> core.async functions (<!!, >!!, alts!!, etc.). So I'd start at the code 
>> redacted in those lines and go from there. 
>>
>>
>>
>> On Tue, Aug 29, 2017 at 11:09 AM, Alex Miller <al...@puredanger.com>
>> wrote:
>>
>>> go blocks are multiplexed over a thread pool which has (by default) 8 
>>> threads. You should never perform any kind of blocking activity inside a go 
>>> block, because if every go block currently executing ends up blocked, all 
>>> of the pool's threads are tied up and no go block can make any further 
>>> progress. It sounds to me like that's what has happened here. The go 
>>> block threads are named "async-dispatch-<n>" and it looks like there are 
>>> 8 blocked ones in your thread dump.
>>>
>>> It also looks like they are all blocking on a >!!, which is a blocking 
>>> call. So I would look for a go block that contains a >!! and convert that 
>>> to a >! or do something else to avoid blocking there.
>>>
>>>
>>> On Tuesday, August 29, 2017 at 11:48:25 AM UTC-5, Aaron Iba wrote:
>>>>
>>>> My company has a production system that uses core.async extensively. 
>>>> We've been running it 24/7 for over a year with occasional restarts to 
>>>> update things and add features, and so far core.async has been working 
>>>> great.
>>>>
>>>> The other day, during a particularly high workload, the whole system 
>>>> got locked up. All the channels seemed blocked at once.  I was able to 
>>>> connect with a REPL and poke around, and noticed strange behavior of 
>>>> core.async. Specifically, the following code, when evaluated in the REPL, 
>>>> blocked on the put (third expression):
>>>>
>>>> (def c (async/chan))
>>>> (go-loop []
>>>>   (when-some [x (<! c)]
>>>>     (println x)
>>>>     (recur)))
>>>> (>!! c true)
>>>>
>>>> Whereas on any fresh system, the above expressions obviously succeed.
>>>>
>>>> Puts succeeded if they went onto the channel's buffer, but not when 
>>>> they should go through to a consumer. For example with the following 
>>>> expressions, evaluated in the REPL, the first put succeeded (presumably 
>>>> because it went on the buffer), but subsequent puts blocked:
>>>>
>>>> (def c (async/chan 1))
>>>> (def m (async/mult c))
>>>> (def out (async/chan (async/sliding-buffer 3)))
>>>> (async/tap m out)
>>>> (>!! c true) ;; succeeds
>>>> (>!! c true) ;; blocks forever
>>>>
>>>> This leads me to wonder if core.async itself somehow got into a bad 
>>>> state. It's entirely possible I caused this by misusing the API somewhere 
>>>> in the codebase, but we use core.async so extensively that I wouldn't know 
>>>> where to begin looking.
>>>>
>>>> I'm wondering if someone more familiar with core.async internals has an 
>>>> idea about what could cause the above situation. Or, if we notice it 
>>>> happening again, what could I do to gather more helpful information?
>>>>
>>>> I also have a redacted thread dump, in case it's useful:
>>>>
>>>> https://gist.github.com/anonymous/65049ffdd37d43df8f23630928e8fed0
>>>>
>>>> Any help would be much appreciated,
>>>>
>>>> Aaron
>>>>
>>>> P.S. core.async has been a godsend in terms of helping us structure and 
>>>> modularize our large system.  Thank you to all those who contributed to 
>>>> this wonderful library!
>>>>
>>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with 
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to clojure+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> -- 
>> “One of the main causes of the fall of the Roman Empire was that–lacking 
>> zero–they had no way to indicate successful termination of their C 
>> programs.”
>> (Robert Firth) 
>>
>

