Hi,

I am working on a replication system which also works in the browser (1). I have come a long way with core.async and a pub-sub architecture, but I avoided error handling at the beginning and just used a cascading close on the pub-sub architecture when errors (e.g. disconnects) occurred. Lately I introduced <? and go-try as described here (2) and extended that to go-loops with dedicated error channels. But this never felt right, as error handling in a distributed system should not be an afterthought. Erlang has a very successful and sound error-handling concept, as Rich also pointed out (3).
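For readers who have not seen the <? and go-try pattern from (2), here is a minimal JVM-only sketch of the idea. It is not the full.async implementation: the namespace `example.async-errors` and the helpers `throw-if-throwable`, `failing-step` and `outer-step` are names made up for this illustration, and a ClojureScript version would have to catch `js/Error` instead of `Throwable`.

```clojure
(ns example.async-errors
  (:require [clojure.core.async :refer [go <! <!!]]))

(defn throw-if-throwable
  "Rethrows x if it is an exception, otherwise returns it unchanged."
  [x]
  (if (instance? Throwable x)
    (throw x)
    x))

(defmacro go-try
  "Like go, but an exception thrown in the body becomes the value
  delivered on the result channel instead of being lost."
  [& body]
  `(go (try ~@body
            (catch Throwable e# e#))))

(defmacro <?
  "Like <!, but rethrows the taken value if it is an exception.
  Must be used inside a go block, like <! itself."
  [ch]
  `(throw-if-throwable (<! ~ch)))

;; Hypothetical steps: the inner routine fails, the outer one takes with <?.
(defn failing-step []
  (go-try (throw (ex-info "boom" {:step :inner}))))

(defn outer-step []
  ;; <? rethrows the inner exception; the enclosing go-try catches it
  ;; and puts it on the outer result channel.
  (go-try (inc (<? (failing-step)))))

;; Blocking take at the edge of the system: the exception arrives as a value.
(def result (<!! (outer-step)))
```

The exception travels up one channel at a time, but only as long as somebody actually takes from each channel with <?. That gap is what the supervision approach below tries to close.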
The idea is basically that uncaught errors will always happen at some point, and when they do they simply propagate and trigger the restart of whole subsystems. Processes in Erlang terms, which are somewhat an extended version of go-routines, have ids and are explicitly wired to each other to receive exit messages when one of them fails. To retain the robustness of Erlang, I have reduced the concept to the following two requirements:

1) All errors (exceptions) need to be caught and propagated.
2) Resources need to be freed reliably.

Since we operate with channels in core.async and not with named processes as in Erlang, the supervision needs to be passed to routines by some other means, unless go-routines were to be globally registered as in Erlang. The most natural way is passing a parameter to the routines, but this can be verbose. To reduce the verbosity one could use bindings, but these are thread-local and difficult to reason about, which might break 1).

For requirement 2) the Erlang VM can preemptively terminate processes. Since this is not possible in core.async, I have decided to inject "abort" exceptions on every blocking channel op: <?, >?, alt?, ... This is not perfect, but it will eventually throw an exception in every go-routine and free all resources, satisfying 2). The supervisor then needs to track all running go-routines under its supervision and only restart the subsystem once all routines have finished. Using the default try-catch-finally exception mechanism inside of go-routines, plus bubbling through <?, makes the error handling somewhat intuitive along Java/JavaScript semantics.

Example (4):

    (let [test-fn (fn [super]
                    (go-super super
                      (try
                        (<? (timeout 500))
                        (catch Exception e
                          (println "Caught:" (.getMessage e)))
                        (finally
                          ;; cleanup may itself block on channels
                          (<! (timeout 500))
                          (println "Cleaned up slowly.")))))
          start-fn (fn [super]
                     (go-super super
                       ;; nobody takes from this channel, so the
                       ;; exception is left stale
                       (go-try (throw (ex-info "stale" {})))
                       (test-fn super)
                       (<? (timeout 300))
                       (throw (ex-info "foo" {}))))]
      (<?? (restarting-supervisor start-fn :retries 1 :stale-timeout 100)))

Since one can accidentally leave exceptions stale in some channel without taking them, we also need to track these and act after some timeout to satisfy 1).

I am really wondering what other people have thought about error handling with core.async so far, and whether this is the first real attempt to satisfy these requirements. Since error handling should be standardized so that it composes across library boundaries, I think something like this should move into core.async eventually. What do you think?

Christian

(1) https://github.com/replikativ/replikativ
(2) http://swannodette.github.io/2013/08/31/asynchronous-error-handling/
(3) http://www.erlang.org/download/armstrong_thesis_2003.pdf
(4) https://github.com/whilo/full.monty/blob/3ba439c6971c8255d34b2d39f1b7619b3b016236/full.async/src/full/async.clj#L451