Hi All, This e-mail is the product of some musing about refs, agents, and atoms. I would like to offer some thoughts on a possible fourth reference type that allows coordinated, asynchronous modification of state.
I would greatly appreciate comments from the community. Maybe some of these ideas have already been discussed on the list or incorporated into development, in which case I'm sorry for repeating what is already known---and please point me in the right direction so I can read more. I hope the ideas are not too far off base, but please let me know if you think they are. Let me start at the beginning. In Clojure, we have standard reference types for: * uncoordinated, synchronous state change, * coordinated, synchronous state change, * uncoordinated, asynchronous state change. Most Clojure users are familiar with the following diagram expressing the reference type taxonomy. (The ASCII diagrams in this e-mail are best viewed in a fixed-width font.) | Synchronous | Asynchronous --------------+--------------+----------------- Uncoordinated | Atoms | Agents Coordinated | Refs | It occurred to me recently that there is no built-in reference type for the final quadrant: coordinated, asynchronous change. There is no reference type, in other words, allowing one to write a "doasync" block similar to a dosync block: (doasync ;short for do asynchronously (send a inc) (send b inc)) causing a and b to change together atomically in the background. I thought to myself that it would be nice to have this ability in some applications, especially simulation, artificial life, and agent-based modeling. It allows one thread to effect coordinated change in other threads while getting on with its own work. I pondered more and realized that, although there is no built-in reference type for this purpose, one can make a first stab at the same effect using a combination of futures and refs: (future (dosync (alter a inc) (alter b inc))) The future wraps the atomic transaction and causes it to take place in a background thread. This solution satisfied me for a day or two. Upon some more thought, however, I realized that combining refs with futures is a recipe for livelock disaster. Here is why. In a massively concurrent program with heavy asynchronous state change, it is unlikely that any one ref will stay the same for very long. The kind of coordinated change offered by a ref transaction is unlikely to be successful in this kind of constantly-changing environment, because the values of the refs are constantly being changed out from under the transaction attempts. Consider the following illustration, in which I try to modify a ref at the same time that it is being changed continually from another thread: ;;;;;;;;;;;; begin example ;;;;;;;;;;;; (def a (ref 1)) ; the code for start-frequent-changes is at ; http://gist.github.com/373143 (start-frequent-changes a) ; this code will never print its message (let [done? (agent nil)] (future (dosync (alter a (fn negate [val] (Thread/sleep 300) (- val))) ; negate is a long-running function of its argument (send done? (fn [_] (println "Done!")))))) ;;;;;;;;;;;; end example ;;;;;;;;;;;; The principle illustrated by this example is the following: when one or more other threads are changing a ref, a computationally expensive transaction on that ref is likely to encounter livelock. The more expensive the transaction and the more frequent the changes, the less likely it is that the transaction will ever go through. It will retry over and over, consuming resources and potentially retrying forever. Rich introduced ensure in part to try to mitigate this problem. Livelock can be avoided in the example above by placing a simple "(ensure a)" call at the beginning of the dosync block. Ensure ensures that the transaction holds "write permissions" on the ref during the transaction, so that other threads cannot modify it. However, the combination of futures, refs, and ensure is not a terribly good solution to coordinated asynchronicity, either, because it has a problem of its own: The order in which futures are executed is not deterministic. Race conditions will occur, and this is a big problem when dealing with state. To illustrate this, suppose one has three refs, a, b, and c. (In a real scenario, instead of just three refs one might have three groups of 20 refs, but three refs is enough to illustrate.) One wants to complete three asynchronous transactions on these refs in the following order: first, modify a, then modify both a and b, then modify both a and c. The problem is this: There is no way, using futures, refs, and ensure, to guarantee that the three asynchronous transactions will happen in the order one intends. Running the following code will usually print 3, but sometimes it will print 2, because there is a race condition affecting the second and third futures. ;;;;;;;;;;;; begin example ;;;;;;;;;;;; (let [a (ref 1) b (ref 2) c (ref 3)] (future (dosync (ensure a) (alter a #(do (Thread/sleep 200) %)))) (future (dosync (ensure a) (alter a (fn [_] @b)) (alter b identity))) (future (dosync (ensure a) (alter a (fn [_] @c)) (alter c identity))) (Thread/sleep 300) (println @a)) ;;;;;;;;;;;; end example ;;;;;;;;;;;; This race condition should not be surprising, because the whole point of a futures is to evaluate its body quickly and asynchronously, without any guarantees as to timing. However, this kind of flexibility is not amenable to modifications of state. One usually wants modifications of state to happen in order, because such operations are usually not commutable. The solution to the race-condition problem already exists in the language, of course, in the semantics of agents. When using agents, one can rely on the fact that the functions sent by send and send-off will always be evaluated in the order they were sent. If one rewrites the last example using agents, one sees there is never the potential for a race condition. The following code always prints 3: ;;;;;;;;;;;; begin example ;;;;;;;;;;;; (let [a (agent 1) b (agent 2) c (agent 3)] (send a #(do (Thread/sleep 200) %)) (do (send a (fn [_] @b)) (send b identity)) (do (send a (fn [_] @c)) (send c identity)) (Thread/sleep 300) (println @a)) ;;;;;;;;;;;; end example ;;;;;;;;;;;; The reason this code is safe, as everyone on this list already knows, is that agents have a built-in queue. Unlike refs, which rely on a CAS+retry model, agents guarantee that modifications will be queued and take place in the order they were requested. All of these observations led me to try to expand my mental model of Clojure's reference types. Recall the diagram I started with: | Synchronous | Asynchronous --------------+--------------+----------------- Uncoordinated | Atoms | Agents Coordinated | Refs | This diagram is overly simplistic. In reality, the diagram should have three dimensions, not two. The three dimensions are: * synchronous vs. asynchronous * uncoordinated vs. coordinated * CAS+retry vs. queued <--- this dimension is important Redrawing the diagram in three dimensions, we now have eight cells instead of four. We can fill in six of the cells by combining the standard reference types with futures and await: Uncoordinated: | Synchronous | Asynchronous ----------+----------------+----------------- CAS+Retry | Atoms | Futures + atoms Queued | Agents + await | Agents Coordinated: | Synchronous | Asynchronous ----------+----------------+----------------- CAS+Retry | Refs | Futures + refs Queued | ? | ? Looking at this diagram, I realized that as powerful as the existing reference types are, it is impossible to fill in the ?'s easily using them alone. There is no reason this has to be so. I decided to try to prototype a new reference type to fill in the ?'s. For now, let me call this new reference type a serf (as in a worker you can order around). The semantics of serfs are as follows. This is tentative and very much subject to suggestions and criticism. Just as a dosync block wraps a ref transaction, a "send-dosync" block wraps a serf transaction. Within a send-dosync block, serfs are treated just like refs. You can alter and commute them, just as you would a ref. There are two differences between serfs and refs. First, the transaction as a whole is executed asynchronously, not synchronously. Second, each serf has a queue, just like agents do. Before a given transaction is executed, the transaction must reach the front of the queue of every serf involved in the transaction. In other words, the transaction will be held until every serf involved is free. Accordingly, the transaction will be executed once and only once; it will never retry, because it will never contend with another transaction. Because a serf's queue is blocked while waiting for other serfs to get ready, serfs offer less concurrency than refs+ensure, which offer less concurrency still than plain refs. However, serfs allow for deterministic order of execution in the context of asynchronous, coordinated transactions, which is something neither refs nor agents can provide. For many applications, this fact may be enough to outweigh the loss of some concurrency. Here is a simple example. Remember, within each send-dosync block, the serfs are treated just like refs. (Indeed, by macro magic they *are* refs within that scope.) ;;;;;;;;;;;; begin example ;;;;;;;;;;;; (do (def foo (serf 0)) (println "Before send-dosync: " (serf-deref foo)) (send-dosync [foo] (alter foo inc)) (Thread/sleep 100) (println "Some time after send-dosync: " (serf-deref foo))) ;;;;;;;;;;;; end example ;;;;;;;;;;;; ;;;;;;;;;;;; begin example ;;;;;;;;;;;; (let [a (serf 1)] (send-dosync [a] (println "Type of a: " (type a)))) ; prints "clojure.lang.Ref" ;;;;;;;;;;;; end example ;;;;;;;;;;;; And here is our old example, the one for which refs+ensure resulted in race-condition behavior: ;;;;;;;;;;;; begin example ;;;;;;;;;;;; (let [a (serf 1) b (serf 2) c (serf 3)] (send-dosync [a] (alter a #(do (Thread/sleep 200) %))) (send-dosync [a b] (alter a (fn [_] @b)) (alter b identity)) (send-dosync [a c] (alter a (fn [_] @c)) (alter c identity)) (Thread/sleep 300) (println (serf-deref a))) ;;;;;;;;;;;; end example ;;;;;;;;;;;; This will always print 3, never 2, because serfs offer deterministic order of execution. The prototype code for serf, serf-deref, and send-dosync is at http://gist.github.com/373416 The syntax is far from pretty. Ideally, send-dosync would not require a first vector argument at all; it should be able to extract the relevant serfs from the calls to alter and commute inside it, just like dosync does. For now, please overlook the shoddy syntax and code. I would really appreciate any and all comments on the idea of this reference type and on the semantics. As you'll quickly see if you look at the code, serfs are internally just a ref with an agent standing sentinel. send-dosync is a simple promise-based function that makes sure all serfs are ready before starting the dosync transaction on the underlying refs. There's no special STM magic involved. The semantics of serfs are built up using the semantics of refs, agents, and promises. Nonetheless, I think the whole may be qualitatively different than the sum of the parts, because serfs could be said to fill out the "periodic table" of Clojure reference types: Uncoordinated: | Synchronous | Asynchronous ----------+----------------+----------------- CAS+Retry | Atoms | Futures + atoms Queued | Agents + await | Agents Coordinated: | Synchronous | Asynchronous ----------+----------------+----------------- CAS+Retry | Refs | Futures + refs Queued | Serfs + await | Serfs (In theory, serfs may be able to support CAS+retry semantics and queued semantics on the same piece of state. However, I'll leave this thought for another day because I've gone on long enough already.) If you have read this far, thank you very much. I look forward to hearing the community's reactions and thoughts. Sincerely, Garth -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en