Hi all, I would really appreciate some comments and criticism on my cells implementation, which I've uploaded to this group's files as dataflow.clj . My goal was to fully exploit the available concurrency, but not do any unnecessary cell evaluations.
My solution is lazy rather than push-based; if you change a ref that cells depend on, nothing happens to the cells until you actually ask for their values. If you call (evaluate [cell1 cell2 cell3 ...]) first cell1, cell2, cell3, ... and all their ancestors have their values set to {:updating}, and the root cells of the computation (all ancestors of cell1 etc. whose parents are non-cell identities like refs) are identified. Then, messages are sent to the root cells telling them to recompute their values. Each time a cell p is able to compute its value, it sends a message to each of its children c that needs to update. The message extends c's value, which is a map, with the key and value {p (p's value)}. If c has received reports from all its cell parents, it retrieves values of its non-cell parents, computes its value, and reports to its children. When cell1, cell2, cell3 ... are all able to compute, the computation is over. You can optionally call (sync-evaluate [cell1 cell2 cell3 ...]) which blocks until the computation finishes using a Java countdown latch. This approach (I think) guarantees that each cell recomputes at most once per computation. This helps certain models perform much better than with push-based cells. For example, think about m generations of n cells each, where every cell in a generation is a parent of every cell in the next generation. Say all the root cells depend on a single ref. If you change the value of the ref, all nm cells need to update. If you do it with pushes, you first have to update all the root cells, for n updates. Then each root cell pushes to all of its children, for n^2 updates. You end up doing waaay more than nm updates. You could update the generations sequentially to get back to good performance... but I feel like my approach will incur less overhead than finding all of the generations. The example in the testing section of dataflow.clj is less dramatic. The dependency graph, with -> pointing from parents to child, is: a b -> c a c -> d a -> e c e -> f The roots are a and b, and each update function takes 1s. With my implementation, it runs in 3s with 6 updates: - a and b update together - c and e update together - d and f update together whereas if you were using a push-based scheme I think it would take 4s with 9 updates. It would take 4s because: - a and b update together - c has to update twice in sequence, once in response to a and once in response to b - d and f have to update together in response to c's final update. My implementation runs the test, but: - It's pretty ugly, - I'd eventually like it to be fast & am sure it's nowhere near its potential, - I feel like I'm not taking full advantage of the language. After the basic idea gets some scrutiny, I'd like to include a system for handling exceptions. I'll also make the cells maintain caches keyed by their non-cell ancestors. This will save a lot of coordination overhead in addition to actual computation because if you evaluate a cell twice in a row it won't have to bother its cell ancestors at all the second time. I haven't given much thought to circular dependencies, because I don't currently need them. Thanks very much! Anand --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---