Sorry, you had mentioned 1.1. In Clojure 1.1 the hint should be written #^Callable.
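In 1.1 a bare ^ reads as (meta ...), which is why the binding vector ends up with an odd number of forms. So the patched function would look like this (same code as quoted below, only the hint syntax changes -- untested sketch; the hint just lets .submit resolve to its Callable overload without reflection, since Clojure fns implement both Callable and Runnable):

(defn burn-via-pool [n]
  (print n " burns via a thread pool: ")
  (time
   (let [cores (.. Runtime getRuntime availableProcessors)
         pool (java.util.concurrent.Executors/newFixedThreadPool cores)
         ;; #^Callable is the 1.1 reader syntax; in 1.2 plain ^Callable works.
         #^Callable func (fn [] (burn))]
     (dotimes [_ n] (.submit pool func))
     (.shutdown pool)
     (.awaitTermination pool 1 java.util.concurrent.TimeUnit/HOURS))))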
On Aug 4, 11:24 am, Lee Spector <lspec...@hampshire.edu> wrote:
> In Clojure 1.1.0 (which is what I have running on the big machines) I get a
> warning and then an error from your ^Callable line:
>
> WARNING: reader macro ^ is deprecated; use meta instead
> Exception in thread "main" java.lang.IllegalArgumentException: let requires
> an even number of forms in binding vector (concur.clj:42)
>
> What's the right way to patch that?
>
>  -Lee
>
> On Aug 4, 2010, at 2:08 PM, Armando Blancas wrote:
>
> > What about a more direct way of creating your threads? This code is
> > too simple and more is needed to collect results with futures, but I
> > wonder how something like this would perform on your machine:
>
> > (defn burn-via-pool [n]
> >   (print n " burns via a thread pool: ")
> >   (time
> >    (let [cores (.. Runtime getRuntime availableProcessors)
> >          pool (java.util.concurrent.Executors/newFixedThreadPool cores)
> >          ^Callable func (fn [] (burn))]
> >      (dotimes [_ n] (.submit pool func))
> >      (.shutdown pool)
> >      (.awaitTermination pool 1 java.util.concurrent.TimeUnit/HOURS))))
>
> > On Aug 4, 7:36 am, Lee Spector <lspec...@hampshire.edu> wrote:
> >> Apologies for the length of this message -- I'm hoping to be complete,
> >> but that made the message pretty long.
>
> >> Also BTW most of the tests below were run using Clojure 1.1. If part of
> >> the answer to my questions is "use 1.2" then I'll upgrade ASAP (but I
> >> haven't done so yet because I'd prefer to be confused by one thing at a
> >> time :-). I don't think that can be the full answer, though, since the
> >> last batch of runs below WERE run under 1.2 and they're also
> >> problematic...
>
> >> Also, for most of the runs described here (with the one exception noted
> >> below) I am running under Linux:
>
> >> [lspec...@fly ~]$ cat /proc/version
> >> Linux version 2.6.18-164.6.1.el5 (mockbu...@builder10.centos.org) (gcc
> >> version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Nov 3 16:12:36 EST
> >> 2009
>
> >> with this Java version:
>
> >> [lspec...@fly ~]$ java -version
> >> java version "1.6.0_16"
> >> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
> >> Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
>
> >> SO: Most of the documentation and discussion about clojure concurrency
> >> is about managing state that may be shared between concurrent processes,
> >> but I have what I guess are more basic questions about how concurrent
> >> processes can/should be started even in the absence of shared state (or
> >> when all that's shared is immutable), and about how to get the most out
> >> of concurrency on multiple cores.
>
> >> I often have large numbers of relatively long, independent processes
> >> and I want to farm them out to multiple cores. (For those who care, this
> >> is often in the context of evolutionary computation systems, with each
> >> of the processes being a fitness test.) I had thought that I was farming
> >> these out in the right way to multiple cores, using agents or sometimes
> >> just pmap, but then I noticed that my runtimes weren't scaling in the
> >> way that I expected across machines with different numbers of cores
> >> (even though I usually saw near total utilization of all cores in
> >> "top").
>
> >> This led me to do some more systematic testing and I'm
> >> confused/concerned about what I'm seeing, so I'm going to present my
> >> tests and results here in the hope that someone can clear things up for
> >> me.
> >> I know that timing things in clojure can be complicated both on account
> >> of laziness and on account of optimizations that happen on the Java
> >> side, but I think I've done the right things to avoid getting tripped up
> >> too much by these issues. Still, it's quite possible that I've coded
> >> some things incorrectly and/or that I'm misunderstanding some basic
> >> concepts, and I'd appreciate any help that anyone can provide.
>
> >> First I defined a function that would take a non-trivial amount of time
> >> to execute, as follows:
>
> >> (defn burn
> >>   ([] (count
> >>        (take 1E6
> >>              (repeatedly
> >>               #(* 9999999999 9999999999)))))
> >>   ([_] (burn)))
>
> >> The implementation with an ignored argument just serves to make some of
> >> my later calls neater -- I suppose I might incur a tiny additional cost
> >> when calling it that way, but this will be swamped by the things I'm
> >> timing.
>
> >> Then I defined functions for calling this multiple times either
> >> sequentially or concurrently, using three different techniques for
> >> starting the concurrent processes:
>
> >> (defn burn-sequentially [n]
> >>   (print n " sequential burns: ")
> >>   (time (dotimes [i n] (burn))))
>
> >> (defn burn-via-pmap [n]
> >>   (print n " burns via pmap: ")
> >>   (time (doall (pmap burn (range n)))))
>
> >> (defn burn-via-futures [n]
> >>   (print n " burns via futures: ")
> >>   (time (doall (pmap deref (map (fn [_] (future (burn)))
> >>                                 (range n))))))
>
> >> (defn burn-via-agents [n]
> >>   (print n " burns via agents: ")
> >>   (time (let [agents (map #(agent %) (range n))]
> >>           (dorun (map #(send % burn) agents))
> >>           (apply await agents))))
>
> >> Finally, since there's often quite a bit of variability in the run time
> >> of these things (maybe because of garbage collection? Optimization? I'm
> >> not sure), I define a simple macro to execute a call three times:
>
> >> (defmacro thrice [expression]
> >>   `(do ~expression ~expression ~expression))
>
> >> Now I can do some timings, and I'll first show you what happens in one
> >> of the cases where everything performs as expected.
>
> >> On a 16-core machine (details at
> >> http://fly.hampshire.edu/ganglia/?p=2&c=Rocks-Cluster&h=compute-4-1.l...),
> >> running four burns thrice, with the code:
>
> >> (thrice (burn-sequentially 4))
> >> (thrice (burn-via-pmap 4))
> >> (thrice (burn-via-futures 4))
> >> (thrice (burn-via-agents 4))
>
> >> I get:
>
> >> 4 sequential burns: "Elapsed time: 2308.616 msecs"
> >> 4 sequential burns: "Elapsed time: 1510.207 msecs"
> >> 4 sequential burns: "Elapsed time: 1182.743 msecs"
> >> 4 burns via pmap: "Elapsed time: 470.988 msecs"
> >> 4 burns via pmap: "Elapsed time: 457.015 msecs"
> >> 4 burns via pmap: "Elapsed time: 446.84 msecs"
> >> 4 burns via futures: "Elapsed time: 417.368 msecs"
> >> 4 burns via futures: "Elapsed time: 401.444 msecs"
> >> 4 burns via futures: "Elapsed time: 398.786 msecs"
> >> 4 burns via agents: "Elapsed time: 421.103 msecs"
> >> 4 burns via agents: "Elapsed time: 426.775 msecs"
> >> 4 burns via agents: "Elapsed time: 408.416 msecs"
>
> >> The improvement from the first line to the second is something I always
> >> see (along with frequent improvements across the three calls in a
> >> "thrice"), and I assume this is due to optimizations taking place in the
> >> JVM. Then we see that all of the ways of starting concurrent burns
> >> perform about the same, and all produce a speedup over the sequential
> >> burns of somewhere in the neighborhood of 3x-4x.
> >> Pretty much exactly what I would expect and want. So far so good.
>
> >> However, in the same JVM launch I then went on to do the same thing but
> >> with 16 and then 48 burns in each call:
>
> >> (thrice (burn-sequentially 16))
> >> (thrice (burn-via-pmap 16))
> >> (thrice (burn-via-futures 16))
> >> (thrice (burn-via-agents 16))
>
> >> (thrice (burn-sequentially 48))
> >> (thrice (burn-via-pmap 48))
> >> (thrice (burn-via-futures 48))
> >> (thrice (burn-via-agents 48))
>
> >> This produced:
>
> >> 16 sequential burns: "Elapsed time: 5821.574 msecs"
> >> 16 sequential burns: "Elapsed time: 6580.684 msecs"
> >> 16 sequential burns: "Elapsed time: 6648.013 msecs"
> >> 16 burns via pmap: "Elapsed time: 5953.194 msecs"
> >> 16 burns via pmap: "Elapsed time: 7517.196 msecs"
> >> 16 burns via pmap: "Elapsed time: 7380.047 msecs"
> >> 16 burns via futures: "Elapsed time: 1168.827 msecs"
> >> 16 burns via futures: "Elapsed time: 1068.98 msecs"
> >> 16 burns via futures: "Elapsed time: 1048.745 msecs"
> >> 16 burns via agents: "Elapsed time: 1041.05 msecs"
> >> 16 burns via agents: "Elapsed time: 1030.712 msecs"
> >> 16 burns via agents: "Elapsed time: 1041.139 msecs"
> >> 48 sequential burns: "Elapsed time: 15909.333 msecs"
> >> 48 sequential burns: "Elapsed time: 14825.631 msecs"
> >> 48 sequential burns: "Elapsed time: 15232.646 msecs"
> >> 48 burns via pmap: "Elapsed time: 13586.897 msecs"
> >> 48 burns via pmap: "Elapsed time: 3106.56 msecs"
> >> 48 burns via pmap: "Elapsed time: 3041.272 msecs"
> >> 48 burns via futures: "Elapsed time: 2968.991 msecs"
> >> 48 burns via futures: "Elapsed time: 2895.506 msecs"
> >> 48 burns via futures: "Elapsed time: 2818.724 msecs"
> >> 48 burns via agents: "Elapsed time: 2802.906 msecs"
> >> 48 burns via agents: "Elapsed time: 2754.364 msecs"
> >> 48 burns via agents: "Elapsed time: 2743.038 msecs"
>
> >> Looking first at the 16-burn runs, we see that concurrency via pmap is
> >> actually generally WORSE than sequential. I cannot understand why this
> >> should be the case. I guess if I were running on a single core I would
> >> expect to see a slight loss when going to pmap because there would be
> >> some cost for managing the 16 threads that wouldn't be compensated for
> >> by actual concurrency. But I'm running on 16 cores and I should be
> >> getting a major speedup, not a slowdown. There are only 16 threads, so
> >> there shouldn't be a lot of time lost to overhead.
>
> >> Also interesting, in this case when I start the processes using futures
> >> or agents I DO see a speedup. It's on the order of 6x-7x, not close to
> >> the 16x that I would hope for, but at least it's a speedup. Why is this
> >> so different from the case with pmap? (Recall that my pmap-based method
> >> DID produce about the same speedup as my other methods when doing only
> >> 4 burns.)
>
> >> For the calls with 48 burns we again see nearly the expected, reasonably
> >> good pattern with all concurrent calls performing nearly equivalently (I
> >> suppose that the steady improvement over all of the calls is again some
> >> kind of JVM optimization), with a speedup in the concurrent calls over
> >> the sequential calls in the neighborhood of 5x-6x. Again, not the ~16x
> >> that I might hope for, but at least it's in the right direction. The
> >> very first of the pmap calls with 48 burns is an ...
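And on "more is needed to collect results with futures" in the pool code above: one way is to hand the pool the whole batch via invokeAll, which blocks until every task finishes and returns a list of Futures to read results from. A rough, untested sketch (the name burn-via-pool-collect and the extra type hints are mine, and it's written with the 1.1 hint syntax again):

(defn burn-via-pool-collect [n]
  (print n " burns via a thread pool, collecting results: ")
  (time
   (let [cores (.. Runtime getRuntime availableProcessors)
         #^java.util.concurrent.ExecutorService pool
         (java.util.concurrent.Executors/newFixedThreadPool cores)
         #^Callable func (fn [] (burn))
         ;; invokeAll blocks until every task completes and returns a
         ;; java.util.List of Futures, one per task, in submission order.
         futs (.invokeAll pool (repeat n func))]
     (.shutdown pool)
     ;; .get returns each task's result (here, the count from burn).
     (doall (map (fn [#^java.util.concurrent.Future f] (.get f)) futs)))))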