A weakness of my pmapall:

  #<Exception java.lang.Exception: Can't await in agent action>

Which means, I think, that I can't call pmapall within a function that I pass to pmapall. Unfortunate. Is there a better way?

 -Lee

PS: To see these exceptions one must change the call to agent in my definition to something like #(agent % :error-handler (fn [agnt except] (println except))).
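In full, the definition with that error handler spliced in looks like this (the same code as in the message below, with only the agent call changed):

(defn pmapall
  "Like pmap but: 1) coll should be finite, 2) the returned sequence
  will not be lazy, 3) calls to f may occur in any order, to maximize
  multicore processor utilization, and 4) takes only one coll so far.
  This version prints exceptions thrown inside agent actions, which
  would otherwise be silently swallowed."
  [f coll]
  (let [agents (map #(agent % :error-handler
                            (fn [agnt except] (println except)))
                    coll)]
    (dorun (map #(send % f) agents))
    (apply await agents)
    (doall (map deref agents))))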
On Oct 10, 2011, at 4:07 PM, Lee Spector wrote:

>
> I think that the following partially answers my own question and that it
> provides a way to get decent multicore performance for collections of
> non-uniform but compute-intensive tasks through a simple, pmap-like
> interface.
>
> But I'm not sure if it's the best approach and I'd like some feedback. If it
> *is* a good approach then maybe we should refine it and make it more widely
> available. If it's not the best approach then I'd love some advice on how to
> do it better.
>
> The use case here -- which I think must be shared by at least some others --
> is that I have a finite, non-lazy collection of inputs on which I'd like to
> run an expensive (but not uniformly expensive) function, gathering the
> results in a non-lazy sequence. This is a *very* common need in my own
> projects. I don't care about the order in which the function calls are made,
> and I'm not concerned about the memory overhead of retaining all of the
> results, since I want to keep them all anyway. I just want all of the
> computations done as quickly as possible, using all of my available cores.
>
> The pmap function seems at first to provide an elegant way to do what's
> needed, e.g. with (doall (pmap f inputs)), but as discussed earlier in this
> thread it will often wait for the completion of earlier calls before
> starting later calls, and this will be particularly problematic when the
> runtimes of the calls are uneven.
>
> The medusa package provides something that would seem to fit the bill
> better, but it comes with other baggage that I don't need or want. I just
> want a version of pmap that will use all available CPU resources to
> aggressively complete all of the computations in a non-lazy context.
>
> Here's my stab at doing this using agents:
>
> (defn pmapall
>   "Like pmap but: 1) coll should be finite, 2) the returned sequence
>   will not be lazy, 3) calls to f may occur in any order, to maximize
>   multicore processor utilization, and 4) takes only one coll so far."
>   [f coll]
>   (let [agents (map agent coll)]
>     (dorun (map #(send % f) agents))
>     (apply await agents)
>     (doall (map deref agents))))
>
> I should make a version that takes multiple colls, but for now I've written
> it to take just one for clarity.
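>
> A multiple-coll version could probably just zip the colls together and
> reuse the single-coll definition. Something like this (an untested sketch;
> the name pmapall-multi is just for illustration):
>
> (defn pmapall-multi
>   "Like pmapall but takes any number of colls and applies f to the
>   corresponding elements of each, as map does: zips the colls into
>   argument tuples and hands them to the single-coll pmapall."
>   [f & colls]
>   (pmapall #(apply f %) (apply map vector colls)))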
>
> This does appear to speed things up pretty significantly in certain
> circumstances, but maybe not as much as possible. Is it the best approach?
>
> To show that it beats pmap I define a time-wasting function (I want to see
> real CPU utilization, so I'm not using delays) like this:
>
> (defn burn
>   []
>   (dotimes [i 10000]
>     (reduce * (map float (take 1000 (iterate inc i))))))
>
> And then I define a function that takes a lot or a little bit of time
> depending on its argument:
>
> (defn fast-or-slow
>   [n]
>   (if (zero? n)
>     :done
>     (do (burn)
>         :done)))
>
> And then I create a sequence of inputs in which the slow ones are scattered
> sparsely:
>
> (def inputs (take 1000 (cycle (conj (repeat 20 0) 1))))
>
> And then on a 48 core node I get timings like this:
>
> user=> (time (last (pmapall fast-or-slow inputs)))
> "Elapsed time: 37244.151 msecs"
> :done
>
> user=> (time (last (doall (pmap fast-or-slow inputs))))
> "Elapsed time: 110862.187 msecs"
> :done
>
> And by the way, plain old serial map does this:
>
> user=> (time (last (doall (map fast-or-slow inputs))))
> "Elapsed time: 260941.836 msecs"
>
> So we've improved things: pmap is a little better than twice as fast as map
> (260942/110862 ≈ 2.4x), and pmapall is roughly 3 times faster than pmap
> (110862/37244 ≈ 3.0x), for about a 7x speedup over serial map overall. So I
> think I'm ready to switch all of my pmaps to pmapalls. But that's still
> nothing close to a 48x speedup, even though all of the tasks should be
> completely independent and I wouldn't expect a huge loss for coordination.
> And another confusing thing is that even with pmapall, when I look at the
> CPU utilization I see numbers close to 4800% in some cases (like the one
> above) but numbers topping out at something more like 1900% in others
> (e.g. with different input sequences).
>
> So I feel like I'm moving in the right direction but that I'm still
> probably missing something.
>
> Is there a better way to do this? Surely it will come in handy for others
> as well if there's a simple way to more effectively dispatch tasks to
> multiple cores.
>
> Thoughts? Code?
>
> Thanks,
>
> -Lee
>
>
> On Oct 9, 2011, at 7:24 PM, Lee Spector wrote:
>
>>
>> I've been playing with medusa and it sometimes does what I expect, but
>> sometimes it does something strange, and I'm wondering if someone can help
>> me to do one specific medusa-like thing more simply (and without the
>> strangeness, which I haven't fully traced but hope to avoid having to).
>>
>> What I want is a version of pmap that will always use any available cores
>> to compute remaining values (except, of course, for the last few values,
>> when there are fewer remaining values than cores).
>>
>> In other words, I want the behavior that Andy Fingerhut describes medusa
>> as having here:
>>
>>> On Sep 22, 2011, at 11:34 PM, Andy Fingerhut wrote:
>>>
>>>> pmap will limit the maximum number of simultaneous threads. So will the
>>>> medusa library's medusa-pmap.
>>>>
>>>> The difference is that if one job early in the list takes, e.g., 100
>>>> times longer than the next 20, and you have 4 CPUs available, pmap will
>>>> start the first (4+2)=6 in parallel threads, let jobs 2 through 6
>>>> complete, and then wait for the first one to finish before starting
>>>> number 7. Thus most of the time will be spent running one thread. This
>>>> has the advantage of limiting the memory required to store intermediate
>>>> results, but the disadvantage of low usage of multiple CPU cores.
>>>>
>>>> Medusa has a medusa-pmap that will also limit the parallelism, but it
>>>> lets you pick the level of parallelism (it isn't fixed at (# cpus + 2)),
>>>> and it will continue starting new threads, even if that means using more
>>>> memory when one job takes much longer than the following jobs.
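>>
>> (That (# cpus + 2) window is right there in pmap's source; condensed, the
>> single-coll case looks essentially like this -- a paraphrase, not a
>> verbatim copy:
>>
>> (defn pmap-condensed
>>   [f coll]
>>   (let [n    (+ 2 (.. Runtime getRuntime availableProcessors))
>>         rets (map #(future (f %)) coll)   ; lazy seq of futures
>>         step (fn step [[x & xs :as vs] fs]
>>                (lazy-seq
>>                  (if-let [s (seq fs)]
>>                    (cons (deref x) (step xs (rest s)))
>>                    (map deref vs))))]
>>     (step rets (drop n rets))))
>>
>> Results are deref'd in input order, and each future is launched only n
>> elements ahead of the consumer, which is why one slow early job stalls the
>> launch of everything beyond the window.)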
>>
>> FWIW I'll always be calling this on a finite sequence (although it may
>> have 10000 elements or so, so I presume that I shouldn't start threads for
>> all of them at once), and I will want a fully realized result sequence, so
>> I'll surround calls with doall if necessary. I will want all of the
>> pmapped computations to finish before I continue doing anything else.
>>
>> I know that I could launch the threads in other ways -- e.g. I sometimes
>> use agents for something similar -- but pmap is much more elegant in many
>> cases, and anyway I'm not even sure I'm getting full utilization with my
>> other methods.
>>
>> The medusa-pmap function is very close to what I want, but medusa seems to
>> do more than what I need: it requires init/startup calls, it involves
>> timeouts which I will never want, and it behaves strangely when I run my
>> code on a 48 core node. (It runs fine on my 4-core laptop, and then it
>> seems to work beautifully on the 48 core node too for a while, giving me
>> nearly 4800% utilization, but then it does something that looks like it
>> might be caused by everything hitting the timeouts... which I bumped up by
>> several orders of magnitude, but I'm still getting the weird behavior.)
>>
>> So: Is there a simpler way to get "pmap but not lazy and use all cores
>> fully until the whole result sequence is completed"? This is usually what
>> I really want when I'm writing code that utilizes concurrency.
>>
>> I've looked at the pmap source but it isn't obvious to me how to change it
>> to do what I want... So any pointers would be appreciated.
>>
>> Thanks,
>>
>> -Lee

--
Lee Spector, Professor of Computer Science
Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspec...@hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438
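PS: One more direction for the nesting problem at the top of this message (an untested sketch, not benchmarked; pmapall-futures is just an illustrative name): replace the agents with futures. A future isn't an agent action and no await is involved, so a function passed to this version should be able to call it in turn, and the result is still fully realized:

(defn pmapall-futures
  "Future-based variant of pmapall: eagerly dispatches one future per
  element of coll, then blocks while collecting all of the results.
  Calls to f may complete in any order, and nothing here runs inside
  an agent action, so nested calls should be safe."
  [f coll]
  (let [futs (doall (map #(future (f %)) coll))] ; doall forces dispatch now
    (doall (map deref futs))))

Note that futures run on an unbounded thread pool, so this starts one thread per element; for a 1000-element coll of compute-bound tasks that may or may not be acceptable.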