I have the same use case: walking a seq of an input file, and doing file/db operations for each row. pmap is working very well, but it has required a lot of attention to the data flow, to make sure that no significant compute is done in the main thread. Otherwise IO blocks the compute.
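To make the data-flow point concrete, here is a minimal sketch of the shape that has worked for me. The names `parse-row`, `write-row!`, `written` and the commented-out `input.csv` are made up for illustration; the idea is only that both the transform and the side effect live inside the function handed to pmap, so the calling thread does little more than realize the input seq and wait.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-ins for the real per-row work.
(defn parse-row
  "CPU-bound transform: split a CSV-ish line into fields."
  [line]
  (str/split line #","))

(def written (atom []))   ; stands in for a file or DB sink

(defn write-row!
  "Side effect: record the row (a real version would write to a file/db)."
  [row]
  (swap! written conj row))

(defn process-line!
  "Transform AND side effect together, so both run on pmap's worker threads."
  [line]
  (write-row! (parse-row line)))

(defn process-lines!
  "Walks a (possibly lazy) seq of lines in parallel and discards the results."
  [lines]
  (dorun (pmap process-line! lines)))

;; With a real file this would look something like (file name is an assumption):
;; (with-open [rdr (clojure.java.io/reader "input.csv")]
;;   (process-lines! (line-seq rdr)))
(process-lines! ["a,b" "c,d" "e,f"])
```

Note that the side effects are not guaranteed to happen in input order, so this only fits when the writes are independent of each other.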
I briefly tried working with the reducers library, which generally made things 2-3 times slower, presumably because I'm using it incorrectly. I would really like to see more reducers examples, e.g. for this case: reading a seq larger than memory, doing transforms on the data, and then executing side effects.
On Thursday, October 17, 2013 4:04:51 AM UTC-7, Mikera wrote:
>
> On Thursday, 17 October 2013 10:34:18 UTC+8, Pradeep Gollakota wrote:
>
>> Hi All,
>>
>> I’m (very) new to Clojure (and loving it)… and I’m trying to wrap my head around how to correctly choose doseq vs dorun for my particular use case. I’ve read this earlier post
>> https://groups.google.com/forum/#!msg/clojure/8ebJsllH8UY/mXtixH3CRRsJ
>> and I had a clarifying question.
>>
>> From what I gathered in the above post, it’s more efficient to use doseq instead of dorun since map creates another seq. However, if the fn you want to apply on the seq can be parallelized, doseq wouldn’t give you the ability to parallelize. With dorun you can use pmap instead of map and get parallelization.
>>
>> (doseq [i some-lazy-seq] (side-effect-fn i))
>> (dorun (pmap side-effect-fn some-lazy-seq))
>>
>> What is the idiomatic way of parallelizing a computation on a lazy seq?
>>
> I don't think there is a single idiomatic way. It depends on lots of things, e.g.:
> - How expensive is each side-effect-fn? If it is cheap, then the overhead of making things parallel may not be worth it.
> - Do you want to constrain the thread pool or have a separate thread for each element? For the latter, futures are an option.
> - Where is the actual bottleneck? If an external resource is constrained, CPU parallelization may not help you at all.
> - How is the lazy sequence being produced? Is it already realised, or being computed on the fly?
> - Is there any concern about ordering / concurrent access to resources / race conditions?
>
> Assuming that side-effect-fn is relatively CPU-expensive and that the runtimes of each call to it are reasonably similar, then I'd say that your (dorun (pmap .....)) version is a decent choice. Otherwise you may want to take a look at the "reducers" library - the Fork/Join capabilities are very impressive and should do what you need.
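For the reducers case I described (a seq larger than memory, transforms, then side effects), the following is a rough sketch of one approach, not a tested recipe. As far as I understand, `r/fold` only runs in parallel over foldable collections such as vectors; given a lazy seq it falls back to an ordinary sequential reduce, which may be part of why my attempt was slower. The sketch therefore takes the lazy seq one chunk at a time, turns each chunk into a vector, folds it, and then runs the side effects sequentially. `expensive-transform`, `sink` and the chunk size are hypothetical placeholders.

```clojure
(require '[clojure.core.reducers :as r])

(defn expensive-transform
  "Hypothetical CPU-bound step."
  [x]
  (* x x))

(def sink (atom []))   ; hypothetical stand-in for a file or DB

(defn process-all!
  "Walks xs chunk by chunk so only one chunk is held in memory at a time.
  Each chunk is made a vector (foldable), transformed with r/foldcat,
  and the results are then written out sequentially, in order."
  [xs chunk-size]
  (doseq [chunk (partition-all chunk-size xs)]
    (let [results (r/foldcat (r/map expensive-transform (vec chunk)))]
      (doseq [y results]
        (swap! sink conj y)))))

;; Tiny chunk size just for demonstration; real chunks would be much larger.
(process-all! (range 10) 4)
```

Whether this beats plain pmap presumably depends on how expensive the transform is relative to the IO, so it would need measuring on real data.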