I have the same use case: walking a seq of an input file, and doing file/db 
operations for each row. pmap is working very well, but it has required a 
lot of attention to the data flow, to make sure that no significant compute 
is done in the main thread. Otherwise IO blocks the compute.
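
A minimal sketch of that arrangement, in case it helps anyone. The names parse-row, transform, and write-row! are hypothetical stand-ins for the real work, and the atom stands in for the file/db sink. The idea is that the function handed to pmap does all of the parsing, transforming, and writing, so the thread walking the seq only hands out lines.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-ins for the real per-row work.
(defn parse-row [line] (str/split line #","))
(defn transform [fields] (mapv str/upper-case fields)) ; the CPU-heavy step
(def written (atom []))                                ; stands in for a file/db sink
(defn write-row! [row] (swap! written conj row))

(defn process-line!
  "Parses, transforms, and writes one line. Runs on a pmap worker thread."
  [line]
  (-> line parse-row transform write-row!))

(defn process-lines!
  "Forces the pmap for its side effects without retaining the results."
  [lines]
  (dorun (pmap process-line! lines)))

;; A StringReader stands in for the real input file. The dorun happens
;; inside with-open, so every line is consumed before the reader closes.
(with-open [rdr (java.io.BufferedReader.
                 (java.io.StringReader. "a,b\nc,d\ne,f"))]
  (process-lines! (line-seq rdr)))
```

The order in which rows reach the sink is not guaranteed here. In a standalone script, calling (shutdown-agents) at the end lets the JVM exit promptly, since pmap runs on the agent thread pool.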

I briefly tried working with the reducers library, which generally made 
things 2-3 times slower, presumably because I'm using it incorrectly. I 
would really like to see more reducers examples, e.g. for this case: 
reading a seq larger than memory, doing transforms on the data, and then 
executing side effects.
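
For what it's worth, here is the kind of example I have in mind, as a sketch under stated assumptions rather than a recommended recipe. As far as I understand it, r/fold only runs in parallel over vectors and maps and falls back to a plain reduce on a lazy seq, which may be part of why my attempt was slower. The sketch therefore walks the seq in chunks, turns each chunk into a vector, folds over it, and then does the side effect on the calling thread. The names transform, write-batch!, and sink are hypothetical.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

(def sink (atom []))                             ; hypothetical file/db stand-in
(defn write-batch! [rows] (swap! sink into rows))
(defn transform [line] (str/upper-case line))    ; hypothetical CPU-bound step

(defn process-in-chunks!
  "Walks lines one chunk at a time. Each chunk becomes a vector so that
  r/fold can split it; the side effect runs once per chunk afterwards."
  [chunk-size lines]
  (doseq [chunk (partition-all chunk-size lines)]
    (let [results (r/fold (r/monoid into vector)  ; combine partial vectors
                          conj                    ; reduce within a partition
                          (r/map transform (vec chunk)))]
      (write-batch! results))))

(process-in-chunks! 4 (map #(str "row-" %) (range 10)))
```

r/fold only splits a vector that is larger than its partition size (512 by default), so a real chunk size would need to be well above that to see any parallelism; the tiny chunk size here is just to keep the example short.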

On Thursday, October 17, 2013 4:04:51 AM UTC-7, Mikera wrote:
>
> On Thursday, 17 October 2013 10:34:18 UTC+8, Pradeep Gollakota wrote:
>
>> Hi All,
>>
>> I’m (very) new to clojure (and loving it)… and I’m trying to wrap my head 
>> around how to correctly choose doseq vs dorun for my particular use case. 
>> I’ve read this earlier post 
>> https://groups.google.com/forum/#!msg/clojure/8ebJsllH8UY/mXtixH3CRRsJ and I 
>> had a clarifying question.
>>
>> From what I gathered in the above post, it’s more efficient to use doseq 
>> instead of dorun since map creates another seq. However, if the fn you want 
>> to apply on the seq can be parallelized, doseq wouldn’t give you the 
>> ability to parallelize. With dorun you can use pmap instead of map and get 
>> parallelization.
>>
>> (doseq [i some-lazy-seq] (side-effect-fn i))
>> (dorun (pmap side-effect-fn some-lazy-seq))
>>
>> What is the idiomatic way of parallelizing a computation on a lazy seq?
>>
> I don't think there is a single idiomatic way. It depends on lots of 
> things, e.g.:
> - How expensive is each side-effect-fn? If it is cheap, then the overhead 
> of making things parallel may not be worth it
> - Do you want to constrain the thread pool or have a separate thread for 
> each element? For the latter, futures are an option
> - Where is the actual bottleneck? If an external resource is constrained, 
> CPU parallelization may not help you at all.
> - How is the lazy sequence being produced? Is it already realised, or 
> being computed on the fly?
> - Is there any concern about ordering / concurrent access to resources / 
> race conditions?
>
> Assuming that side-effect-fn is relatively CPU-expensive and that the 
> runtimes of each call to it are reasonably similar, then I'd say that your 
> (dorun (pmap .....)) version is a decent choice. Otherwise you may want to 
> take a look at the "reducers" library - the Fork/Join capabilities are very 
> impressive and should do what you need.
>
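
On the thread-pool question in the quoted list, one way to constrain the pool is a fixed-size java.util.concurrent executor. This is only a sketch: side-effect-fn and the processed atom are hypothetical, and the batch size is an assumption added so that a very large seq is not turned into tasks all at once.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(def processed (atom #{}))                          ; hypothetical sink
(defn side-effect-fn [x] (swap! processed conj x))  ; hypothetical side effect

(defn run-bounded!
  "Runs side-effect-fn over coll on at most n-threads threads,
  submitting batch-size elements at a time."
  [n-threads batch-size coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      (doseq [batch (partition-all batch-size coll)]
        ;; Clojure fns implement Callable, so they can be handed to invokeAll.
        (let [tasks (mapv (fn [x] #(side-effect-fn x)) batch)]
          ;; invokeAll blocks until the batch finishes; .get rethrows failures.
          (doseq [^Future fut (.invokeAll pool tasks)]
            (.get fut))))
      (finally
        (.shutdown pool)))))

(run-bounded! 4 3 (range 10))
```

Unlike pmap, this waits for each batch to finish before starting the next, so one slow element can leave the other threads idle until its batch completes.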

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en