Hm. float-seq may not fit in memory. Perhaps I can read it in blocks.
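
A rough sketch of the block-wise idea, under some assumptions: encode and get-ids below are hypothetical stand-ins for the real functions, ids-in-blocks and the block size are names made up for illustration, and r/foldcat (rather than plain reduce) is used because it is r/fold that actually runs in parallel over a vector. Each block is poured into a vector, encoded in parallel, and then passed to get-ids in order.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-ins for the real functions discussed in the thread.
(defn encode [x] (* 2.0 x))             ; the slow, pure step

(def seen (atom []))                    ; records the order get-ids was called in
(defn get-ids [v]                       ; the ordered, side-effecting step
  (count (swap! seen conj v)))          ; pretend the db hands back a running id

(defn ids-in-blocks
  "Returns a lazy seq of ids. Each block of block-size floats is copied into
  a vector and encoded with a parallel fold; get-ids is then called on the
  encoded values one by one, in their original order."
  [block-size float-seq]
  (mapcat (fn [block]
            ;; r/foldcat keeps the results in input order
            (let [encoded (r/foldcat (r/map encode (vec block)))]
              ;; mapv is eager, so the side effects happen here, in order
              (mapv get-ids encoded)))
          (partition-all block-size float-seq)))
```

Usage would be something like (ids-in-blocks 20000 float-seq). Provided nothing holds on to the head of float-seq, only a few blocks should be realized at any one time, though I haven't measured this.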

On Monday, October 14, 2013 11:01:52 AM UTC-7, Herwig Hochleitner wrote:
>
> + make sure to pour float-seq into a vector before r/map, to make full use 
> of parallel folding
>
>
> 2013/10/14 Herwig Hochleitner <hhochl...@gmail.com>
>
>> Try
>>
>> (require '[clojure.core.reducers :as r])
>> (reduce (fn [res val] (get-ids val))
>>         nil (r/map encode float-seq))
>>
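>> This maps encode over float-seq (r/map) as part of the reduction and then 
>> calls get-ids on each result in order, without allocating an intermediate 
>> seq. Note that reduce itself runs sequentially; folding in parallel 
>> requires r/fold over a vector.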
>>
>>
>> 2013/10/14 Brian Craft <craft...@gmail.com>
>>
>>> I'm walking a seq of many millions of floats, encoding them for the 
>>> persistence layer, and getting sequence ids from the db. So, conceptually 
>>> there are two parts: the slow part, and the side-effecting part. Vaguely 
>>> like
>>>
>>> (map get-ids (map encode float-seq))
>>>
>>> which is later reduced while writing to disk. In the get-ids step the 
>>> order matters. My first attempt to make the slow part parallel was to 
>>> use pmap,
>>>
>>> (map get-ids (pmap encode float-seq))
>>>
>>> However, that's actually slower. I expect this is because, even though 
>>> "encode" is the bottleneck, a single call is still cheaper than pmap's 
>>> per-element overhead. I next tried pmap over groups of floats, a bit like
>>>
>>> (map get-ids (flatten (pmap #(map encode %) (partition-all 20000 
>>> float-seq))))
>>>
>>> (sorry for any typos, I'm just pseudo-coding here) This was still 
>>> slower, which surprised me. I understand the first pmap result, but this 
>>> one is puzzling to me. Even if I partition at half the length of the seq 
>>> (so in theory it can run two threads, each of which will run five or six 
>>> seconds), it's no faster than map. Part of this seems to be the 
>>> overhead of creating more intermediate seqs. Perhaps I'm misunderstanding 
>>> what's happening during partition-all.
>>>
>>> Is there some obvious way to approach this scenario? I looked briefly at 
>>> the reducers library, however it was unclear to me how to deal with the 
>>> side-effecting portion of the operation. The second (fast) map operation 
>>> needs to be done in order.
>>>  
>>> -- 
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with 
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>>
>>
>>
>
