Hm. float-seq may not fit in memory. Perhaps I can read it in blocks.

On Monday, October 14, 2013 11:01:52 AM UTC-7, Herwig Hochleitner wrote:
>
> + make sure to pour float-seq into a vector before r/map, to make full use
> of parallel folding
>
>
> 2013/10/14 Herwig Hochleitner <hhochl...@gmail.com>
>
>> Try
>>
>>   (require '[clojure.core.reducers :as r])
>>   (reduce (fn [res val] (get-ids val))
>>           nil (r/map encode float-seq))
>>
>> This should parallel fold encode over float-seq (r/map) and then map
>> get-ids in order, but without allocation.
>>
>>
>> 2013/10/14 Brian Craft <craft...@gmail.com>
>>
>>> I'm walking a seq of many millions of floats, encoding them for the
>>> persistence layer, and getting sequence ids from the db. So, conceptually
>>> there are two parts: the slow part, and the side-effecting part. Vaguely
>>> like
>>>
>>>   (map get-ids (map encode float-seq))
>>>
>>> which is later reduced while writing to disk. In the get-ids step the
>>> order matters. My first attempt to make the slow part parallel was to
>>> use pmap,
>>>
>>>   (map get-ids (pmap encode float-seq))
>>>
>>> However that's actually slower. I expect this is because even though
>>> "encode" is the bottleneck, it's still faster than the overhead of pmap. I
>>> next tried pmap over groups of floats, a bit like
>>>
>>>   (map get-ids (flatten (pmap #(map encode %) (partition-all 20000
>>>   float-seq))))
>>>
>>> (sorry for any typos, I'm just pseudo-coding here) This was still
>>> slower, which surprised me. I understand the first pmap result, but this
>>> one is puzzling to me. Even if I partition half the length of the seq (so
>>> in theory it can run two threads, each of which will run five or six
>>> seconds), it's no faster than map. Part of this seems to be the
>>> overhead of creating more intermediate seqs. Perhaps I'm misunderstanding
>>> what's happening during partition-all.
>>>
>>> Is there some obvious way to approach this scenario?
>>> I looked briefly at the reducers library, however it was unclear to me
>>> how to deal with the side-effecting portion of the operation. The second
>>> (fast) map operation needs to be done in order.
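
For what it's worth, here is a rough sketch of the "read it in blocks" idea from the top of this message, combined with the advice to pour the data into a vector first. It is only an illustration under assumptions: `process-in-blocks`, `seen`, and the bodies of `encode` and `get-ids` are hypothetical placeholders, and the block size is arbitrary. It uses `r/foldcat` instead of `reduce` because, as far as I know, only `r/fold` (and `r/foldcat`, which is built on it) can run in parallel, while `reduce` over a reducer stays sequential.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical placeholder for the slow, pure encoding step.
(defn encode [x] (* 2.0 x))

;; Hypothetical placeholder for the ordered, side-effecting step.
;; It records the order of calls in an atom so the order can be checked.
(def seen (atom []))
(defn get-ids [v] (swap! seen conj v) v)

(defn process-in-blocks
  "Walks float-seq one block at a time, so only one block has to be
  realized as a vector at once. Each block is encoded with a (potentially)
  parallel fold, then handed to get-ids sequentially, in the original order."
  [block-size float-seq]
  (doseq [block (partition-all block-size float-seq)]
    ;; vec makes the block foldable; foldcat keeps the element order.
    (let [encoded (r/foldcat (r/map encode (vec block)))]
      (doseq [v encoded]
        (get-ids v)))))

;; Example: ten values, three per block.
(process-in-blocks 3 (map double (range 10)))
```

Memory use should stay bounded by the block size only as long as nothing else holds onto the head of float-seq. Also, fold only splits work above its partition size (512 elements by default), so blocks need to be much larger than that before any parallelism kicks in.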
--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with
your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.