2009/5/7 Bradbev <brad.beveri...@gmail.com>:
>
> I have a 25Mb CSV text file that I want to process.  Simply running
> (time (dorun (read-lines "file"))) gives me about 1 second of read
> time, which is about as fast as you'll get (on my machine) I think.
> I believe that it should be possible to overlap the IO cost of reading
> from a file with processing cost, so that I should be able to do
> almost 1 second of processing on the data entirely in parallel.  But I
> can't do it!
>
> I was trying things like
> (let [lines (read-lines "file")]
>   (future (dorun lines)) ; pre-fetch lines in the background
>   (time (dorun (map some-func lines))))
>
> Which is a bit hacky, but should basically work in my mind.
> (As an aside, how does the seq caching work?  Where in the code is it
> implemented?)
>
> But it doesn't work :( - the time it takes to map some-func across the
> list is IO + compute, not (max IO-time compute-time).  If I sleep for
> a while between, then the compute time goes way down.

Are you sure you have more than one core? (ok, just joking :-)

> This also leads me to think that it would be useful to have a function
> that precached a lazy seq, ie
> (pre-cache-seq 5 (range 1000)); returns a new lazy-seq that will keep
> 5 elements ahead by precaching on another thread.

Wouldn't that be your (future (take 5 (range 1000))) trick?
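
For what it's worth, clojure.core already has seque, which looks close to the pre-cache-seq described above: it realizes a seq on a background thread and queues up to n items ahead of the consumer. Note that take is lazy, so wrapping it in a future does not by itself realize any elements; the original (future (dorun lines)) does. Below is a minimal sketch, not a measured benchmark. slow-source and the body of some-func are hypothetical stand-ins (invented here, not from the thread) that simulate roughly 10 ms of IO and 10 ms of processing per element.

```clojure
;; Hypothetical stand-in for an IO-bound reader: about 10 ms per element.
(defn slow-source [n]
  (map (fn [i] (Thread/sleep 10) i) (range n)))

;; Hypothetical stand-in for the per-line processing: about 10 ms per call.
(defn some-func [x]
  (Thread/sleep 10)
  (* x x))

;; Baseline: producing and processing both happen on the calling thread.
(time (dorun (map some-func (slow-source 100))))

;; seque realizes the source on a background thread and queues
;; up to 5 items ahead of the consumer.
(time (dorun (map some-func (seque 5 (slow-source 100)))))
```

Because range is a chunked seq, the background thread may realize elements in batches rather than exactly 5 ahead. seque runs on the agent thread pool, so a script that uses it may want to call (shutdown-agents) once it is done, otherwise the JVM can linger before exiting.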