2009/5/7 Bradbev <brad.beveri...@gmail.com>:
>
> I have a 25 MB CSV text file that I want to process.  Simply running
> (time (dorun (read-lines "file"))) gives me about 1 second of read
> time, which is about as fast as you'll get (on my machine) I think.
> I believe that it should be possible to overlap the IO cost of reading
> from a file with processing cost, so that I should be able to do
> almost 1 second of processing on the data entirely in parallel.  But I
> can't do it!
>
> I was trying things like
> (let [lines (read-lines "file")]
>  (future (dorun lines)) ; pre-fetch lines in the background
>  (time (dorun (map some-func lines))))
>
> Which is a bit hacky, but should basically work in my mind.
> (As an aside, how does the seq caching work?  Where in the code is it
> implemented?)
>
> But it doesn't work :( - the time it takes to map some-func across the
> list is IO + compute, not (max IO-time compute-time).  If I sleep for
> a while between, then the compute time goes way down.

Are you sure you have more than one core? (OK, just joking :-)

> This also leads me to think that it would be useful to have a function
> that pre-caches a lazy seq, i.e.
> (pre-cache-seq 5 (range 1000)) ; returns a new lazy seq that stays
> 5 elements ahead by pre-caching on another thread.

Wouldn't that be your (future (take 5 (range 1000))) trick?
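
For what it's worth, clojure.core already ships seque, which looks close to the pre-cache-seq described above: it fills a bounded queue from the underlying seq on a background thread and can run up to n items ahead of the consumer. Below is a rough sketch, not a tuned solution. The names slow-source, some-func and process-with-prefetch are made up for illustration, and the sleep only simulates IO cost; whether this actually overlaps IO and compute for read-lines on a given machine is something to measure rather than assume.

```clojure
;; Assumption: seque stands in for the proposed pre-cache-seq.
;; slow-source, some-func and process-with-prefetch are hypothetical names.

(defn slow-source
  "Lazy seq of n items; producing each one costs about delay-ms
  (a stand-in for reading lines from a file)."
  [n delay-ms]
  (map (fn [i] (Thread/sleep (long delay-ms)) i) (range n)))

(defn some-func
  "Placeholder for the per-line processing step."
  [x]
  (* x x))

(defn process-with-prefetch
  "Maps some-func over s while a background thread keeps up to
  n-ahead items of s realized ahead of the consumer."
  [n-ahead s]
  (doall (map some-func (seque n-ahead s))))

;; Usage: stay up to 5 items ahead of the consumer.
(process-with-prefetch 5 (slow-source 20 10))
```

seque does its producing on an agent thread, so a standalone script may need (shutdown-agents) before it exits; at the REPL that is not a concern.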

