Hi Nurullah,

Nurullah Akkaya <nurul...@nakkaya.com> writes:
> Yes but AFAIK you only get availableProcessors + 2 threads with pmap

That's good to know, is this documented somewhere?

>
> which is fine when the task is CPU bound but for downloading web pages
> most of the time will be lost at waiting for I/O so having more
> threads would speed things up.
>

That certainly does seem to be the case, as this simple, quick,
unscientific test would suggest

,----[using the code at http://gist.github.com/399269]
| scraper> (let [start (.getTime (Date.))
|                whole (future (dorun (pmap scrape (line-seq (BufferedReader. (FileReader. "url.list"))))))]
|            (deref whole)
|            (println (- (.getTime (Date.)) start)))
|
| 4261
| nil
| scraper> (let [start (.getTime (Date.))
|                urls (line-seq (BufferedReader. (FileReader. "url.list")))
|                half (/ (count urls) 2)
|                first-half (future (dorun (pmap scrape (take half urls))))
|                second-half (future (dorun (pmap scrape (drop half urls))))]
|            (deref first-half) (deref second-half)
|            (println (- (.getTime (Date.)) start)))
| 2863
| nil
| scraper>
`----

Thanks -- Eric

>
> Regards...
> --
> Nurullah Akkaya
> http://nakkaya.com
>
>
>
> On Thu, May 13, 2010 at 2:26 AM, Eric Schulte <schulte.e...@gmail.com> wrote:
>> Wouldn't this be simpler with pmap, e.g. http://gist.github.com/399269
>>
>> although to be honest I don't really know how the automatically
>> parallelized clojure functions decide how many threads to use. Is the
>> JVM smart enough to only create as many system-level threads as make
>> sense on my hardware?
>>
>> Best -- Eric
>>
>> Nurullah Akkaya <nurul...@nakkaya.com> writes:
>>
>>> Since you don't need coordination or keep some sort of state, IMHO
>>> future is better suited for this.
>>> Following gist is my take, it first reads the file that contains the
>>> list of URLs to be downloaded then splits the list into number of
>>> thread pieces. Each future object gets a piece of the list and start
>>> processing in its own thread. Each URL is written to disk using a
>>> UUID.
>>>
>>> http://gist.github.com/399127
>>>
>>> Cheers...
>>> --
>>> Nurullah Akkaya
>>> http://nakkaya.com
>>>
>>>
>>>
>>> On Wed, May 12, 2010 at 9:29 PM, nickikt <nick...@gmail.com> wrote:
>>>> Hello all,
>>>>
>>>> A friend of mine asked if there is a smart way to get the HTML code of
>>>> a couple thousand links. With a plain script it takes forever, since it
>>>> always takes a couple of seconds to get the connection.
>>>>
>>>> It needs to be multi-threaded so we can use all of the download rate.
>>>> So I said I could try it with Clojure, but I am pretty new to Clojure
>>>> and Java (almost through the Programming Clojure book but nothing
>>>> practical) and thought I'd just ask here instead of trying to copy some
>>>> Java code into Clojure and hack something bad.
>>>>
>>>> My idea would be to spawn an agent for every dump and make sure that
>>>> there won't be more than 10-20 threads. And dump it into a file.
>>>>
>>>> How would you implement this?
>>>> A function that does all of it in an agent and control it with a
>>>> counter from another function?
>>>>
>>>> Is there a Clojure way to write a file or should I use the Java way?
>>>> Same for reading a file.
>>>>
>>>> Here is some Java code that does part of what I want. Is there a better
>>>> way than to proxy all of it?
>>>>
>>>> import java.io.*;
>>>> import java.net.*;
>>>>
>>>> URL url = new URL("random page");
>>>> URLConnection conn = url.openConnection();
>>>> BufferedReader d = new BufferedReader(
>>>>     new InputStreamReader(conn.getInputStream()));
>>>> String line;
>>>> // readLine() returns null at end of stream; ready() only says
>>>> // whether data is buffered right now, so it can stop too early.
>>>> while ((line = d.readLine()) != null)
>>>> {
>>>>     System.out.println(line);
>>>> }
>>>>
>>>> Just that this prints to console instead of a file.
>>>>
>>>> So thanks for reading. I'm working on it; any tip or suggestion would
>>>> help.
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with your
>> first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
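
The point raised in the thread, that I/O-bound downloads benefit from more threads than pmap provides, can be illustrated with an explicitly sized thread pool. This is only a minimal sketch and not the code from either gist: `pooled-map` and `fake-fetch` are hypothetical names, `fake-fetch` stands in for a real download by sleeping, and the pool size of 20 is an assumption taken from the 10-20 thread range mentioned in the original question.

```clojure
(import '(java.util.concurrent Executors))

(defn pooled-map
  "Applies f to every item of coll on a fixed pool of n-threads threads.
  Returns a vector of results in input order; blocks until all are done."
  [n-threads f coll]
  (let [pool (Executors/newFixedThreadPool n-threads)]
    (try
      ;; Clojure functions implement Callable, so a vector of
      ;; zero-argument fns can be handed straight to invokeAll.
      (let [tasks   (mapv (fn [x] (fn [] (f x))) coll)
            futures (.invokeAll pool tasks)]
        (mapv #(.get %) futures))
      (finally
        ;; Always release the pool's threads, even if a task threw.
        (.shutdown pool)))))

;; Hypothetical stand-in for a real download: sleeps to mimic I/O wait.
(defn fake-fetch [url]
  (Thread/sleep 100)
  (str "fetched " url))

(println (pooled-map 20 fake-fetch ["a" "b" "c"]))
```

With a pool larger than the number of cores, tasks that spend most of their time waiting can overlap; whether 20 is a sensible size for real downloads depends on the network and the servers involved.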