Hi Nurullah,

Nurullah Akkaya <nurul...@nakkaya.com> writes:

> Yes but AFAIK you only get availableProcessors + 2 threads with pmap

That's good to know, is this documented somewhere?
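
As far as the clojure.core source goes, the figure seems to come straight from the definition of pmap, which uses the processor count plus two to decide how far to run ahead of the consumer. A minimal sketch that reproduces the number on the local machine (pmap-parallelism is a name made up here, not part of Clojure):

```clojure
;; Sketch: the look-ahead figure pmap works with, assuming the
;; (+ 2 availableProcessors) expression found in clojure.core.
(defn pmap-parallelism
  "Hypothetical helper: processors reported by the JVM, plus two."
  []
  (+ 2 (.availableProcessors (Runtime/getRuntime))))

(println (pmap-parallelism))
```

Note that chunked input sequences can change how many tasks actually run at once, so treat the number as a rule of thumb.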

> 
> which is fine when the task is CPU-bound, but when downloading web
> pages most of the time is spent waiting on I/O, so having more
> threads would speed things up.
>

That certainly does seem to be the case, as this simple, quick, unscientific
test would suggest:

,----[using the code at http://gist.github.com/399269]
| scraper>   (let [start (.getTime (Date.))
|         whole (future (dorun (pmap scrape (line-seq (BufferedReader. (FileReader. "url.list"))))))]
|     (deref whole)
|     (println (- (.getTime (Date.)) start)))
| 
| 4261
| nil
| scraper> (let [start (.getTime (Date.))
|         urls (line-seq (BufferedReader. (FileReader. "url.list")))
|         half (/ (count urls) 2)
|         first-half (future (dorun (pmap scrape (take half urls))))
|         second-half (future (dorun (pmap scrape (drop half urls))))]
|     (deref first-half) (deref second-half)
|     (println (- (.getTime (Date.)) start)))
| 2863
| nil
| scraper> 
`----
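
If the thread count really is the bottleneck, one option is to bypass pmap and size the pool by hand. A minimal sketch, assuming a plain java.util.concurrent fixed thread pool; pooled-map and fake-scrape are names invented here, and fake-scrape only sleeps to stand in for the scrape function from the gist:

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn pooled-map
  "Applies f to every item of coll on a fixed pool of n-threads threads
  and returns the results as a vector, in input order."
  [n-threads f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      ;; Clojure functions implement Callable, so zero-argument fns
      ;; can be handed to the executor directly.
      (let [tasks   (mapv (fn [x] (fn [] (f x))) coll)
            ;; invokeAll blocks until every task has completed.
            futures (.invokeAll pool ^java.util.Collection tasks)]
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))

;; Hypothetical stand-in for scrape: waits like an I/O call would.
(defn fake-scrape [url]
  (Thread/sleep 50)
  (count url))

(println (pooled-map 20 fake-scrape ["a" "bb" "ccc"]))
```

With an I/O-bound task the pool size can be far larger than the core count, which is the knob pmap does not expose.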

Thanks -- Eric

>
> Regards...
> --
> Nurullah Akkaya
> http://nakkaya.com
>
>
>
> On Thu, May 13, 2010 at 2:26 AM, Eric Schulte <schulte.e...@gmail.com> wrote:
>> Wouldn't this be simpler with pmap, e.g. http://gist.github.com/399269
>>
>> although to be honest I don't really know how the automatically
>> parallelized clojure functions decide how many threads to use.  Is the
>> JVM smart enough to only create as many system-level threads as make
>> sense on my hardware?
>>
>> Best -- Eric
>>
>> Nurullah Akkaya <nurul...@nakkaya.com> writes:
>>
>>> Since you don't need coordination or shared state, IMHO a future is
>>> better suited for this.
>>> The following gist is my take: it first reads the file that contains
>>> the list of URLs to be downloaded, then splits the list into as many
>>> pieces as there are threads. Each future gets a piece of the list and
>>> starts processing it in its own thread. Each page is written to disk
>>> under a UUID file name.
>>>
>>> http://gist.github.com/399127
>>>
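
The approach described above can be sketched roughly as follows. This is not the code from the gist, only an illustration under stated assumptions: split-into and download-all are invented names, and the fetch function is passed in as a parameter (slurp would be one candidate for real URLs in current Clojure versions).

```clojure
(require '[clojure.java.io :as io])
(import '(java.util UUID))

(defn split-into
  "Splits coll into at most n roughly equal pieces."
  [n coll]
  (let [size (max 1 (long (Math/ceil (/ (count coll) (double n)))))]
    (partition-all size coll)))

(defn download-all
  "Starts one future per piece of urls. Each future fetches its URLs with
  fetch-fn and writes every body to a file in dir named after a random
  UUID. Returns a vector of the written file paths, in input order."
  [n-threads fetch-fn dir urls]
  (let [work    (fn [piece]
                  (doall   ; force the work to happen inside the future
                    (for [url piece]
                      (let [path (.getPath (io/file dir (str (UUID/randomUUID) ".html")))]
                        (spit path (fetch-fn url))
                        path))))
        ;; doall here so that every future is started before any deref
        futures (doall (map #(future (work %)) (split-into n-threads urls)))]
    (vec (mapcat deref futures))))
```

A call such as (download-all 20 slurp "pages" urls) would then use 20 futures, provided the pages directory already exists.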
>>> Cheers...
>>> --
>>> Nurullah Akkaya
>>> http://nakkaya.com
>>>
>>>
>>>
>>> On Wed, May 12, 2010 at 9:29 PM, nickikt <nick...@gmail.com> wrote:
>>>> Hello all,
>>>>
>>>> A friend of mine asked if there is a smart way to get the HTML of a
>>>> couple thousand links with a script. Done one at a time it takes
>>>> forever, since each connection takes a couple of seconds to set up.
>>>>
>>>> It needs to be multithreaded so we can use all of the download rate.
>>>> So I said I could try it with Clojure, but I am pretty new to Clojure
>>>> and Java (almost through the Programming Clojure book, but nothing
>>>> practical yet) and thought I'd just ask here instead of trying to copy
>>>> some Java code into Clojure and hacking something bad together.
>>>>
>>>> My idea would be to spawn an agent for every dump and make sure that
>>>> there are never more than 10-20 threads, and dump each page into a file.
>>>>
>>>> How would you implement this?
>>>> A function that does all of it in an agent, controlled by a
>>>> counter from another function?
>>>>
>>>> Is there a Clojure way to write a file or should I use the Java way?
>>>> Same for reading a file.
>>>>
>>>> Here is some Java code that does part of what I want. Is there a better
>>>> way than proxying all of it?
>>>>
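>>>> // needs: import java.io.*; import java.net.*;
>>>> URL url = new URL("random page");
>>>> URLConnection conn = url.openConnection();
>>>> DataInputStream in = new DataInputStream(conn.getInputStream());
>>>> BufferedReader d = new BufferedReader(new InputStreamReader(in));
>>>> String line;
>>>> // readLine() returns null at end of stream; ready() can be false before that
>>>> while ((line = d.readLine()) != null)
>>>> {
>>>>        System.out.println(line);
>>>> }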
>>>>
>>>> Except that this prints to the console instead of a file.
>>>>
>>>> Thanks for reading. I'm working on it, so any tip or suggestion would
>>>> help.
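
On the question of reading and writing files from Clojure: a minimal sketch, assuming the clojure.java.io namespace that ships with current Clojure versions. read-lines and write-lines are illustrative names, not library functions.

```clojure
(require '[clojure.java.io :as io])

(defn read-lines
  "Returns the lines of the file at path as a vector."
  [path]
  (with-open [rdr (io/reader path)]
    ;; vec realizes the lazy line-seq before with-open closes the reader
    (vec (line-seq rdr))))

(defn write-lines
  "Writes each element of lines to the file at path, one per line."
  [path lines]
  (with-open [^java.io.Writer w (io/writer path)]
    (doseq [line lines]
      (.write w (str line "\n")))))
```

For whole files at once, the core functions slurp and spit are usually enough, so no proxying of the Java classes should be needed.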
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>>
