aditya shukla wrote:
Hello people,
I have 5 directories corresponding to 5 different URLs. I want to
download images from those URLs and place them in the respective
directories. I have to extract the contents and download them
simultaneously. I can extract the contents and do them one by one. My
question is: to do it simultaneously, do I have to use threads?
Please point me in the right direction.
Thanks
Aditya
You've been given some bad advice here.
First -- threads are lighter-weight than processes, so threads are
probably *more* efficient. However, with only five threads/processes,
the difference is probably not noticeable. (If the prejudice against
threads comes from concerns over the GIL -- that also is a misplaced
concern in this instance. Since you only have one network connection, you
will receive only one packet at a time, so only one thread will be
active at a time. If the extraction process uses a significant enough
amount of CPU time so that the extractions are all running at the same
time *AND* if you are running on a machine with separate CPU/cores *AND*
you would like the extractions to be running truly in parallel on those
separate cores, *THEN*, and only then, will processes be more efficient
than threads.)
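
For the thread route, a minimal sketch might look like the following.
The names download and download_all and the (url, directory) job pairs
are placeholders of mine, not anything from your code, and it assumes
each URL points directly at one image file (the extraction step is left
out).

```python
import os
import threading
import urllib.request


def download(url, directory):
    """Fetch one URL and save it in `directory` under the URL's base name."""
    os.makedirs(directory, exist_ok=True)
    # Hypothetical naming rule: reuse the last path component of the URL.
    filename = os.path.basename(url.rstrip("/")) or "index.html"
    path = os.path.join(directory, filename)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path


def download_all(jobs):
    """Start one thread per (url, directory) pair and wait for all of them."""
    threads = [threading.Thread(target=download, args=job) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


# Example call (placeholder URLs and directories):
# download_all([("http://example.com/a.png", "dir1"),
#               ("http://example.com/b.png", "dir2")])
```

Each thread spends nearly all of its time blocked waiting on the
network, which is why the GIL should not matter much here.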
Second, running 5 wgets is equivalent to 5 processes, not 5 threads.
And third -- you don't have to use either threads *or* processes. There
is another possibility which is much more light-weight: asynchronous
I/O, available through the low level select module, or more usefully
via the higher-level asyncore module. (Although the learning curve
might trip you up, and some people find the programming model for
asyncore hard to fathom, I find it more intuitive in this case than
threads/processes.)
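
To give a feel for the low-level select approach, here is a rough
sketch that talks plain HTTP/1.0 over several sockets from one thread.
The function name fetch_all and the (host, port, path) tuples are my own
invention for illustration. It returns the raw response (headers plus
body) and does not handle redirects, HTTPS, or errors.

```python
import select
import socket


def fetch_all(targets, timeout=10.0):
    """Fetch several (host, port, path) targets concurrently in one thread.

    Returns a dict mapping each target to its raw HTTP response bytes.
    """
    owners = {}        # socket -> target it belongs to
    pending_send = {}  # socket -> request bytes not yet sent
    results = {}
    for target in targets:
        host, port, path = target
        # Connecting is done one socket at a time (blocking) to keep the
        # sketch short; only the transfers below overlap.
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.setblocking(False)
        request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
        owners[sock] = target
        pending_send[sock] = request.encode("ascii")
        results[target] = b""

    while owners:
        readable, writable, _ = select.select(
            list(owners), list(pending_send), [], timeout)
        if not readable and not writable:
            break  # nothing happened within `timeout` seconds; give up
        for sock in writable:
            sent = sock.send(pending_send[sock])
            remaining = pending_send[sock][sent:]
            if remaining:
                pending_send[sock] = remaining
            else:
                del pending_send[sock]
        for sock in readable:
            chunk = sock.recv(8192)
            if chunk:
                results[owners[sock]] += chunk
            else:
                # Empty read: the server closed the connection, so this
                # response is complete.
                sock.close()
                pending_send.pop(sock, None)
                del owners[sock]

    for sock in owners:  # only non-empty if we timed out
        sock.close()
    return results
```

Whichever socket has data ready gets serviced next, so no download has
to wait for another one to finish.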
In fact, the asyncore manual page has a ~20 line class which implements
a web page retrieval. You could replace that example's single call to
http_client with five calls, one for each of your URLs. Then when you
enter the last line (that is the asyncore.loop() call) the five will be
downloading simultaneously.
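
A caveat for anyone reading this later: asyncore was deprecated in
Python 3.6 and removed in 3.12. The same "create five clients, then make
one loop call" shape can be sketched with asyncio instead. This is a
swapped-in technique, not the manual's example, and the names fetch and
fetch_many and the (host, port, path) tuples are assumptions of mine.

```python
import asyncio


async def fetch(host, port, path):
    """Send one HTTP/1.0 GET and return the raw response bytes."""
    reader, writer = await asyncio.open_connection(host, port)
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    writer.write(request.encode("ascii"))
    await writer.drain()
    # Under HTTP/1.0 the server closes the connection when it is done,
    # so reading until EOF yields the whole response.
    response = await reader.read()
    writer.close()
    await writer.wait_closed()
    return response


async def fetch_many(targets):
    """Run one fetch per (host, port, path) target concurrently."""
    return await asyncio.gather(*(fetch(*target) for target in targets))


# Example call (placeholder host and paths); asyncio.run() plays the
# role that asyncore.loop() plays in the manual's example:
# responses = asyncio.run(fetch_many([("example.com", 80, "/a.png"),
#                                     ("example.com", 80, "/b.png")]))
```

asyncio.gather returns the responses in the same order as the targets,
so pairing each one with its directory afterwards is straightforward.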
See http://docs.python.org/library/asyncore.html
Gary Herron