On Wed, 2010-01-06, Gary Herron wrote:
> aditya shukla wrote:
>> Hello people,
>>
>> I have 5 directories corresponding to 5 different urls. I want to
>> download images from those urls and place them in the respective
>> directories. I have to extract the contents and download them
>> simultaneously. I can extract the contents and do them one by one. My
>> question is: for doing it simultaneously, do I have to use threads?
>>
>> Please point me in the right direction.
>>
>>
>> Thanks
>>
>> Aditya
>
> You've been given some bad advice here.
>
> First -- threads are lighter-weight than processes, so threads are
> probably *more* efficient.  However, with only five threads/processes,
> the difference is probably not noticeable.  (If the prejudice against
> threads comes from concerns over the GIL -- that also is a misplaced
> concern in this instance.  Since you only have one network connection,
> you will receive only one packet at a time, so only one thread will be
> active at a time.  If the extraction process uses a significant enough
> amount of CPU time

I wonder what that "extraction" would be, by the way.  Unless you ask
for compression of the HTTP data, the images come as-is on the TCP
stream.

> so that the extractions are all running at the same
> time *AND* if you are running on a machine with separate CPU/cores *AND*
> you would like the extractions to be running truly in parallel on those
> separate cores, *THEN*, and only then, will processes be more efficient
> than threads.)

I can't remember what the bad advice was, but here processes versus
threads clearly doesn't matter performance-wise.  I generally
recommend processes, because how they work is well-known and they're
not as vulnerable to weird synchronization bugs as threads.

> Second, running 5 wgets is equivalent to 5 processes not 5 threads.
>
> And third -- you don't have to use either threads *or* processes.  There
> is another possibility which is much more light-weight: asynchronous
> I/O, available through the low level select module, or more usefully
> via the higher-level asyncore module.

Yeah, that would be my first choice too for a problem which isn't
clearly CPU-bound.  Or my second choice -- the first would be calling
on a utility like wget(1).

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
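
For reference, the thread-per-directory approach asked about in the
original question might look roughly like the sketch below.  It is only
an illustration, not anything posted in the thread: the names
fetch_url, download_into and download_all, the numbered file-naming
scheme, and the shape of the jobs mapping (directory -> list of image
URLs) are all assumptions.  The work is I/O-bound, so the GIL should
not be a concern here, as noted above.

```python
import os
import threading
import urllib.request


def fetch_url(url):
    """Return the raw bytes served at url (blocking network I/O)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_into(directory, urls, fetch=fetch_url):
    """Download every URL in urls and save it under directory."""
    os.makedirs(directory, exist_ok=True)
    for index, url in enumerate(urls):
        # Hypothetical naming scheme: prefix a counter to avoid clashes.
        name = os.path.basename(url) or "index"
        path = os.path.join(directory, "%03d_%s" % (index, name))
        with open(path, "wb") as out:
            out.write(fetch(url))


def download_all(jobs, fetch=fetch_url):
    """Start one thread per (directory, urls) pair and wait for all.

    Note: an exception inside a worker thread is not re-raised here;
    a real script would want to collect and report failures.
    """
    threads = [
        threading.Thread(target=download_into, args=(directory, urls, fetch))
        for directory, urls in jobs.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Calling download_all({"cats": [...], "dogs": [...]}) would then fill
each directory from its own thread.  The fetch parameter exists only so
the network call can be swapped out, for example in a test.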