Anders Eriksson wrote:
> I have made a short program that given an url will download all referenced
> files on that url.
>
> It works, but I'm thinking it could use some optimization since it's very
> slow.

What's slow about it? Is downloading each file slow, is it the overhead of
connecting to the server before the download, or is it more the feeling that
the overall process could use your bandwidth better?

> I create a list of tuples where each tuple consists of the url to the file
> and the path to where I want to save it. E.g (http://somewhere.com/foo.mp3,
> c:\Music\foo.mp3)
>
> The downloading part (which is the part I need help with) looks like this:
> def GetFiles():
>     """do the actual copying of files"""
>     for url,path in hreflist:
>         print(url,end=" ")
>         srcdata = urlopen(url).read()
>         dstfile = open(path,mode='wb')
>         dstfile.write(srcdata)
>         dstfile.close()
>         print("Done!")
>
> hreflist is the list of tuples.
>
> at the moment the print(url,end=" ") will not be printed before the actual
> download, instead it will be printed at the same time as print("Done!").
> This I would like to have the way I intended.
>
> Is downloading a binary file using: srcdata = urlopen(url).read()
> the best way? Is there some other way that would speed up the downloading?

Yes. Instead of running the downloads in a sequential loop, put the code for
downloading one file into a function and start one thread per file, each of
which runs that function (see the threading module). That way, each thread
can sit and wait for data coming from its server without preventing the
other threads from receiving data from their servers at the same time. That
should get your bandwidth usage up.

Take care not to run too many threads against the same server (which may get
upset and block your requests, depending on the site), and limit the total
number of threads when you download a large number of files. Running too
many threads can slow things down again. But you'll see that when you try.

Stefan
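
A minimal sketch of the threaded approach described above, assuming
Python 3.3 or later. Instead of starting one threading.Thread per file by
hand, it uses concurrent.futures.ThreadPoolExecutor from the standard
library, which caps the number of threads running at once. The names
download_one and get_files, the max_workers default of 4, and the error
handling are illustrative assumptions, not part of the original program.
Passing flush=True to print() is probably also what fixes the delayed
output: without a newline, the url can sit in the output buffer until
"Done!" is printed.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def download_one(url, path):
    """Download a single url to path, copying it in chunks."""
    with urlopen(url) as response, open(path, "wb") as dstfile:
        shutil.copyfileobj(response, dstfile)
    return url


def get_files(hreflist, max_workers=4):
    """Download every (url, path) pair using a small pool of threads.

    Returns a list of (url, exception) pairs for the downloads that failed.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted job back to its url for reporting.
        futures = {pool.submit(download_one, url, path): url
                   for url, path in hreflist}
        # Report each download as soon as it finishes, in completion order.
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
            except Exception as exc:
                failed.append((url, exc))
                print(url, "failed:", exc, flush=True)
            else:
                print(url, "Done!", flush=True)
    return failed
```

Calling get_files(hreflist) with the existing list of tuples would replace
the sequential loop. The right value for max_workers depends on the servers
and the connection, so it is worth trying a few values.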