rzimerman wrote:
> I'm hoping to write a program that will read any number of urls from
> stdin (1 per line), download them, and process them. So far my script
> (below) works well for small numbers of urls. However, it does not
> scale to more than 200 urls or so, because it issues HTTP requests for
> all of the urls simultaneously, and terminates after 25 seconds.
> Ideally, I'd like this script to download at most 50 pages in parallel,
> and to time out if and only if any HTTP request is not answered in 3
> seconds. What changes do I need to make?

Take a look at

And read

You can pass a timeout to the constructor.

To download at most 50 pages in parallel you can use a download queue.

Here is a quick example, ABSOLUTELY NOT TESTED:

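from twisted.internet.defer import Deferred
# getPage is the API this thread uses; later Twisted releases deprecate it
from twisted.web.client import getPage


class DownloadQueue(object):
    SIZE = 50  # maximum number of downloads in progress at once

    def __init__(self):
        self.requests = []  # queued requests: (url, timeout, deferred)
        self.deferreds = []  # requests currently in progress

    def addRequest(self, url, timeout):
        if len(self.deferreds) >= self.SIZE:
            # every slot is busy: queue the request
            deferred = Deferred()
            self.requests.append((url, timeout, deferred))

            return deferred

        # a slot is free: execute the request now
        return self._start(url, timeout)

    def _start(self, url, timeout):
        deferred = getPage(url, timeout=timeout)
        self.deferreds.append(deferred)
        # addBoth runs _callback on success and on failure
        deferred.addBoth(self._callback, deferred)

        return deferred

    def _callback(self, result, finished):
        # the finished request frees its slot
        self.deferreds.remove(finished)

        # execute the next queued request, if any
        if self.requests:
            url, timeout, deferredHelper = self.requests.pop(0)
            # pass the page (or the failure) on to the deferred
            # that addRequest returned to the caller
            self._start(url, timeout).chainDeferred(deferredHelper)

        return result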
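
For comparison, the same two constraints (at most 50 downloads in flight, a 3 second limit per request) can also be sketched without Twisted, using asyncio from the Python standard library. This is a different technique from the DownloadQueue above: a semaphore caps the concurrency and asyncio.wait_for applies the timeout to each request separately. The names fetch_all and fake_fetch are made up for illustration, and fetch stands for whatever coroutine actually downloads a page. Treat it as a rough sketch under those assumptions, not a drop-in replacement.

```python
import asyncio


async def fetch_all(urls, fetch, limit=50, timeout=3.0):
    """Run fetch(url) for every url, at most `limit` at a time.

    `fetch` is a hypothetical coroutine function supplied by the caller.
    A request that exceeds `timeout` seconds yields asyncio.TimeoutError
    in its slot of the result list instead of aborting the whole run.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        # wait here until one of the `limit` slots is free
        async with semaphore:
            # the timeout covers this single request only
            return await asyncio.wait_for(fetch(url), timeout)

    # return_exceptions=True keeps one failure from cancelling the others
    return await asyncio.gather(
        *(worker(url) for url in urls), return_exceptions=True
    )


if __name__ == "__main__":
    async def fake_fetch(url):
        # stand-in for a real HTTP client call
        await asyncio.sleep(0.01)
        return "body of " + url

    urls = ["http://example.com/%d" % i for i in range(200)]
    pages = asyncio.run(fetch_all(urls, fake_fetch))
    print(len(pages))
```

Because the timeout starts only once a request has obtained a slot, time spent waiting in the queue does not count against the 3 seconds.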


Regards  Manlio Perillo
