On Wed, 2011-10-26 at 10:02 -0400, Jason Rennie wrote:
> The background:
>
> I've been using DeferredSemaphore and DeferredList to manage the
> running of tasks with a resource constraint (only so many tasks can
> run at the same time). This worked great until I tried to use it to
> manage millions of tasks. Simply setting them up to run
> (DeferredSemaphore.run() calls) took approx. 2 hours and used ~5 gigs
> of RAM. This was less efficient than I expected. Note that these
> numbers don't include time/memory for actually running the tasks, only
> time/memory to set up the running of the tasks. I've since written a
> custom task runner that uses comparatively little setup time/memory
> by adding a "manager" callback to each task which starts additional
> tasks as appropriate.
>
> My questions:
>       * Is the behavior I'm seeing expected? i.e. are DS/DL only
>         recommended for task management if the # of tasks is not too
>         large? Is there a better way to use DS/DL that I might not be
>         thinking of?

DeferredList is intended for the case where you want to wait for all
results to have arrived. Given its API, you basically *have* to create
all the millions of input Deferreds first (although not the tasks
themselves, if you're clever). So this is going to be slow, and use a
lot of memory... although 5 gigs is rather surprising, unless each task
has a lot of state.

>       * Is there a Twisted pattern for managing tasks efficiently that
>         I might be missing?

It seems like you've figured it out, if you've written a custom task
runner. Probably Twisted should include some better abstraction for
doing this sort of thing, since it does come up regularly.
