New submission from Andy Balaam:

asyncio.as_completed allows us to provide lots of coroutines (or Futures) to 
schedule, and then deal with each result as soon as it is available, in a 
loop: a streaming style.

I propose to allow as_completed to work on very large numbers of coroutines, 
provided through a generator (rather than a list).  In order to make this 
practical, we need to limit the number of coroutines that are scheduled 
simultaneously to a reasonable number.

For tasks that open files or sockets, a reasonable number might be 1000 or 
fewer.  For other tasks, a much larger number might be reasonable, but we would 
still like some limit to prevent us running out of memory.
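For comparison, the usual workaround today is an asyncio.Semaphore, which caps 
how many coroutine bodies make progress at once, but it does not address the 
memory problem because every task object is still created up front.  A minimal 
sketch (the function names are mine, not part of any API):

```python
import asyncio

async def double(sem, x):
    # The semaphore caps how many coroutine bodies run concurrently.
    async with sem:
        await asyncio.sleep(0.001)
        return x * 2

async def run_all(total, limit):
    sem = asyncio.Semaphore(limit)
    # Note: all `total` task objects are still created up front, so memory
    # use grows with `total`, not with `limit` -- which is exactly the gap
    # a generator-based as_completed would close.
    tasks = [asyncio.ensure_future(double(sem, x)) for x in range(total)]
    return await asyncio.gather(*tasks)
```

So a semaphore limits concurrency, but not the number of pending task objects.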

I suggest adding a "limit" argument to as_completed that limits the number of 
coroutines that it schedules simultaneously.
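To illustrate the intended semantics, here is a rough sketch of how such a 
limit could work, written as a standalone helper (this is my own illustrative 
code, not the stdlib implementation; unlike as_completed it yields results 
directly rather than awaitables):

```python
import asyncio
from itertools import islice

async def limited_as_completed(coros, limit):
    # Consume the iterable lazily, keeping at most `limit` tasks in flight.
    it = iter(coros)
    pending = {asyncio.ensure_future(c) for c in islice(it, limit)}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            yield task.result()
        # Top up from the iterable to replace the tasks that finished.
        for c in islice(it, limit - len(pending)):
            pending.add(asyncio.ensure_future(c))
```

Because the iterable is only advanced as slots free up, at most `limit` 
coroutine objects and tasks exist at any moment.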

For me, the key advantage of as_completed (in the proposed modified form) is 
that it enables a streaming style that looks quite like synchronous code, but 
is efficient in terms of memory usage (as you'd expect from a streaming style):


#!/usr/bin/env python3

import asyncio
import sys

limit = int(sys.argv[1])

async def double(x):
    await asyncio.sleep(1)
    return x * 2

async def print_doubles():
    # A generator expression, so coroutine objects are created lazily
    # rather than all at once.
    coros = (double(x) for x in range(1000000))
    for res in asyncio.as_completed(coros, limit=limit):
        r = await res
        if r % 100000 == 0:
            print(r)

loop = asyncio.get_event_loop()
loop.run_until_complete(print_doubles())
loop.close()


Using my prototype implementation, this runs slightly faster and uses roughly 
a ninth of the memory on my machine when run with a limit of 100K instead of 
1 million concurrent tasks:

$ /usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" ./example 1000000
Memory usage: 2234552KB Time: 97.52 seconds

$ /usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" ./example 100000
Memory usage: 252732KB  Time: 94.13 seconds

I have been working on an implementation, and there is some discussion in my 
blog posts:
http://www.artificialworlds.net/blog/2017/06/12/making-100-million-requests-with-python-aiohttp/
and
http://www.artificialworlds.net/blog/2017/06/27/adding-a-concurrency-limit-to-pythons-asyncio-as_completed/

Possibly the most controversial part of this proposal is that we need to allow 
passing a generator to as_completed instead of requiring a list.  This is 
fundamental to the style outlined above, but it may be possible to do better 
than the blanket acceptance of all generators in my prototype.

----------
components: asyncio
messages: 296982
nosy: andybalaam, yselivanov
priority: normal
severity: normal
status: open
title: Allow limiting the number of concurrent tasks in asyncio.as_completed
type: enhancement
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30782>
_______________________________________