[issue36607] asyncio.all_tasks() crashes if asyncio is used in multiple threads
New submission from Nick Davies : This problem was identified in https://bugs.python.org/issue34970 but I think the fix might have been incorrect. The theory in issue34970 was that GC was causing the weakrefset for `all_tasks` to change during iteration. However Weakset provides an `_IterationGuard` class to prevent GC from changing the set during iteration and hence preventing this problem in a single thread. My thoughts on this problem are: - `asyncio.tasks._all_tasks` is shared for all users of asyncio (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L818) - Any new Task constructed mutates `_all_tasks` (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L117) - _IterationGuard won't protect iterations in this case because calls to Weakset.add will always commit changes even if there is something iterating (https://github.com/python/cpython/blob/3.6/Lib/_weakrefset.py#L83) - calls to `asyncio.all_tasks` or `asyncio.tasks.Task.all_tasks` crash if any task is started on any thread during iteration. Repro code: ``` import asyncio from threading import Thread async def do_nothing(): await asyncio.sleep(0) async def loop_tasks(): loop = asyncio.get_event_loop() while True: loop.create_task(do_nothing()) await asyncio.sleep(0.01) def old_thread(): loop = asyncio.new_event_loop() while True: asyncio.tasks.Task.all_tasks(loop=loop) def new_thread(): loop = asyncio.new_event_loop() while True: asyncio.all_tasks(loop=loop) old_t = Thread(target=old_thread) new_t = Thread(target=new_thread) old_t.start() new_t.start() asyncio.run(loop_tasks()) ``` Output: ``` Exception in thread Thread-2: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 865, in run self._target(*self._args, **self._kwargs) File "tmp/test_asyncio.py", line 25, in new_thread asyncio.all_tasks(loop=loop) File "/usr/lib/python3.7/asyncio/tasks.py", line 40, in all_tasks return {t for t in list(_all_tasks) File "/usr/lib/python3.7/_weakrefset.py", line 60, in __iter__ for itemref in self.data: RuntimeError: Set changed size during iteration Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 865, in run self._target(*self._args, **self._kwargs) File "tmp/test_asyncio.py", line 19, in old_thread asyncio.tasks.Task.all_tasks(loop=loop) File "/usr/lib/python3.7/asyncio/tasks.py", line 52, in _all_tasks_compat return {t for t in list(_all_tasks) if futures._get_loop(t) is loop} File "/usr/lib/python3.7/_weakrefset.py", line 60, in __iter__ for itemref in self.data: RuntimeError: Set changed size during iteration ``` -- components: asyncio messages: 339991 nosy: Nick Davies, asvetlov, yselivanov priority: normal severity: normal status: open title: asyncio.all_tasks() crashes if asyncio is used in multiple threads versions: Python 3.7 ___ Python tracker <https://bugs.python.org/issue36607> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36607] asyncio.all_tasks() crashes if asyncio is used in multiple threads
Change by Nick Davies : -- type: -> behavior versions: +Python 3.6 ___ Python tracker <https://bugs.python.org/issue36607> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36607] asyncio.all_tasks() crashes if asyncio is used in multiple threads
Nick Davies added the comment: My preference would actually be number 3 because: 1: I agree that this isn't really a safe option because it could slow things down (possibly a lot) 2: I haven't found this to be rare in my situation but I am not sure how common my setup is. We have a threaded server with a mix of sync and asyncio so we use run in a bunch of places inside threads. Any time the server gets busy any task creation that occurs during the return of run crashes. My two main reservations about this approach are: - There is a potentially unbounded number of times that this could need to retry. - Also this is covering up a thread unsafe operation and we are pretty lucky based on the current implementation that it explodes in a consistent and sane way that we can catch and retry. 3: Loop is already expected to be hashable in 3.7 as far as I can tell (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L822) so other than the slightly higher complexity this feels like the cleanest solution. > The fix can be applied to 3.7 and 3.8 only, sorry. Python 3.6 is in security > mode now. Thats fine, you can work around the issue using asyncio.set_task_factory to something that tracks the tasks per loop with some care. -- ___ Python tracker <https://bugs.python.org/issue36607> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36607] asyncio.all_tasks() crashes if asyncio is used in multiple threads
Nick Davies added the comment: > Would you prepare a patch for number 3? I will give it a try and see what I come up with. > I am afraid we can add another hard-to-debug multi-threaded problem by > complicating the data structure. Yeah this was my concern too, the adding and removing from the WeakDict[AbstractEventLoop, WeakSet[Task]] for `_all_tasks` could still cause issues. Specifically the whole WeakSet class is not threadsafe so I would assume WeakDict is the same, there may not be a nice way of ensuring a combination of GC + the IterationGuard doesn't come and mess up the dict even if I wrap it in a threading lock. Another option would be to have the WeakSet[Task] attached to the loop itself then because using the same loop in multiple threads not at all thread safe already that would contain the problem. You mentioned "third-party loops" which may make this option impossible. > I'm just curious why do you call `all_tasks()` at all? > In my mind, the only non-debug usage is `asyncio.run()` In reality we aren't using `all_tasks()` directly. We are calling `asyncio.run()` from multiple threads which triggers the issue. The repro I provided was just a more reliable way of triggering the issue. I will paste a slightly more real-world example of how this happened below. This version is a little more messy and harder to see exactly what the problem is which is why I started with the other one. ``` import asyncio from threading import Thread async def do_nothing(n=0): await asyncio.sleep(n) async def loop_tasks(): loop = asyncio.get_event_loop() while True: loop.create_task(do_nothing()) await asyncio.sleep(0.01) async def make_tasks(n): loop = asyncio.get_event_loop() for i in range(n): loop.create_task(do_nothing(1)) await asyncio.sleep(1) def make_lots_of_tasks(): while True: asyncio.run(make_tasks(1)) for i in range(10): t = Thread(target=make_lots_of_tasks) t.start() asyncio.run(loop_tasks()) ``` -- ___ Python tracker <https://bugs.python.org/issue36607> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com