New submission from STINNER Victor:

On Windows using the IOCP (proactor) event loop, I noticed race conditions when 
running the test suite of Trollius. For examples, sometimes the returncode of a 
process is None, which should never happen. It looks like wait_for_handle() has 
an invalid behaviour.

When I run the test suite of asyncio in debug mode (PYTHONASYNCIODEBUG=1), 
sometimes I see the message "GetQueuedCompletionStatus() returned an unexpected 
event" which should never occur neither.

I added debug traces. I saw that the IocpProactor.wait_for_handle() calls later 
PostQueuedCompletionStatus() through its internal C callback 
(PostToQueueCallback). It looks like sometimes the callback is called whereas 
the wait was cancelled/acked with UnregisterWait().

I didn't understand the logic between RegisterWaitForSingleObject(), 
UnregisterWait() and the callback.

It looks like sometimes the overlapped object created in Python ("ov = 
_overlapped.Overlapped(NULL)") is destroyed, before PostToQueueCallback() was 
called. In the unit tests, it doesn't crash because a different overlapped 
object is created and it gets the same memory address (probably because we are 
lucky!).

The current implementation of wait_for_handle() has an optimization: it polls 
immediatly the wait to check if it already completed. I tried to remove it, but 
I got some different issues. If I understood correctly, this optimization hides 
other bugs and reduce the probability of getting the race condition.

wait_for_handle() in used to wait for the completion of a subprocess, so by all 
unit tests running subprocesses, but also in test_wait_for_handle() and 
test_wait_for_handle_cancel() tests. I suspect that running 
test_wait_for_handle() or test_wait_for_handle_cancel() schedule the bug.

Note: Removing "_winapi.CloseHandle(self._iocp)" in IocpProactor.close() works 
around the bug. The bug looks to be an expected call to PostToQueueCallback() 
which calls PostQueuedCompletionStatus() on an IOCP. Not closing the IOCP means 
using a different IOCP for each test, so the unexpected call to 
PostQueuedCompletionStatus() has no effect on following tests.

--

I rewrote some parts of the IOCP code in asyncio. Maybe I introduced this issue 
during the refactoring. Maybe it already existed before but nobody noticed it, 
asyncio had fewer unit tests before.

At the beginning, I wanted to fix this crash:
https://code.google.com/p/tulip/issues/detail?id=195
"_WaitHandleFuture.cancel() crash if the wait event was already unregistered"

Later, I tried to make the code more reliable in this issue:
https://code.google.com/p/tulip/issues/detail?id=196
"_OverlappedFuture.set_result() should clear the its reference to the 
overlapped object"

Read Trollius 1.0.1 changelog which lists these changes:
http://trollius.readthedocs.org/changelog.html#version-1-0-1

--

Note: The IOCP code still has code which can be enhanced:

- "Investigate IocpProactor.accept_pipe() special case (don't register 
overlapped)"
  https://code.google.com/p/tulip/issues/detail?id=204

- "Rewrite IocpProactor.connect_pipe() with non-blocking calls to avoid non 
interruptible QueueUserWorkItem()"
  https://code.google.com/p/tulip/issues/detail?id=197

----------
components: Windows, asyncio
messages: 232987
nosy: gvanrossum, haypo, steve.dower, tim.golden, yselivanov, zach.ware
priority: normal
severity: normal
status: open
title: asyncio: race condition in the IOCP code (proactor event loop)
versions: Python 3.4, Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23095>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to