[issue22114] You cannot call communicate() safely after receiving EAGAIN
New submission from Amrith Kumar: Environment: - Linux (Ubuntu 14.04 LTS, x64) running Python 2.7. - Code uses eventlet.green.subprocess Code establishes subprocess as: subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, ...) It then calls communicate() communicate() throws EAGAIN The code calls communicate() again This fails with "ValueError: I/O operation on closed file" This exception comes from eventlet/greenio.py because an operation (flush) has been attempted on a closed file. The complete explanation of this failure is this. When communicate() is called in this way (WITH NO INPUT but with all three handles stdin, stdout, stderr), code in communicate() bypasses the "1 handle optimization" and goes directly to _communicate() _communicate() will first send any input on stdin along, and flush and close stdin(). >From that point forward, if anything to do with stderr and stdout returns >EAGAIN, an attempt to call communicate() again will fail because stdin has now >been closed(). This is because the code in _communicate() preemptively closes stdin if there is no input but does not reset stdin. def _communicate(self, input): if self.stdin: # Flush stdio buffer. This might block, if the user has # been writing to .stdin in an uncontrolled fashion. self.stdin.flush() if not input: self.stdin.close() The fix for this is relatively straightforward (conceptually). def _communicate(self, input): if self.stdin: # Flush stdio buffer. This might block, if the user has # been writing to .stdin in an uncontrolled fashion. self.stdin.flush() if not input: self.stdin.close() self.stdin = None # This should take care of it. However, a partial workaround from the client perspective is this. 1. If you have no input, set stdin to None 2. If you do have input and you get EAGAIN, you cannot safely retry because your input may have already been completely or partially sent to the subprocess; you have to assume failure. Backtrace: Traceback (most recent call last): File "/opt/stack/trove/trove/tests/unittests/guestagent/test_mongodb_manager.py", line 58, in test_prepare_from_backup self._prepare_dynamic(backup_id='backup_id_123abc') File "/opt/stack/trove/trove/tests/unittests/guestagent/test_mongodb_manager.py", line 91, in _prepare_dynamic backup_info=backup_info) File "trove/guestagent/datastore/mongodb/manager.py", line 66, in prepare operating_system.update_owner('mongodb', 'mongodb', mount_point) File "trove/guestagent/common/operating_system.py", line 109, in update_owner run_as_root=True, root_helper="sudo") File "trove/common/utils.py", line 278, in execute_with_timeout return execute(*args, **kwargs) File "trove/openstack/common/processutils.py", line 186, in execute result = obj.communicate() File "/usr/lib/python2.7/subprocess.py", line 799, in communicate return self._communicate(input) File "/usr/lib/python2.7/subprocess.py", line 1396, in _communicate self.stdin.flush() File "/opt/stack/trove/.tox/py27/local/lib/python2.7/site-packages/eventlet/greenio.py", line 419, in _operationOnClosedFile raise ValueError("I/O operation on closed file") ValueError: I/O operation on closed file -- components: IO messages: 224398 nosy: amrith priority: normal severity: normal status: open title: You cannot call communicate() safely after receiving EAGAIN versions: Python 2.7 ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22114] You cannot call communicate() safely after receiving EAGAIN
Amrith Kumar added the comment: I'll supply a patch for this proposed change. -- ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22114] You cannot call communicate() safely after receiving EAGAIN
Amrith Kumar added the comment: I see three comments, one from r.david.murray, one from haypo and one from pitrou. I'll try and address all three. r.david.murray: The code in question is in https://github.com/openstack/oslo-incubator/blob/master/openstack/common/processutils.py#L177-L189 note that we're catching EAGAIN and EINTR. I have not been able to isolate this down to a simple repro without the rest of this paraphernalia but I'm trying. So, we are 'catching' EAGAIN or EINTR here and we're trying to handle it to the best of our ability. However, if the underlying layer is not setup to handle a retry, our best efforts will be fruitless. That is what is happening here. The reason for this code (ignoring the retry of 20) was put in place exactly because a call to communicate() received an EAGAIN. The issue therefore is that in order for the higher level to properly handle this, communicate() should be setup to handle a second call, which it currently is not. haypo and pitrou: that may be true; I'm not competent to comment on that. But, as pointed out in earlier comment (and modulo this may be eventlet specific), just catching more exceptions isn't the answer. if the descriptor is closed, the thing that communicate/_communicate() call should be able to handle that situation. And this bug illustrates that at least eventlet doesn't handle that. However, I submit to you that this is NOT an eventlet issue. Here's why. The failure here is that a flush call is being attempted on a closed descriptor. I believe that the implementation of flush (in eventlet) is legitimately throwing an exception indicating that the state machine was violated (cannot flush on closed descriptor). The close() was invoked by subprocess.py after it finished doing what it thought it had to do with stdin on the first invocation. therefore I believe it must be the responsibility of subprocess.py to make sure that when invoked again, it doesn't step on itself. Either that, or subprocess.py's communicate() implementation should indicate that it can only be called once, capture all exceptions that would point a user to retry (such as EAGAIN and EINTR) and mask them and return some EFATAL. -- components: +IO -Library (Lib) ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22114] You cannot call communicate() safely after receiving an exception (EINTR or EAGAIN)
Changes by Amrith Kumar : -- title: You cannot call communicate() safely after receiving EAGAIN -> You cannot call communicate() safely after receiving an exception (EINTR or EAGAIN) ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22114] You cannot call communicate() safely after receiving an exception (EINTR or EAGAIN)
Amrith Kumar added the comment: haypo; no argument from me. I just updated the description to make it more exact. I may have received EINTR or EAGAIN. In either event, I need to debug some more and try and do a clean repro and then figure out a proper fix. In the meanwhile, I proposed a fix for the openstack code; if you are so inclined, please take a look. https://review.openstack.org/#/c/110933/ -- ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22114] You cannot call communicate() safely after receiving an exception (EINTR or EAGAIN)
Amrith Kumar added the comment: r.david.murray, I guess the issue we have is that something (hypothesis: eventlet) is causing the blocking API to communicate() to throw an EAGAIN. This (https://bugs.launchpad.net/nova/+bug/1271331) is the bug that was fixed a while ago in OpenStack that introduced the notion of calling communicate() multiple times. -- ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22114] You cannot call communicate() safely after receiving an exception (EINTR or EAGAIN)
Amrith Kumar added the comment: After some debugging and reading code in python's subprocess.py (v2.7), here's what I'm seeing. (a) the error has nothing to do with eventlet and monkey-patching. (b) the code in _communicate_with_select() and potentially _communicate_with_poll() are the problem. What's the problem? --- The code in _communicate_with_select() correctly sets up to handle the read and write calls without blocking on any of them. It does this by establishing the two (read and write) lists of descriptors and calls select without no timeout specified. When select returns, and indicates that a socket is in (for example) the read list, what that means is that an attempt to read() will not block. However, it is possible on some systems (Linux for sure) that (a) a socket is non-blocking, and (b) a call to select indicates that the socket is ready to read, and (c) a call to read the socket returns an error EAGAIN (aka EWOULDBLOCK). Callers of read() and write() on non-blocking sockets should be prepared to handle this situation. The python code in _communicate_with_select() is not. It assumes that if select() returns that a read fd is ready for read, that a call to read it will produce 0 or more bytes. The calls to read() for stdout and stderr are not guarded with exception handlers. However, if a socket is setup as non-blocking, any call can produce EWOULDBLOCK, EAGAIN, ... Adding some debugging code it was possible to recreate the problem and show that the backtrace was (in this case it happened with python 2.6) Traceback (most recent call last): [...] File "trove/openstack/common/processutils.py", line 186, in execute result = obj.communicate() File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate stdout, stderr = self._communicate(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1318, in _communicate stdout, stderr = self._communicate_with_select(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1483, in _communicate_with_select data = os.read(self.stdout.fileno(), 1024) OSError: [Errno 11] Resource temporarily unavailable The correct fix for this is to make _communicate_with_select() and maybe _communicate_with_poll() properly handle the read() and write() calls and be better prepared to handle a thrown condition of EAGAIN from os.read or os.write. -- ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue22114] You cannot call communicate() safely after receiving an exception (EINTR or EAGAIN)
Amrith Kumar added the comment: The issue turned out to be something funky in monkey patching. os.read() shouldn't be getting called if the monkey patch was correct in the first place. -- resolution: -> not a bug status: open -> closed ___ Python tracker <http://bugs.python.org/issue22114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)
New submission from Amrith Kumar: The urlparse library incorrectly accepts a URI which specifies an invalid host identifier. Example: http://www.example.com:/abc Looking at the URI specifications, this is an invalid URI. By definition, you are supposed to specify a "hostport" and a hostport is defined as: https://www.w3.org/Addressing/URL/uri-spec.html hostport host [ : port ] The BNF indicates that : is only valid if a port is also specified. See current behavior; I submit to you that this should generate an exception. https://gist.github.com/anonymous/8504f160ff90649890b5a2a82f8028b0 -- components: Library (Lib) messages: 282086 nosy: amrith priority: normal severity: normal status: open title: urlparse.urlparse() parses invalid URI without generating an error (examples provided) type: behavior versions: Python 2.7, Python 3.4 ___ Python tracker <http://bugs.python.org/issue28841> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)
Amrith Kumar added the comment: As requested ... >>> urlparse.urlparse('http://www.google.com:/abc') ParseResult(scheme='http', netloc='www.google.com:', path='/abc', params='', query='', fragment='') I submit to you that this should generate an error. -- ___ Python tracker <http://bugs.python.org/issue28841> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue28841] urlparse.urlparse() parses invalid URI without generating an error (examples provided)
Amrith Kumar added the comment: Eric, Martin, I'm sure this is an interpretation (and I'll be fair and give you both) that a URL of the form: http://hostname.domain.tld:/path should be held invalid. The rationale is per the section of the spec you quoted. The other side of the argument is that the hostname and port are defined as: hostname [ : port ] where port is defined as *DIGIT This implies that 0 digits is also valid. I submit to you that the ambiguity would be removed if they: - removed the paragraph telling people not to emit a : if there was no port, or - changing the port definition to 1*DIGIT Absent that, I believe that the paragraph implies that the intent was 1*DIGIT. And therefore a : with no following number should generate an error. I raise this because the behavior of urlparse() (the way it does things now) is being cited as the reason why other routines in OpenStack should handle this when the reason we were getting URL's with a : and no port was in fact an oversight. So, in hearing the rationalization as being that "urlparse() does it, so ..." I figured I'd come to the source and see if we could make urlparse() do things unambiguously. Now, if the reason it does this is because someone who came before me made the argument that a : with no port should be accepted, I guess I'm out of luck. -- ___ Python tracker <http://bugs.python.org/issue28841> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com