[issue12958] test_socket failures on Mac OS X
Changes by David Watson : -- nosy: +baikie ___ Python tracker <http://bugs.python.org/issue12958> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12981] rewrite multiprocessing (senfd|recvfd) in Python
David Watson added the comment: I had a look at this patch, and the FD passing looked OK, except that calculating the buffer size with CMSG_SPACE() may allow more than one file descriptor to be received, with the extra one going unnoticed - it should use CMSG_LEN() instead (the existing C implementation has the same problem, I see). CMSG_SPACE() exists to allow calculating the space required to hold multiple control messages, so it essentially gives the offset for the next cmsghdr struct such that any alignment requirements are satisfied. 64-bit systems will probably want to ensure that all CMSG_DATA() payloads are aligned on 8-byte boundaries, and so have CMSG_SPACE(4) == CMSG_SPACE(8) == CMSG_LEN(8) (the Linux headers, for instance, align to sizeof(size_t)). So with a 32-bit int, a buffer size of CMSG_SPACE(sizeof(int)) would allow *two* file descriptors to be received. CMSG_LEN() omits the padding, thus allowing only one. I'm not familiar with how the FD-passing facility is used in multiprocessing, but this seems as if it could be an avenue for DoS attacks that exhaust the number of file descriptors allowed for the receiving process. -- nosy: +baikie ___ Python tracker <http://bugs.python.org/issue12981> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
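As a rough illustration of the padding difference described in the comment above, the following sketch uses the socket.CMSG_LEN() and socket.CMSG_SPACE() helpers that Python 3.3 exposes on platforms providing them. The name fds_that_fit is made up for this example, and the arithmetic assumes file descriptors are passed as C ints; the actual numbers printed depend on the platform's alignment rules.

```python
import array
import socket

FD_SIZE = array.array("i").itemsize  # sizeof(int) on this platform


def fds_that_fit(bufsize):
    """Return how many whole int-sized descriptors a single SCM_RIGHTS
    control message can carry in an ancillary buffer of bufsize bytes."""
    header = socket.CMSG_LEN(0)  # cmsghdr plus any padding before the data
    if bufsize < header:
        return 0
    return (bufsize - header) // FD_SIZE


if __name__ == "__main__":
    # CMSG_LEN() leaves no trailing padding, so exactly one descriptor fits.
    print("CMSG_LEN(sizeof(int))  :", fds_that_fit(socket.CMSG_LEN(FD_SIZE)))
    # CMSG_SPACE() includes trailing padding, which may leave room for more.
    print("CMSG_SPACE(sizeof(int)):", fds_that_fit(socket.CMSG_SPACE(FD_SIZE)))
```

On a system that aligns control messages to 8 bytes with a 4-byte int, the second figure would be expected to exceed the first, which is the situation the comment warns about.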
[issue8623] Aliasing warnings in socketmodule.c
David Watson added the comment: For reference, the warnings are partially explained here: http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html#index-fstrict_002daliasing-825 http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Warning-Options.html#index-Wstrict_002daliasing-337 I get these warnings with GCC (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 [i386], plus an additional one from the new recvmsg() code. I haven't tried GCC 4.5 or later, but as the docs imply, the warnings will not appear in debugging builds. I take it GCC is referring to C99 section 6.5, paragraphs 6 and 7 here, but I'm not sure exactly how much these are intended to prohibit with regard to the (mis)use of unions, or how strictly GCC actually enforces them. The attached socket-aliasing-sas2sa.diff is enough to get rid of the warnings with GCC 4.4.4 - it adds a "struct sockaddr" member to the sock_addr_t union type, changes the SAS2SA() macro to take the address of this member instead of using a cast, and modifies socket_gethostbyaddr() and socket_gethostbyname_ex() to use SAS2SA() (sock_recvmsg_guts() already uses it). Changing SAS2SA() also gets rid of most of the additional warnings produced by the "aggressive" warning setting -Wstrict-aliasing=2. However, the gethostby* functions still point to the union object with a pointer variable not matching the type actually stored in it, which the GCC docs warn against. To be more conservative, socket-aliasing-union-3.2.diff applies on top to get rid of these pointers, and instead directly accesses the union for each use other than providing a pointer argument to a function. socket-aliasing-union-recvmsg-3.3.diff does the same for 3.3, and makes the complained-about line in sock_recvmsg_guts() access the union directly as well. 
One other consideration here is that the different sockaddr_* struct types used are likely to come under the "common initial sequence" rule for unions (C99 6.5.2.3, paragraph 5, or section A8.3 of K&R 2nd ed.), which might make some more questionable uses valid. That said, technically POSIX appears to require only that the s*_family members of the various sockaddr struct types have the same offset and type, not that they form part of a common initial sequence (s*_family need not be the first structure member - the BSDs for instance place it second, although it can still be part of a common initial sequence). -- keywords: +patch nosy: +baikie versions: +Python 3.3 Added file: http://bugs.python.org/file23186/socket-aliasing-sas2sa.diff ___ Python tracker <http://bugs.python.org/issue8623> ___
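The idea of reaching the generic address through a union member, rather than through a cast, can be loosely modelled with ctypes. This is only a layout illustration: Python has no strict-aliasing rule, the struct definitions below are simplified assumptions resembling the Linux layouts (family member first, 16 bytes total), and the names SockAddr, SockAddrIn, SockAddrUnion and family_via_generic_member are invented for the sketch.

```python
import ctypes
import socket


class SockAddr(ctypes.Structure):
    # Simplified stand-in for struct sockaddr (assumed Linux-like layout).
    _fields_ = [("sa_family", ctypes.c_ushort),
                ("sa_data", ctypes.c_char * 14)]


class SockAddrIn(ctypes.Structure):
    # Simplified stand-in for struct sockaddr_in (assumed Linux-like layout).
    _fields_ = [("sin_family", ctypes.c_ushort),
                ("sin_port", ctypes.c_ushort),
                ("sin_addr", ctypes.c_uint32),
                ("sin_zero", ctypes.c_char * 8)]


class SockAddrUnion(ctypes.Union):
    # Mirrors the idea of giving the union its own generic "sa" member.
    _fields_ = [("sa", SockAddr),
                ("sin", SockAddrIn)]


def family_via_generic_member(family):
    """Store the family through the specific member, then read it back
    through the generic member that shares the same storage."""
    addr = SockAddrUnion()
    addr.sin.sin_family = family
    # Both members start at the address of the union itself.
    assert ctypes.addressof(addr.sa) == ctypes.addressof(addr)
    return addr.sa.sa_family


if __name__ == "__main__":
    print(family_via_generic_member(socket.AF_INET))
```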
[issue8623] Aliasing warnings in socketmodule.c
Changes by David Watson : Added file: http://bugs.python.org/file23187/socket-aliasing-union-3.2.diff ___ Python tracker <http://bugs.python.org/issue8623> ___
[issue8623] Aliasing warnings in socketmodule.c
Changes by David Watson : Added file: http://bugs.python.org/file23188/socket-aliasing-union-3.3.diff ___ Python tracker <http://bugs.python.org/issue8623> ___
[issue13001] test_socket.testRecvmsgTrunc failure on FreeBSD 7.2 buildbot
Changes by David Watson : -- nosy: +baikie ___ Python tracker <http://bugs.python.org/issue13001> ___
[issue12981] rewrite multiprocessing (senfd|recvfd) in Python
David Watson added the comment: On Sun 18 Sep 2011, Charles-François Natali wrote: > > I had a look at this patch, and the FD passing looked OK, except > > that calculating the buffer size with CMSG_SPACE() may allow more > > than one file descriptor to be received, with the extra one going > > unnoticed - it should use CMSG_LEN() instead > > > (the existing C implementation has the same problem, I see). > > I just checked, and the C version uses CMSG_SPACE() as the buffer size, but > passes CMSG_LEN() to cmsg->cmsg_len and msg.msg_controllen. Or am I missing > something? Ah, no, you're right - that's fine. Sorry for the false alarm. -- ___ Python tracker <http://bugs.python.org/issue12981> ___
[issue13022] _multiprocessing.recvfd() doesn't check that file descriptor was actually received
New submission from David Watson : The function _multiprocessing.recvfd() calls recvmsg() and expects to receive a file descriptor in an SCM_RIGHTS control message, but doesn't check that such a control message is actually present. So if the sender sends data without an accompanying file descriptor, recvfd() will then return the integer value of the uninitialized CMSG_DATA() buffer. The attached recvfd-check.diff checks for a complete control message of the correct type, and raises RuntimeError if it isn't there. This matches the behaviour of the proposed pure-Python implementation at issue #12981. The patch includes a test case, but like the other recently-added tests for the function, it isn't guarded against multiprocessing.reduction being unavailable. Issue #12981 has a patch "skip_reduction.diff" (already in 3.3) to fix this, and I'm attaching recvfd-skip-reduction-fix.diff to apply on top of it and guard the new test case as well. -- components: Extension Modules files: recvfd-check.diff keywords: patch messages: 144351 nosy: baikie priority: normal severity: normal status: open title: _multiprocessing.recvfd() doesn't check that file descriptor was actually received type: behavior versions: Python 2.7, Python 3.2, Python 3.3 Added file: http://bugs.python.org/file23214/recvfd-check.diff ___ Python tracker <http://bugs.python.org/issue13022> ___
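The kind of check described above could look roughly like this in pure Python, using socket.recvmsg() from Python 3.3. This is a sketch and not the contents of recvfd-check.diff: the function name recv_fd and the error message are invented here, and it assumes a Unix socket carrying at most one int-sized descriptor per message.

```python
import array
import socket

FD_SIZE = array.array("i").itemsize  # sizeof(int) on this platform


def recv_fd(sock):
    """Receive one byte of data plus one file descriptor, raising
    RuntimeError if no complete SCM_RIGHTS control message arrived."""
    data, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(FD_SIZE))
    for level, ctype, cdata in ancdata:
        if (level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS
                and len(cdata) >= FD_SIZE):
            fds = array.array("i")
            fds.frombytes(cdata[:FD_SIZE])
            return fds[0]
    # Without this check the caller would receive a meaningless integer.
    raise RuntimeError("no file descriptor received")
```

A sender that calls plain send() instead of sendmsg() with an SCM_RIGHTS item would make this function raise rather than return garbage.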
[issue13022] _multiprocessing.recvfd() doesn't check that file descriptor was actually received
Changes by David Watson : Added file: http://bugs.python.org/file23215/recvfd-skip-reduction-fix.diff ___ Python tracker <http://bugs.python.org/issue13022> ___
[issue12981] rewrite multiprocessing (senfd|recvfd) in Python
David Watson added the comment: On Tue 20 Sep 2011, Charles-François Natali wrote: > I committed the patch to catch the ImportError in test_multiprocessing. This should go in all branches, I think - see issue #13022. -- ___ Python tracker <http://bugs.python.org/issue12981> ___
[issue6560] socket sendmsg(), recvmsg() methods
David Watson added the comment: On Mon 23 May 2011, Gergely Kálmán wrote: > It's been a while I had a look at that code. As far as I remember though > the code is fairly decent not > taking the missing unit tests into account. There are a few todos, and > also a pretty bad bug that I've fixed > but not committed. The TODOs include better parsing of auxiliary data, > support for scatter-gather, addressed > messages. If you wish I can send you the latest patch that has the bug > fixed and applies to 3.2. Erm, have you seen the separately-implemented patch I posted at http://bugs.python.org/file19962/baikie-hwundram-v5.diff ? It's basically complete IIRC. -- ___ Python tracker <http://bugs.python.org/issue6560> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
David Watson added the comment: On Sun 12 Jun 2011, Charles-François Natali wrote: > The patches look good to me, except that instead of passing > (addrlen > buflen) ? buflen : addrlen > as addrlen argument every time makesockaddr is called, I'd > prefer if this min was done inside makesockaddr itself, > i.e. perform min(addrlen, sizeof(struct sockaddr_un)) in the > AF_UNIX switch case (especially since addrlen is only used for > AF_UNIX). Actually, I think it should be clamped at the top of the function, since the branch for unknown address families ought to use the length as well (it doesn't, but that's a separate issue). I'm attaching new patches to do the check in makesockaddr(), which also change the length parameters from int to socklen_t, in case the OS returns a really huge value. I'm also attaching new return-unterminated patches to handle the possibility that addrlen is unsigned (socklen_t may be unsigned, and addrlen *is* now unsigned in 3.x). This entailed specifying what to do if addrlen < offsetof(struct sockaddr_un, sun_path), i.e. if the address is truncated at least one byte before the start of sun_path. This may well never happen (Python's existing code would raise SystemError if it did, due to calling PyString_FromStringAndSize() with a negative length), but I've made the new patches return None if it does, as None is already returned if addrlen is 0. As another precedent of sorts, Linux currently returns None (i.e. addrlen = 0) when receiving a datagram from an unbound Unix socket, despite returning an empty string (i.e. addrlen = offsetof(..., sun_path)) for the same unbound address in other situations. (I think the decoders for other address families should also return None if addrlen is less than the size of the appropriate struct, but again, that's a separate issue.) Also, I noticed that on Linux, Python 3.x currently returns empty addresses as bytes objects rather than strings, whereas the patches I've provided make it return strings. 
In case this change isn't acceptable for the 3.x maintenance branches, I'm attaching return-unterminated-3.x-maint-new.diff which still returns them as bytes (on Linux only). To sum up the patch order:

2.x: linux-pass-unterminated-4spc.diff test-2.x-new.diff return-unterminated-2.x-new.diff addrlen-makesockaddr-2.x.diff (or addrlen-2.x-4spc.diff)
3.2: linux-pass-unterminated-4spc.diff test-3.x-new.diff return-unterminated-3.x-maint-new.diff addrlen-makesockaddr-3.x.diff (or addrlen-3.x-4spc.diff)
default: linux-pass-unterminated-4spc.diff test-3.x-new.diff return-unterminated-3.x-trunk-new.diff addrlen-makesockaddr-3.x.diff (or addrlen-3.x-4spc.diff)

--
Added file: http://bugs.python.org/file22384/addrlen-makesockaddr-2.x.diff
Added file: http://bugs.python.org/file22385/addrlen-makesockaddr-3.x.diff
Added file: http://bugs.python.org/file22386/return-unterminated-2.x-new.diff
Added file: http://bugs.python.org/file22387/return-unterminated-3.x-maint-new.diff
Added file: http://bugs.python.org/file22388/return-unterminated-3.x-trunk-new.diff
___ Python tracker <http://bugs.python.org/issue8372> ___

If accept(), etc. return a larger addrlen than was supplied, ignore it and use the original buffer length.

diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -969,13 +969,22 @@ makebdaddr(bdaddr_t *bdaddr)
 /*ARGSUSED*/
 static PyObject *
-makesockaddr(int sockfd, struct sockaddr *addr, int addrlen, int proto)
+makesockaddr(int sockfd, struct sockaddr *addr, socklen_t addrlen,
+             socklen_t buflen, int proto)
 {
     if (addrlen == 0) {
         /* No address -- may be recvfrom() from known socket */
         Py_INCREF(Py_None);
         return Py_None;
     }
+    /* buflen is the length of the buffer containing the address, and
+       addrlen is either the same, or is the length returned by the OS
+       after writing an address into the buffer.  Some systems return
+       the length they would have written if there had been space
+       (e.g. when an oversized AF_UNIX address has its sun_path
+       truncated). */
+    if (addrlen > buflen)
+        addrlen = buflen;
 
 #ifdef __BEOS__
     /* XXX: BeOS version of accept() doesn't set family correctly */
@@ -1632,6 +1641,7 @@ sock_accept(PySocketSockObject *s)
     sock_addr_t addrbuf;
     SOCKET_T newfd;
     socklen_t addrlen;
+    socklen_t buflen;
     PyObject *sock = NULL;
     PyObject *addr = NULL;
     PyObject *res = NULL;
@@ -1639,6 +1649,7 @@ sock_accept(PySocketSockObject *s)
     if (!getsockaddrlen(s, &addrlen))
         return NULL;
+    buflen = addrlen;
     memset(&addrbuf, 0, addrlen);
 
 #ifdef MS_WINDOWS
@@ -1680,7 +1691,7 @@ sock_accept(PySocketSockObject *s)
         goto finally;
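The length handling discussed in this message (clamp the reported length to the buffer, return None when the address stops before sun_path, return an empty path when it stops exactly at sun_path) can be modelled in a few lines of Python. This is a simplified sketch, not the patch itself: SUN_PATH_OFFSET = 2 is an assumption matching common Linux layouts, decode_unix_address is an invented name, and Linux abstract-namespace addresses are deliberately not modelled.

```python
SUN_PATH_OFFSET = 2  # assumed offsetof(struct sockaddr_un, sun_path)


def decode_unix_address(buf, addrlen):
    """Return the path bytes stored in a raw sockaddr_un buffer, or None
    if the address is too short to reach the start of sun_path."""
    # The OS may report the length it *would* have written; never read
    # past the end of the buffer that was actually supplied.
    addrlen = min(addrlen, len(buf))
    if addrlen < SUN_PATH_OFFSET:
        return None
    path = buf[SUN_PATH_OFFSET:addrlen]
    # Stop at a terminating NUL if there is one; an unterminated path
    # simply runs to the end of the (clamped) address.
    nul = path.find(b"\0")
    if nul != -1:
        path = path[:nul]
    return path


if __name__ == "__main__":
    raw = b"\x01\x00" + b"/tmp/sock" + b"\x00" * 5
    print(decode_unix_address(raw, len(raw)))
    print(decode_unix_address(raw, 0))
```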
[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake
New submission from David Watson : Changeset fd10d042b41d removed the wrappers on ssl.SSLSocket for the new socket.send/recvmsg() methods (since I forgot to check for the existence of the underlying methods - see issue #6560), but this leaves SSLSocket with send/recvmsg() methods inherited from the underlying socket type; thus SSLSocket.sendmsg() will insert the given data into the stream without encrypting it (or wrapping it in SSL in any way). This immediately screws up the SSL connection, resulting in receive errors at both ends ("SSL3_GET_RECORD:wrong version number" and the like), but the data is clearly visible in a packet capture, so it's too late if it was actually something secret. Correspondingly, recvmsg() and recvmsg_into() return the encrypted data, and screw up the connection by removing it from the SSL stream. Of course, these methods don't make sense over SSL anyway, but if the programmer naively assumes they do, then ideally they should not expose any secret information. Attaching a patch implementing Antoine Pitrou's suggestion that the methods should simply raise NotImplementedError. I don't know if these versions should also be added only if present on the underlying socket - they're Not Implemented either way :-) -- components: Library (Lib) files: ssl_sendrecvmsg_notimplemented.diff keywords: patch messages: 142900 nosy: baikie priority: normal severity: normal status: open title: Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake versions: Python 3.3 Added file: http://bugs.python.org/file23030/ssl_sendrecvmsg_notimplemented.diff ___ Python tracker <http://bugs.python.org/issue12835> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
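The suggested behaviour can be sketched on a plain socket subclass. GuardedSocket below is a hypothetical stand-in, not ssl.SSLSocket, and the message wording is only illustrative; the point is that overriding the inherited methods prevents a call from silently reaching the underlying socket.

```python
import socket


class GuardedSocket(socket.socket):
    """Hypothetical wrapper whose data must never bypass its own send
    and receive paths, so the inherited *msg() methods are blocked."""

    def sendmsg(self, *args, **kwargs):
        # Refuse rather than let data go out on the raw socket.
        raise NotImplementedError("sendmsg not allowed on instances of %s"
                                  % self.__class__)

    def recvmsg(self, *args, **kwargs):
        raise NotImplementedError("recvmsg not allowed on instances of %s"
                                  % self.__class__)

    def recvmsg_into(self, *args, **kwargs):
        raise NotImplementedError("recvmsg_into not allowed on instances "
                                  "of %s" % self.__class__)
```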
[issue6560] socket sendmsg(), recvmsg() methods
David Watson added the comment: On Tue 23 Aug 2011, Nick Coghlan wrote: > As you can see, I just pushed a change that removed the new > methods from SSLSocket objects. If anyone wants to step up with > a valid use case (not already covered by wrap_socket), > preferably with a patch to add them back that includes proper > tests and documentation changes, please open a new feature > request and attach the new patch to that issue. Hi, sorry about the trouble caused by the broken tests, but SSLSocket should at least override sendmsg() to stop misguided programs sending data in the clear: http://bugs.python.org/issue12835 -- ___ Python tracker <http://bugs.python.org/issue6560> ___
[issue12837] Patch for issue #12810 removed a valid check on socket ancillary data
New submission from David Watson : Changeset 4736e172fa61 for issue #12810 removed the test "msg->msg_controllen < 0" from socketmodule.c, where msg_controllen happened to be unsigned on the reporter's system. I included this test deliberately, because msg_controllen may be of signed type (POSIX allows socklen_t to be signed, as objects of that type historically were - as the Rationale says: "All socklen_t types were originally (in BSD UNIX) of type int."). Attaching a patch to replace the check and add an accompanying comment. -- components: Extension Modules files: restore_controllen_check.diff keywords: patch messages: 142934 nosy: baikie priority: normal severity: normal status: open title: Patch for issue #12810 removed a valid check on socket ancillary data type: behavior versions: Python 3.3 Added file: http://bugs.python.org/file23036/restore_controllen_check.diff ___ Python tracker <http://bugs.python.org/issue12837> ___
[issue12837] Patch for issue #12810 removed a valid check on socket ancillary data
David Watson added the comment: On Wed 24 Aug 2011, Charles-François Natali wrote:
> > I included this test deliberately, because msg_controllen may be
> > of signed type [...] POSIX allows socklen_t to be signed
>
> http://pubs.opengroup.org/onlinepubs/007908799/xns/syssocket.h.html
> """
> <sys/socket.h> makes available a type, socklen_t, which is an unsigned opaque
> integral type of length of at least 32 bits. To forestall portability
> problems, it is recommended that applications should not use values larger
> than 2**32 - 1.
> """

That has since been changed. I'm reading from POSIX.1-2008, which says:

    The <sys/socket.h> header shall define the socklen_t type, which is an integer type of width of at least 32 bits

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html

The warning against using values larger than 2**32 - 1 is still there, I presume because they would not fit in a 32-bit signed int.

> Also, I'm not convinced by this:
>
> /* Check for empty ancillary data as old CMSG_FIRSTHDR()
>    implementations didn't do so. */
> for (cmsgh = ((msg.msg_controllen > 0) ? CMSG_FIRSTHDR(&msg) : NULL);
>      cmsgh != NULL; cmsgh = CMSG_NXTHDR(&msg, cmsgh)) {
>
> Did you really have reports of CMSG_NXTHDR not returning NULL upon empty
> ancillary data (it's also required by POSIX)?

I take it you mean CMSG_FIRSTHDR here; RFC 3542 says that:

    One possible implementation could be

    #define CMSG_FIRSTHDR(mhdr) \
        ( (mhdr)->msg_controllen >= sizeof(struct cmsghdr) ? \
          (struct cmsghdr *)(mhdr)->msg_control : \
          (struct cmsghdr *)NULL )

    (Note: Most existing implementations do not test the value of msg_controllen, and just return the value of msg_control...

IIRC, I saw an implementation in old FreeBSD headers that did not check msg_controllen, and hence did not return NULL as RFC 3542 requires.
Now you come to mention it though, this check in the for loop does actually protect against the kernel returning a negative msg_controllen, so the only remaining possibility would be that the CMSG_* macros fiddle with it. That said, the fact remains that the compiler warning is spurious if msg_controllen can be signed on some systems, and I still don't think decreasing the robustness of the code (particularly against any future modifications to that code) just for the sake of silencing a spurious warning is a good thing to do. People can read the comment above the "offending" line and see that the compiler has got it wrong. -- ___ Python tracker <http://bugs.python.org/issue12837> ___
[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake
David Watson added the comment: On Thu 25 Aug 2011, Antoine Pitrou wrote:
> Adding an explanation message to the NotImplementedError would be more
> helpful. Otherwise, good catch.

OK, I've copied the messages from the ValueErrors the other methods raise.

--
Added file: http://bugs.python.org/file23048/ssl_sendrecvmsg_notimplemented-2.diff
___ Python tracker <http://bugs.python.org/issue12835> ___

# HG changeset patch
# User David Watson
# Date 1314305189 -3600
# Node ID 23cdc358bbfb0ad40607b1c54bda2f7b5abe39f0
# Parent 80f814dca274b5d848dbd306c1513263e69011ce
Make SSLSocket.sendmsg/recvmsg/recvmsg_into() raise NotImplementedError.

diff --git a/Lib/ssl.py b/Lib/ssl.py
--- a/Lib/ssl.py
+++ b/Lib/ssl.py
@@ -355,6 +355,12 @@ class SSLSocket(socket):
         else:
             return socket.sendto(self, data, flags_or_addr, addr)
 
+    def sendmsg(self, *args, **kwargs):
+        # Ensure programs don't send data unencrypted if they try to
+        # use this method.
+        raise NotImplementedError("sendmsg not allowed on instances of %s" %
+                                  self.__class__)
+
     def sendall(self, data, flags=0):
         self._checkClosed()
         if self._sslobj:
@@ -413,6 +419,14 @@ class SSLSocket(socket):
         else:
             return socket.recvfrom_into(self, buffer, nbytes, flags)
 
+    def recvmsg(self, *args, **kwargs):
+        raise NotImplementedError("recvmsg not allowed on instances of %s" %
+                                  self.__class__)
+
+    def recvmsg_into(self, *args, **kwargs):
+        raise NotImplementedError("recvmsg_into not allowed on instances of "
+                                  "%s" % self.__class__)
+
     def pending(self):
         self._checkClosed()
         if self._sslobj:
diff --git a/Lib/test/test_ssl.py b/Lib/test/test_ssl.py
--- a/Lib/test/test_ssl.py
+++ b/Lib/test/test_ssl.py
@@ -1651,6 +1651,11 @@ else:
                 # consume data
                 s.read()
 
+                self.assertRaises(NotImplementedError, s.sendmsg, [b"data"])
+                self.assertRaises(NotImplementedError, s.recvmsg, 100)
+                self.assertRaises(NotImplementedError,
+                                  s.recvmsg_into, bytearray(100))
+
                 s.write(b"over\n")
                 s.close()
             finally:
[issue9569] Add tests for posix.mknod() and posix.mkfifo()
New submission from David Watson : Attaching simple tests for these functions, which aren't currently tested. -- components: Extension Modules files: test-mknod-mkfifo-3.x.diff keywords: patch messages: 113609 nosy: baikie priority: normal severity: normal status: open title: Add tests for posix.mknod() and posix.mkfifo() type: feature request Added file: http://bugs.python.org/file18478/test-mknod-mkfifo-3.x.diff ___ Python tracker <http://bugs.python.org/issue9569> ___
[issue9569] Add tests for posix.mknod() and posix.mkfifo()
Changes by David Watson : Added file: http://bugs.python.org/file18479/test-mknod-mkfifo-2.x.diff ___ Python tracker <http://bugs.python.org/issue9569> ___
[issue9570] PEP 383: os.mknod() and os.mkfifo() do not accept surrogateescape arguments
New submission from David Watson : These functions still use the "s" format for their arguments; the attached patch fixes them to use PyUnicode_FSConverter() in 3.2. Some simple tests for these functions (not for PEP 383 behaviour) are at issue #9569. -- components: Extension Modules files: mknod-mkfifo-pep383-3.2.diff keywords: patch messages: 113611 nosy: baikie priority: normal severity: normal status: open title: PEP 383: os.mknod() and os.mkfifo() do not accept surrogateescape arguments type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18480/mknod-mkfifo-pep383-3.2.diff ___ Python tracker <http://bugs.python.org/issue9570> ___
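For readers unfamiliar with PEP 383, the following sketch shows the round trip that the surrogateescape error handler provides. It uses an explicit UTF-8 codec so the result does not depend on the local filesystem encoding; the function name roundtrip and the sample bytes are made up for illustration.

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode raw bytes the PEP 383 way and encode them back again."""
    # Undecodable bytes become lone surrogates U+DC80..U+DCFF ...
    text = raw.decode(encoding, "surrogateescape")
    # ... which the same handler turns back into the original bytes.
    return text, text.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    text, back = roundtrip(b"fifo-\xff")
    print(ascii(text), back)
```

A function that accepts such strings has to encode them with the same handler, which is what the converter mentioned in the report is for.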
[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
New submission from David Watson : It may be hard to find a configuration string this long, but you can see the problem if you apply the attached confstr-reduce-bufsize.diff to reduce the size of the local array buffer that posix_confstr() uses. With it applied:

>>> import os
>>> print(ascii(os.confstr("CS_PATH")))
'\x00\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb'

The problem arises because the code first tries to receive the configuration string into the local buffer (char buffer[256], reduced to char buffer[1] above), but then tries to receive it directly into a string object if it doesn't fit. You can see what's gone wrong by comparing the working code in 2.x:

    if ((unsigned int)len >= sizeof(buffer)) {
        result = PyString_FromStringAndSize(NULL, len-1);
        if (result != NULL)
            confstr(name, PyString_AS_STRING(result), len);
    }
    else
        result = PyString_FromStringAndSize(buffer, len-1);

with the code in 3.x:

    if ((unsigned int)len >= sizeof(buffer)) {
        result = PyUnicode_FromStringAndSize(NULL, len-1);
        if (result != NULL)
            confstr(name, _PyUnicode_AsString(result), len);
    }
    else
        result = PyUnicode_FromStringAndSize(buffer, len-1);

Namely, that in 3.x it tries to receive the string into the bytes object returned by _PyUnicode_AsString(), not the str object it has just allocated (which has the wrong format anyway - Py_UNICODE as opposed to char). The attached confstr-long-result.diff fixes this by allocating a separate buffer when necessary to receive the result, before creating the string object from it. By putting the confstr() call and allocation in a loop, it also handles the possibility that the value's length might change between calls.
-- components: Extension Modules files: confstr-reduce-bufsize.diff keywords: patch messages: 113699 nosy: baikie priority: normal severity: normal status: open title: In 3.x, os.confstr() returns garbage if value is longer than 255 bytes type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18486/confstr-reduce-bufsize.diff ___ Python tracker <http://bugs.python.org/issue9579> ___
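The allocate-and-retry loop described in this report can be sketched in Python against a stand-in for the C confstr() call. Everything here is a model under stated assumptions: make_fake_confstr and read_whole_value are invented names, and the fake follows the C convention of returning the size needed including the terminating NUL while writing at most len(buf) bytes.

```python
def make_fake_confstr(value):
    """Build a stand-in for C confstr() that serves the given bytes."""
    def fake_confstr(buf):
        needed = len(value) + 1  # size required, including the NUL
        if len(buf) > 0:
            n = min(len(value), len(buf) - 1)
            buf[:n] = value[:n]  # copy as much as fits
            buf[n] = 0           # always NUL-terminate
        return needed
    return fake_confstr


def read_whole_value(confstr_like, bufsize=256):
    """Call confstr_like with a growing buffer until the value fits."""
    while True:
        buf = bytearray(bufsize)
        needed = confstr_like(buf)
        if needed == 0:
            return None  # no value defined
        if needed <= bufsize:
            return bytes(buf[:needed - 1])  # drop the trailing NUL
        # Too small (or the value grew since the last call): retry.
        bufsize = needed


if __name__ == "__main__":
    long_value = b"/bin:" * 100
    print(read_whole_value(make_fake_confstr(long_value), bufsize=1) == long_value)
```

Because the call sits inside the loop, it appears only once and a value that grows between calls is simply picked up on the next iteration.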
[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
Changes by David Watson : Added file: http://bugs.python.org/file18487/confstr-long-result.diff ___ Python tracker <http://bugs.python.org/issue9579> ___
[issue9580] os.confstr() doesn't decode result according to PEP 383
New submission from David Watson : The attached patch applies on top of the patch from issue #9579 to make it use PyUnicode_DecodeFSDefaultAndSize(). (You could use it in the existing code, but until that issue is fixed, there is sometimes nothing to decode!) -- components: Extension Modules files: confstr-pep383.diff keywords: patch messages: 113700 nosy: baikie priority: normal severity: normal status: open title: os.confstr() doesn't decode result according to PEP 383 type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18488/confstr-pep383.diff ___ Python tracker <http://bugs.python.org/issue9580> ___
[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
David Watson added the comment: The returned string should also be decoded with the file system encoding and surrogateescape error handler, as per PEP 383 - there's a patch at issue #9580 to do this. -- ___ Python tracker <http://bugs.python.org/issue9579> ___
[issue9569] Add tests for posix.mknod() and posix.mkfifo()
David Watson added the comment: I'm not quite sure what you mean, but the man page for FreeBSD 5.3 specifies EPERM for an unprivileged user and EINVAL for an attempt to create something other than a device node. POSIX requires creating a FIFO to work for any user, and just says that EINVAL is for an "invalid argument". -- ___ Python tracker <http://bugs.python.org/issue9569> ___
[issue9569] Add tests for posix.mknod() and posix.mkfifo()
David Watson added the comment: OK, these patches work on FreeBSD 5.3 (root and non-root) if you want to check the errno. I don't know what other systems might return though. I did also find that the 2.x tests were failing on cleanup because the test class used os.unlink rather than support.unlink (which ignores missing files) as its 3.x counterpart does, so I've updated the patch to change that as well. -- Added file: http://bugs.python.org/file18489/test-mknod-mkfifo-2.x-2.diff ___ Python tracker <http://bugs.python.org/issue9569> ___
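A minimal sketch of the kind of test being discussed, assuming a POSIX system where os.mkfifo() is available. It is not the attached patch: unlink_if_present is a hypothetical helper that imitates the "ignore missing files" behaviour attributed to support.unlink above, and fifo_is_created is an invented name.

```python
import errno
import os
import stat
import tempfile


def unlink_if_present(path):
    """Remove path, ignoring the error raised when it does not exist."""
    try:
        os.unlink(path)
    except OSError as exc:
        if exc.errno != errno.ENOENT:
            raise


def fifo_is_created(directory):
    """Create a FIFO in directory and report whether stat() sees one."""
    path = os.path.join(directory, "example.fifo")
    unlink_if_present(path)  # safe even if nothing is there yet
    os.mkfifo(path, stat.S_IRUSR | stat.S_IWUSR)
    try:
        return stat.S_ISFIFO(os.stat(path).st_mode)
    finally:
        unlink_if_present(path)  # cleanup must not fail the test


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(fifo_is_created(tmp))
```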
[issue9569] Add tests for posix.mknod() and posix.mkfifo()
Changes by David Watson : Added file: http://bugs.python.org/file18490/add-errno-check-2.x.diff ___ Python tracker <http://bugs.python.org/issue9569> ___
[issue9569] Add tests for posix.mknod() and posix.mkfifo()
Changes by David Watson : Added file: http://bugs.python.org/file18491/add-errno-check-3.x.diff ___ Python tracker <http://bugs.python.org/issue9569> ___
[issue9580] os.confstr() doesn't decode result according to PEP 383
David Watson added the comment: The CS_PATH variable is a colon-separated list of directories ("the value for the PATH environment variable that finds all standard utilities"), so the file system encoding is certainly correct there. I don't see any reference to an encoding in the POSIX spec for confstr(). -- ___ Python tracker <http://bugs.python.org/issue9580> ___
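To make the "colon-separated list of directories" concrete, here is a small sketch that splits the CS_PATH value where the platform provides it. The helper name standard_utility_dirs is invented, and the function returns an empty list on systems without os.confstr() or without a CS_PATH entry, so no particular output is assumed.

```python
import os


def standard_utility_dirs():
    """Return the directories listed in CS_PATH, or [] if unavailable."""
    if not hasattr(os, "confstr") or "CS_PATH" not in os.confstr_names:
        return []
    value = os.confstr("CS_PATH")  # may be None if no value is defined
    return value.split(":") if value else []


if __name__ == "__main__":
    for directory in standard_utility_dirs():
        print(directory)
```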
[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
David Watson added the comment: I don't see why confstr() values shouldn't change. sysconf() values can change between calls, IIRC. Implementations can also define their own confstr variables - they don't have to stick to the POSIX stuff. And using a loop means the confstr() call only appears once in the source, which is more elegant, right? :) -- ___ Python tracker <http://bugs.python.org/issue9579> ___
[issue9603] os.ttyname() and os.ctermid() don't decode result according to PEP 383
New submission from David Watson : These functions each return the path to a terminal, so they should use PyUnicode_DecodeFSDefault(). Patch attached. -- components: Extension Modules files: ttyname-ctermid-pep383.diff keywords: patch messages: 113920 nosy: baikie priority: normal severity: normal status: open title: os.ttyname() and os.ctermid() don't decode result according to PEP 383 type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18529/ttyname-ctermid-pep383.diff ___ Python tracker <http://bugs.python.org/issue9603> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9604] os.initgroups() doesn't accept PEP 383 usernames returned by pwd module
New submission from David Watson : The pwd module decodes usernames using PyUnicode_DecodeFSDefault(), so initgroups() should use PyUnicode_FSConverter() for the username. Patch attached. -- components: Extension Modules files: initgroups-pep383.diff keywords: patch messages: 113921 nosy: baikie priority: normal severity: normal status: open title: os.initgroups() doesn't accept PEP 383 usernames returned by pwd module type: behavior versions: Python 3.2 Added file: http://bugs.python.org/file18530/initgroups-pep383.diff ___ Python tracker <http://bugs.python.org/issue9604> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9605] os.getlogin() should use PEP 383 decoding to match the pwd module
New submission from David Watson : The pwd module decodes usernames with PyUnicode_DecodeFSDefault(), and the LOGNAME environment variable (suggested as an alternative to getlogin()) is decoded the same way. Attaching a patch to use PyUnicode_DecodeFSDefault() in getlogin(). -- components: Extension Modules files: getlogin-pep383.diff keywords: patch messages: 113922 nosy: baikie priority: normal severity: normal status: open title: os.getlogin() should use PEP 383 decoding to match the pwd module type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18531/getlogin-pep383.diff ___ Python tracker <http://bugs.python.org/issue9605> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9580] os.confstr() doesn't decode result according to PEP 383
David Watson added the comment:

> CS_PATH is hardcoded to "/bin:/usr/bin" in the GNU libc for UNIX. Do you know
> another key for which the value can be controled by the user (or the system
> administrator)?

No, not a specific example, but CS_PATH could conceivably refer to some POSIX compatibility suite that's been installed in a non-ASCII location, and implementations can add their own variables for whatever they want.

> CS_PATH is just an example, there are other keys. I'm not sure that all values
> are encoded to the filesystem encodings, it might be another encoding?
>
> Well, if we really doesn't know the encoding, a solution is to use a bytes API
> (which may avoid the question of the usage of the PEP 383).

The other variables defined by POSIX refer to environment variables and command-line options for the C compiler and the getconf utility, all of which would use the FS encoding in Python, but I agree there's no way to know the appropriate encoding in general, or even whether anything cares about encodings. Personally, I have no objections to making it return bytes.

-- ___ Python tracker <http://bugs.python.org/issue9580> ___
[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
David Watson added the comment:

> I just fear that the loop is "endless". Imagine the worst case: confstr()
> returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long.

The returned length is supposed to be determined by the length of the variable, not the length of the buffer passed by the caller, so I don't see why the OS would have a bug like that, and it would probably be exposed by the test suite anyway (there's currently a simple test using CS_PATH).

> I would prefer to see a condition to stop after 2 steps. It should maybe stop
> when an error at the 3rd step.

That is, raise an exception? Yeah, possibly, but I think it's better to just believe what the OS tells you rather than have an exception that's only raised once in a blue moon for something that may just be a low-probability event, and not an error.

-- ___ Python tracker <http://bugs.python.org/issue9579> ___
[issue9644] PEP 383: os.statvfs() does not accept surrogateescape arguments
New submission from David Watson : The statvfs() function still converts its argument with the "s" format; the attached patch (for 3.2) fixes it to use PyUnicode_FSConverter(). -- components: Extension Modules files: statvfs-pep383-3.2.diff keywords: patch messages: 114392 nosy: baikie priority: normal severity: normal status: open title: PEP 383: os.statvfs() does not accept surrogateescape arguments type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18578/statvfs-pep383-3.2.diff ___ Python tracker <http://bugs.python.org/issue9644> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9645] PEP 383: os.pathconf() does not accept surrogateescape arguments
New submission from David Watson : The pathconf() function still converts its argument with the "s" format; the attached pathconf-pep383-3.2.diff fixes it to use PyUnicode_FSConverter() (in 3.2). Also attaching pathconf-cleanup.diff to clean up the indentation, which otherwise makes the code rather confusing to look at. -- components: Extension Modules files: pathconf-pep383-3.2.diff keywords: patch messages: 114393 nosy: baikie priority: normal severity: normal status: open title: PEP 383: os.pathconf() does not accept surrogateescape arguments type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18579/pathconf-pep383-3.2.diff ___ Python tracker <http://bugs.python.org/issue9645> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9645] PEP 383: os.pathconf() does not accept surrogateescape arguments
Changes by David Watson : Added file: http://bugs.python.org/file18580/pathconf-cleanup.diff ___ Python tracker <http://bugs.python.org/issue9645> ___
[issue9647] os.confstr() does not handle value changing length between calls
New submission from David Watson : This came up in relation to issue #9579; there is some discussion of it there. Basically, if os.confstr() has to call confstr() twice because the buffer wasn't big enough the first time, the existing code assumes the string is the same length that the OS reported in the first call instead of using the length from the second call and resizing the buffer if necessary. This means the returned value will be truncated or contain trailing garbage if the string changed its length between calls. I don't know of an actual environment where configuration strings can change at runtime, but it's not forbidden by POSIX as far as I can see (the strings are described as "variables", after all, and sysconf() values such as CHILD_MAX can change at runtime). Implementations can also provide additional confstr() variables not specified by POSIX. The patch confstr-long-result.diff at issue #9579 would fix this (for 3.x), but Victor Stinner has expressed concern that a buggy confstr() could create a near-infinite loop with that patch applied. -- components: Extension Modules messages: 114396 nosy: baikie priority: normal severity: normal status: open title: os.confstr() does not handle value changing length between calls type: behavior versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2 ___ Python tracker <http://bugs.python.org/issue9647> ___
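For illustration, the retry loop under discussion could be sketched in Python roughly as follows. This is only a model of the idea, not the code in confstr-long-result.diff: `query` is a hypothetical stand-in for the C confstr() call, `make_changing_query` simulates a variable whose value changes between calls, and `max_tries` is an assumed optional bound of the kind Victor Stinner asked for.

```python
def read_config_string(query, initial_size=255, max_tries=None):
    """Fetch a variable-length value through a confstr()-style callback.

    query(size) is a hypothetical stand-in for confstr(): it returns
    (required_size, data), where required_size counts the terminating
    null and data holds at most size - 1 bytes of the value.
    """
    size = initial_size
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        required, data = query(size)
        if required <= size:
            # The whole value fitted; trust the length from *this* call.
            return data
        # The value was truncated: retry with the size just reported.
        size = required
    raise RuntimeError("value kept growing after %d attempts" % tries)


def make_changing_query(values):
    """Simulate a variable whose value changes on successive calls."""
    pending = iter(values)

    def query(size):
        value = next(pending)
        return len(value) + 1, value[:size - 1]

    return query


# The value grows from 300 to 400 bytes between the first and second call.
result = read_config_string(
    make_changing_query([b"a" * 300, b"b" * 400, b"b" * 400]))
```

With `max_tries=None` the loop simply believes whatever length the latest call reported; passing a small `max_tries` turns a value that keeps growing into an exception instead.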
[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
David Watson added the comment: I've opened a separate issue for the changing-length problem (issue #9647; it affects 2.x as well). Here is a patch that fixes the 255-byte issue only, and has similar results to the 2.x code if the value changes length between calls (except that it could raise a UnicodeError if the string is truncated inside a multibyte character encoding). -- Added file: http://bugs.python.org/file18581/confstr-minimal.diff ___ Python tracker <http://bugs.python.org/issue9579> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9580] os.confstr() doesn't decode result according to PEP 383
David Watson added the comment: I wrote this patch to make confstr() return bytes (with code similar to 2.x), and document the change in "Porting to Python 3.2" and elsewhere, but it then occurred to me that you might have been talking about making a separate bytes API like os.environb. Which did you have in mind? There is another option for a str API, which is to decode the value as ASCII with the surrogateescape error handler. The returned string will then round-trip correctly through PyUnicode_FSConverter(), etc., as long as the file system encoding is compatible with ASCII, which PEP 383 requires it to be. This is how undecodable command line arguments are currently handled when mbrtowc() is unavailable. -- Added file: http://bugs.python.org/file18582/confstr-bytes-3.2.diff ___ Python tracker <http://bugs.python.org/issue9580> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
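A small sketch of the ASCII/surrogateescape option, using only the standard str/bytes codecs. The sample value is made up for illustration, and the three codecs are merely examples of ASCII-compatible file system encodings.

```python
# Hypothetical confstr() value containing one non-ASCII byte.
raw = b"/opt/posix-\xe9/bin:/usr/bin"

# Decode as ASCII; bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF.
text = raw.decode("ascii", "surrogateescape")

# Re-encoding with an ASCII-compatible codec and the same error handler
# gives back the original bytes, whichever codec is in use.
round_trips = [text.encode(codec, "surrogateescape")
               for codec in ("ascii", "utf-8", "latin-1")]
```

Each entry in `round_trips` should equal `raw`, which is the property that lets such a string pass back through PyUnicode_FSConverter() unchanged.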
[issue9660] PEP 383: socket module doesn't handle undecodable protocol or service names
New submission from David Watson : The protocol and service/port number databases are typically implemented as text files on Unix and can contain non-ASCII names in any encoding (presumably for local services), but the socket module tries to decode them as strict UTF-8. In particular, getservbyport() and getnameinfo() will raise UnicodeError when this fails. Attached is a patch for 3.2 to use the file system encoding and surrogateescape handler instead, in line with PEP 383. This is what Python already does for the passwd and group databases, and it will allow protocol and service names to be given correctly as command line arguments. -- components: Extension Modules files: proto-service-pep383-3.2.diff keywords: patch messages: 114687 nosy: baikie priority: normal severity: normal status: open title: PEP 383: socket module doesn't handle undecodable protocol or service names type: behavior versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file18608/proto-service-pep383-3.2.diff ___ Python tracker <http://bugs.python.org/issue9660> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
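To make the failure mode concrete, here is a minimal sketch contrasting strict UTF-8 decoding with the PEP 383 approach. The service name is invented, and UTF-8 is only assumed to be the file system encoding for the purposes of the example.

```python
# Hypothetical service name stored in the database as Latin-1 bytes.
name = b"caf\xe9-svc"

# Strict UTF-8 decoding (what the issue describes) rejects the name.
try:
    name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape the name is returned, and it can be turned back
# into the same bytes when passed to another lookup.
decoded = name.decode("utf-8", "surrogateescape")
restored = decoded.encode("utf-8", "surrogateescape")
```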
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: I noticed that try-surrogateescape-first.diff missed out one of the string references that needed to be changed to point to the bytes object, and also used PyBytes_AS_STRING() in an unlocked section. This version fixes these things by taking the generally safer approach of setting the original char * variable to the hostname immediately after using hostname_converter(). -- Added file: http://bugs.python.org/file18609/try-surrogateescape-first-3.diff ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1027206] unicode DNS names in socket, urllib, urlopen
David Watson added the comment: Updated the socket module patch to include gethostbyaddr() - it happens to accept hostnames and is used this way in the standard library. -- Added file: http://bugs.python.org/file18610/socket-idna.diff ___ Python tracker <http://bugs.python.org/issue1027206> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9660] PEP 383: socket module doesn't handle undecodable protocol or service names
David Watson added the comment: Come to think of it, I'm not sure if the patch is correct for Windows, as PyUnicode_DecodeFSDefault() appears to do strict MBCS decoding by default (similarly with PyUnicode_FSConverter() for encoding). Can Windows return service names that won't decode with MBCS? Or does it use a different encoding? I don't have a system to experiment with. -- ___ Python tracker <http://bugs.python.org/issue9660> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1027206] unicode DNS names in socket, urllib, urlopen
David Watson added the comment:

> Thanks for the patch. Committed as r84261.
>
> I'm not sure what the point is of supporting IDNA in getnameinfo, so I have
> removed that from the patch. If you think it's needed, please elaborate.

I don't see the point of it either, but if it's not supposed to accept hostnames, it should use AI_NUMERICHOST in the call it makes to getaddrinfo(). As it is, it does both forward and reverse lookups when called with a hostname. Attaching a patch to use AI_NUMERICHOST.

Also, this issue # isn't really resolved yet as Python does not support IRIs (AFAIK).

-- Added file: http://bugs.python.org/file18615/getnameinfo-numerichost.diff ___ Python tracker <http://bugs.python.org/issue1027206> ___

diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -3969,6 +3969,7 @@ socket_getnameinfo(PyObject *self, PyObj
     memset(&hints, 0, sizeof(hints));
     hints.ai_family = AF_UNSPEC;
     hints.ai_socktype = SOCK_DGRAM;     /* make numeric port happy */
+    hints.ai_flags = AI_NUMERICHOST;    /* don't do any name resolution */
     Py_BEGIN_ALLOW_THREADS
     ACQUIRE_GETADDRINFO_LOCK
     error = getaddrinfo(hostp, pbuf, &hints, &res);
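The effect of AI_NUMERICHOST can be seen from Python as well. The sketch below is not the patched C code; `resolve_numeric_only` is a hypothetical helper that passes the same hints to socket.getaddrinfo().

```python
import socket


def resolve_numeric_only(host, port):
    """Parse a numeric address without doing any name resolution."""
    return socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                              socket.SOCK_DGRAM, 0, socket.AI_NUMERICHOST)


# A numeric address is accepted and parsed locally.
info = resolve_numeric_only("127.0.0.1", 80)

# A hostname is rejected instead of triggering a forward lookup.
try:
    resolve_numeric_only("host.domain.invalid", 80)
    rejected = False
except socket.gaierror:
    rejected = True
```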
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment:

> Is this patch in response to an actual problem, or a theoretical problem?
> If "actual problem": what was the specific application, and what was the
> specific host name?

It's about environments, not applications - the local network may be configured with non-ASCII bytes in hostnames (either in the local DNS *or* a different lookup mechanism - I mentioned /etc/hosts as a simple example), or someone might deliberately connect from a garbage hostname as a denial of service attack against a server which tries to look it up with gethostbyaddr() or whatever (this may require a "non-strict" resolver library, as noted above).

> If theoretical, I recommend to close it as "won't fix". I find it perfectly
> reasonable if Python's socket module gives an error if the hostname can't be
> clearly decoded. Applications that run into it as a result of gethostbyaddr
> should treat that as "no reverse name available".

There are two points here. One is that the decoding can fail; I do think that programmers will find this surprising, and the fact that Python refuses to return what was actually received is a regression compared to 2.x.

The other is that the encoding and decoding are not symmetric - hostnames are being decoded with UTF-8 but encoded with IDNA. That means that when a decoded hostname contains a non-ASCII character which is not prohibited by IDNA/Nameprep, that string will, when used in a subsequent call, not refer to the hostname that was actually received, because it will be re-encoded using a different codec.

Attaching a refreshed version of try-surrogateescape-first.diff. I've separated out the change to getnameinfo() as it may be superfluous (issue #1027206).
-- Added file: http://bugs.python.org/file18616/try-surrogateescape-first-4.diff Added file: http://bugs.python.org/file18617/try-surrogateescape-first-getnameinfo-4.diff ___ Python tracker <http://bugs.python.org/issue9377> ___

Accept ASCII/surrogateescape strings as hostname arguments.

diff --git a/Doc/library/socket.rst b/Doc/library/socket.rst
--- a/Doc/library/socket.rst
+++ b/Doc/library/socket.rst
@@ -49,6 +49,28 @@ supported. The address format required b
 automatically selected based on the address family specified when the socket
 object was created.
 
+When a hostname is returned by a system interface, it is decoded into
+a string using the ``'ascii'`` codec and the ``'surrogateescape'``
+error handler; this leaves ASCII bytes as ASCII, including IDNA
+ASCII-compatible encodings (see :mod:`encodings.idna`), but converts
+any non-ASCII bytes to the Unicode lone surrogate codes
+U+DC80...U+DCFF.
+
+Hostname arguments can be passed as strings or :class:`bytes` objects.
+The latter are passed to the system unchanged, while strings are
+encoded as follows: if a string contains only ASCII characters and/or
+the Unicode lone surrogate codes U+DC80...U+DCFF, it is encoded using
+the ``'ascii'`` codec and the ``'surrogateescape'`` error handler;
+otherwise it is converted to IDNA ASCII-compatible form using the
+``'idna'`` codec, and if this is not possible, :exc:`UnicodeError` is
+raised.
+
+.. versionchanged:: XXX
+   Previously, hostnames were decoded as UTF-8 and encoded using IDNA
+   or UTF-8; ``surrogateescape`` was not used; some interfaces
+   formerly accepted :class:`bytearray` objects, or did not accept
+   :class:`bytes` objects.
+
 For IPv4 addresses, two special forms are accepted instead of a host
 address: the empty string represents :const:`INADDR_ANY`, and the string
 ``'<broadcast>'`` represents :const:`INADDR_BROADCAST`. The behavior is not
diff --git a/Lib/test/test_socket.py b/Lib/test/test_socket.py
--- a/Lib/test/test_socket.py
+++ b/Lib/test/test_socket.py
@@ -322,6 +322,51 @@ class GeneralModuleTests(unittest.TestCa
         except socket.error:
             pass
 
+    def tryHostnameArgs(self, function, notfounderror):
+        # Call the given one-argument function with various valid and
+        # invalid representations of nonexistent hostnames. Check
+        # that it raises notfounderror for valid representations, and
+        # UnicodeError for invalid ones.
+
+        # An RFC 1123-compliant host name (".invalid" TLD is reserved
+        # under RFC 2606).
+        self.assertRaises(notfounderror, function, "host.domain.invalid")
+        # Previous name as a bytes object.
+        self.assertRaises(notfounderror, function, b"host.domain.invalid")
+        # A domain name with a non-ASCII octet, as bytes.
+        self.assertRaises(notfounderror, function, b"\xff.domain.invalid")
+        # Previous domain name as ASCII/surrogateescape string representation.
+        self.assertRaises(notfounderror, f
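The encoding rule in the documentation hunk above can be restated as a short Python sketch. `encode_hostname` is a hypothetical helper written for this illustration; the patch itself does the conversion in C inside socketmodule.c.

```python
def encode_hostname(name):
    """Hypothetical helper mirroring the rule in the proposed docs."""
    if isinstance(name, bytes):
        # bytes objects are passed to the system unchanged.
        return name
    if all(ord(c) < 0x80 or 0xDC80 <= ord(c) <= 0xDCFF for c in name):
        # Only ASCII and/or escaped bytes: restore the original bytes.
        return name.encode("ascii", "surrogateescape")
    # Anything else goes through IDNA, which may raise UnicodeError.
    return name.encode("idna")


plain = encode_hostname("host.domain.invalid")
escaped = encode_hostname("\udcff.domain.invalid")
international = encode_hostname("b\u00fccher.example")
```

Under this rule a name obtained from the system as ASCII/surrogateescape always maps back to the bytes that were received, while genuinely international names still get the IDNA treatment.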
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment:

> > It's about environments, not applications
>
> Still, my question remains. Is it a theoretical problem (i.e. one
> of your imagination), or a real one (i.e. one you observed in real
> life, without explicitly triggering it)? If real: what was the
> specific environment, and what was the specific host name?

Yes, I did reproduce the problem on my own system (Ubuntu 8.04). No, it is not from a real application, nor do I know anyone with their network configured like this (except possibly Dan "djbdns" Bernstein: http://cr.yp.to/djbdns/idn.html ). I reported this bug to save anyone who *is* in such an environment from crashing applications and erroneous name resolution.

> > That means that when a decoded hostname contains a non-ASCII
> > character which is not prohibited by IDNA/Nameprep, that string
> > will, when used in a subsequent call, not refer to the hostname
> > that was actually received, because it will be re-encoded using a
> > different codec.
>
> Again, I fail to see the problem in this. It won't happen in
> real life. However, if you worried that this could be abused,
> I think it should decode host names as ASCII, not as UTF-8.
> Then it will be symmetric again (IIUC).

That would be an improvement. The idea of the patches I posted is to combine this with the existing surrogateescape mechanism, which handles situations like this perfectly well. I don't see how getting a UnicodeError is better than getting a string with some lone surrogates in it. In fact, it was my understanding of PEP 383 that it is in fact better to get the lone surrogates.

-- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___
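The asymmetry being argued about can be reproduced with the standard codecs alone. In this sketch the hostname bytes are invented for illustration; the first half models the behaviour described in this issue (decode as UTF-8, re-encode with IDNA), the second half the ASCII/surrogateescape alternative.

```python
# Hypothetical name as returned by the resolver: raw UTF-8 bytes.
received = b"b\xc3\xbccher.example"

# Decoded as UTF-8, then re-encoded with IDNA for the next call.
decoded = received.decode("utf-8")
resent = decoded.encode("idna")

# Decoded and re-encoded as ASCII with surrogateescape instead.
escaped = received.decode("ascii", "surrogateescape")
restored = escaped.encode("ascii", "surrogateescape")
```

Here `resent` is an "xn--" name that differs from `received`, so a follow-up lookup would ask about a different host, whereas `restored` is byte-for-byte what the system returned.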
[issue1512163] mailbox (2.5b1): locking doesn't work (esp. on FreeBSD)
David Watson added the comment:

> Is this still an issue on later versions of Python and/or FreeBSD?

Yes, there is still an issue. There is no longer a deadlock on FreeBSD because the module has been changed to use only lockf() and dot-locking (on all platforms), but the issue is now about how users can enable other locking mechanisms that they need, such as flock(), without causing a deadlock on platforms where they refer to the same lock as lockf(). They can't just override the classes' .lock() and .unlock() methods, because some parts of the code perform locking operations directly without calling those methods.

-- ___ Python tracker <http://bugs.python.org/issue1512163> ___
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment:

> The surrogateescape mechanism is a very hackish approach, and
> violates the principle that errors should never pass silently.

I don't see how a name resolution API returning non-ASCII bytes would indicate an error. If the host table contains a non-ASCII byte sequence for a host, then that is the host's name - it works just as well as an ASCII name, both forwards and backwards.

What is hackish is representing char * data as a Unicode string when there is no native Unicode API to feed it to - there is no issue here such as file names being bytes on Unix and Unicode on Windows, so the clean thing to do would be to return a bytes object. I suggested the surrogateescape mechanism in order to retain backwards compatibility.

> However, it solves a real problem - people do run into the problem
> with file names every day. With this problem, I'd say "if it hurts,
> don't do it, then".

But to be more explicit, that's like saying "if it hurts, get your sysadmin to reconfigure the company network".

-- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment:

> > I don't see how a name resolution API returning non-ASCII bytes
> > would indicate an error.
>
> It's in violation of RFC 952 (slightly relaxed by RFC 1123).

That's bad if it's on the public Internet, but it's not an error. The OS is returning the name by which it knows the host. If you look at POSIX, you'll see that what getaddrinfo() and getnameinfo() look up and return is referred to as a "node name", which can be an address string or a "descriptive name", and that if used with Internet address families, descriptive names "include" host names. It doesn't say that the string can only be an address string or a hostname (RFC 1123 compliant or otherwise).

> > But to be more explicit, that's like saying "if it hurts, get
> > your sysadmin to reconfigure the company network".
>
> Which I consider perfectly reasonable. The sysadmin should have
> known (and, in practice, *always* knows) not to do that in the first
> place (the larger the company, the more cautious the sysadmin).

It's not reasonable when addressed to a customer who might go elsewhere. And I still don't see a technical reason for making such a demand. Python 2.x seems to work just fine using 8-bit strings.

-- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: OK, I still think this issue should be addressed, but here is a patch for the part we agree on: that decoding should not return any Unicode characters except ASCII. -- Added file: http://bugs.python.org/file18674/decode-strict-ascii.diff ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: The rest of the issue could also be straightforwardly addressed by adding bytes versions of the name lookup APIs. Attaching a patch which does that (applies on top of decode-strict-ascii.diff). -- Added file: http://bugs.python.org/file18675/hostname-bytes-apis.diff ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
Changes by David Watson : Removed file: http://bugs.python.org/file18675/hostname-bytes-apis.diff ___ Python tracker <http://bugs.python.org/issue9377> ___
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: Oops, forgot to refresh the last change into that patch. This should fix it. -- Added file: http://bugs.python.org/file18676/hostname-bytes-apis.diff ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9777] test_socket.GeneralModuleTests.test_idna should require the "network" resource
New submission from David Watson : This test requires network access as it tries to resolve a domain name at python.org. Patch attached. -- components: Tests files: idna-test-resource.diff keywords: patch messages: 115593 nosy: baikie priority: normal severity: normal status: open title: test_socket.GeneralModuleTests.test_idna should require the "network" resource type: behavior versions: Python 3.2 Added file: http://bugs.python.org/file18751/idna-test-resource.diff ___ Python tracker <http://bugs.python.org/issue9777> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
David Watson added the comment:

> > baikie: why did the test pass for you?
>
> The test passes (I assume) if linux-pass-unterminated.diff is applied. The
> latter patch is only meant to exhibit the issue, though, not to be checked in.

No, I meant for linux-pass-unterminated.diff to be checked in so that applications could always send datagrams back to the address they got them from, even when it was 108 bytes long. As it is run only on Linux, testMaxPathLen does not leave space for a null terminator because Linux just ignores it (that is what makes it possible to bind to a 108-byte address and thus trigger the bug).

-- title: socket: Buffer overrun while reading unterminated AF_UNIX addresses -> socket: Buffer overrun while reading unterminated AF_UNIX addresses ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
David Watson added the comment:

> baikie, coming back to your original message: what precisely makes you
> believe that sun_path does not need to be null-terminated on Linux?

That's the way I demonstrated the bug - the only way to bind to a 108-byte path is to pass it without null termination, because Linux will not accept an oversized sockaddr_un structure (e.g. a 108-byte path plus null terminator). Also, unless it's on OS/2, Python's existing code never includes the null terminator in the address size it passes to the system call, so a correctly-behaving OS should never see it.

However, it does now occur to me that a replacement libc implementation for Linux could try to do something with sun_path during the call and assume it's null-terminated even though the kernel doesn't, so it may be best to keep the null termination requirement after all. In that case, there would be no way to test for the bug from within Python, so the test problems would be somewhat moot (although the test code could still be used by changing UNIX_PATH_MAX from 108 to 107).

Attaching four-space-indent versions of the original patches (for 2.x and 3.x), and tests incorporating Antoine's use of assertRaisesRegexp.

--
Added file: http://bugs.python.org/file18770/linux-pass-unterminated-4spc.diff
Added file: http://bugs.python.org/file18771/return-unterminated-2.x-4spc.diff
Added file: http://bugs.python.org/file18772/return-unterminated-3.x-4spc.diff
Added file: http://bugs.python.org/file18773/addrlen-2.x-4spc.diff
Added file: http://bugs.python.org/file18774/addrlen-3.x-4spc.diff
Added file: http://bugs.python.org/file18775/test-2.x-new.diff
Added file: http://bugs.python.org/file18776/test-3.x-new.diff
___ Python tracker <http://bugs.python.org/issue8372> ___

Allow AF_UNIX pathnames up to the maximum 108 bytes on Linux, since it does not require sun_path to be null terminated.
diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -1187,27 +1187,16 @@ getsockaddrarg(PySocketSockObject *s, Py
 
         addr = (struct sockaddr_un*)addr_ret;
 #ifdef linux
-        if (len > 0 && path[0] == 0) {
-            /* Linux abstract namespace extension */
-            if (len > sizeof addr->sun_path) {
-                PyErr_SetString(socket_error,
-                                "AF_UNIX path too long");
-                return 0;
-            }
-        }
-        else
-#endif /* linux */
-        {
-            /* regular NULL-terminated string */
-            if (len >= sizeof addr->sun_path) {
-                PyErr_SetString(socket_error,
-                                "AF_UNIX path too long");
-                return 0;
-            }
-            addr->sun_path[len] = 0;
+        if (len > sizeof(addr->sun_path)) {
+#else
+        if (len >= sizeof(addr->sun_path)) {
+#endif
+            PyErr_SetString(socket_error, "AF_UNIX path too long");
+            return 0;
         }
         addr->sun_family = s->sock_family;
         memcpy(addr->sun_path, path, len);
+        memset(addr->sun_path + len, 0, sizeof(addr->sun_path) - len);
 #if defined(PYOS_OS2)
         *len_ret = sizeof(*addr);
 #else

When parsing sockaddr_un structures returned by accept(), etc., only examine bytes up to supplied addrlen and do not require null termination.
diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -998,19 +998,22 @@ makesockaddr(int sockfd, struct sockaddr
 #if defined(AF_UNIX)
     case AF_UNIX:
     {
+        Py_ssize_t len, splen;
         struct sockaddr_un *a = (struct sockaddr_un *) addr;
+        splen = addrlen - offsetof(struct sockaddr_un, sun_path);
 #ifdef linux
-        if (a->sun_path[0] == 0) {  /* Linux abstract namespace */
-            addrlen -= offsetof(struct sockaddr_un, sun_path);
-            return PyString_FromStringAndSize(a->sun_path,
-                                              addrlen);
+        if (splen > 0 && a->sun_path[0] == 0) {
+            /* Linux abstract namespace */
+            len = splen;
         }
         else
 #endif /* linux */
         {
-            /* regular NULL-terminated string */
-            return PyString_FromString(a->sun_path);
+            /* String, up to null terminator if present */
+            for (len = 0; len < splen && a->sun_path[len] != 0; len++)
+                ;
         }
+        return PyString_FromStringAndSize(a->sun_path, len);
     }
 #endif /* AF_UNIX */

When parsing sockaddr_un structures returned by accept(), etc., only examine bytes up to supplied addrlen and do not require null termination.
diff --git a/Modules/socketmodule.c b/Modules/socketmodu
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
David Watson added the comment: Updated the patches for Python 3.2 - these are now simpler as they do not support bytearray arguments, as these are no longer used for filenames (the existing code does not support bytearrays either). I've put the docs and tests in one patch, and made separate patches for the code, one for if the linux-pass-unterminated patch from issue #8372 is applied, and one for if it isn't. One point I neglected to comment on before is the ability to specify an address in the Linux abstract namespace as a filesystem-encoded string prefixed with a null character. This may seem strange, but as well as simplifying the code, it does support an actual use case, as on Linux systems the abstract namespace is sometimes used to hold names based on real filesystem paths such as "\x00/var/run/hald/dbus-XAbemUfDyQ", or imaginary ones, such as "\x00/com/ubuntu/upstart". In fact, running "netstat" on my own system did not reveal any non-textual abstract names in use (although they are of course allowed). -- Added file: http://bugs.python.org/file18850/af_unix-pep383-docs-tests-3.2.diff ___ Python tracker <http://bugs.python.org/issue8373> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
Changes by David Watson : Added file: http://bugs.python.org/file18851/af_unix-pep383-3.2-with-linux-unterminated.diff ___ Python tracker <http://bugs.python.org/issue8373> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
Changes by David Watson : Added file: http://bugs.python.org/file18852/af_unix-pep383-3.2-without-linux-unterminated.diff ___ Python tracker <http://bugs.python.org/issue8373> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
David Watson added the comment: I've updated the PEP 383 patches at issue #8373 with separate versions for if the linux-pass-unterminated patch is applied or not. If it's not essential to have unit tests for the overrun issue, I'd suggest applying just the return-unterminated and addrlen patches and omitting linux-pass-unterminated, for now at least. This will leave Linux no worse off than a typical BSD-derived platform. -- ___ Python tracker <http://bugs.python.org/issue8372> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
David Watson added the comment: One of the tests got broken by the removal of sys.setfilesystemencoding(). Replaced it. -- Added file: http://bugs.python.org/file18853/af_unix-pep383-docs-tests-3.2-2.diff ___ Python tracker <http://bugs.python.org/issue8373> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
David Watson added the comment: > With all the effort that went into the patch, I recommend to get it right: if > there is space for the \0, include it. If the string size is exactly 108, and > it's linux, write it unterminated. Else fail. > > As for testing: we should then definitely have a test that, if you can create > an 108 byte unix socket that its socket name is what we said it should be. The attached patches do those things, if I understand you correctly (the test patches add such a test for Linux, and linux-pass-unterminated uses memset() to zero out the area between the end of the actual path and the end of the sun_path array). If you're talking about including the null in the address passed to the system call, that does no harm on Linux, but I think the more common practice is not to include it. The FreeBSD SUN_LEN macro, for instance, is provided to calculate the address length and does not include the null. -- ___ Python tracker <http://bugs.python.org/issue8372> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
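The length rule and zero-filling discussed in this message can be sketched in Python. This is only an illustration of the logic, not the C code from the patches; the names pack_sun_path, SUN_PATH_SIZE and allow_unterminated are hypothetical, and 108 is the Linux sun_path size quoted in this thread.

```python
SUN_PATH_SIZE = 108  # size of sun_path on Linux, as quoted in this thread


def pack_sun_path(path: bytes, allow_unterminated: bool) -> bytes:
    """Return a zero-padded sun_path buffer, or raise if the path is too long.

    allow_unterminated=True models the Linux behaviour proposed in
    linux-pass-unterminated (a path may fill all 108 bytes); False models
    platforms that keep room for the terminating null.
    """
    limit = SUN_PATH_SIZE if allow_unterminated else SUN_PATH_SIZE - 1
    if len(path) > limit:
        raise OSError("AF_UNIX path too long")
    # Like the memset() in the patch: zero everything after the path itself.
    return path + b"\0" * (SUN_PATH_SIZE - len(path))
```

The address length passed to the system call would still be computed from len(path), not from the padded buffer, which matches the SUN_LEN convention mentioned above.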
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
David Watson added the comment: I meant to say that FreeBSD provides the SUN_LEN macro, but it turns out that Linux does as well, and its version behaves the same as FreeBSD's. The FreeBSD man pages state that the terminating null is not part of the address: http://www.freebsd.org/cgi/man.cgi?query=unix&apropos=0&sektion=0&manpath=FreeBSD+8.1-RELEASE&format=html The examples in Stevens/Rago's "Advanced Programming in the Unix Environment" also pass address lengths to bind(), etc. that do not include the null. -- ___ Python tracker <http://bugs.python.org/issue8372> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9647] os.confstr() does not handle value changing length between calls
David Watson added the comment: > If I understood correctly, you don't want the value to be truncated if the > variable grows between the two calls to confstr(). Which behaviour would you > expect? A Python exception? A return size larger than the buffer is *supposed* to indicate that the current value is larger than the supplied buffer, so I would just expect it to reallocate the buffer, call confstr() again and return the new value, unless it was known that such a situation indicated an actual problem. In other words, I would not expect it to do anything special. I didn't write the original patch the way I did in order to fix this (potential) bug - it just seemed like the most natural way to write the code. -- ___ Python tracker <http://bugs.python.org/issue9647> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
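The reallocate-and-retry behaviour described in this message could look roughly like the following. It is a sketch under assumptions: query is a hypothetical stand-in for the C confstr() call that returns the full length of the current value together with the (possibly truncated) data, and the real function's accounting for the terminating null is ignored here.

```python
def read_variable(query, initial_size=16):
    """Call query(size) until the whole value fits in the buffer.

    query is a hypothetical stand-in for confstr(): it returns a pair
    (needed, data), where needed is the full length of the current value
    and data is that value truncated to at most `size` bytes.
    """
    size = initial_size
    while True:
        needed, data = query(size)
        if needed <= size:
            return data
        # The value is larger than the buffer: grow it and ask again.
        size = needed
```

Because the loop re-checks the returned length on every call, a value that grows between two calls is handled the same way as one that was too large to begin with.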
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: > platform.system() fails with UnicodeEncodeError on systems that have their > computer name set to a name containing non-ascii characters. The > implementation of platform.system() uses at some point socket.gethostname() ( > see http://www.pasteall.org/16215 for a stacktrace of such usage) This trace is from a Windows system, where the platform module uses gethostname() in its cross-platform uname() function, which platform.system() and various other functions in the module rely on. On a Unix system, platform.uname() depends on os.uname() working, meaning that these functions still fail when the hostname cannot be decoded, as it is part of os.uname()'s return value. Given that os.uname() is a primary source of information about the platform on Unix systems, this sort of collateral damage from an undecodable hostname is likely to occur in more places. > It would be more than great if this error could be fixed. If another 3.1 > release is planned, preferrably for that. If you'd like to try the surrogateescape patches, they ought to fix this. The relevant patches are ascii-surrogateescape-2.diff, try-surrogateescape-first-4.diff and uname-surrogateescape.diff. -- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: > As a further note: I think socket.gethostname() is a special case, since this > is just about a local setting (i.e. not related to DNS). But the hostname *is* commonly intended to be looked up in the DNS or whatever name resolution mechanisms are used locally - socket.getfqdn(), for instance, works by looking up the result using gethostbyaddr() (actually the C function getaddrinfo(), followed by gethostbyaddr()). So I don't see the rationale for treating it differently from the results of gethostbyaddr(), getnameinfo(), etc. POSIX says of the name lookup functions that "in many cases" they are implemented by the Domain Name System, not that they always are, so a name intended for lookup need not be ASCII-only either. > We should then assume that it is encoded in the locale encoding (in > particular, that it is encoded in mbcs on Windows). I can see the point of returning the characters that were intended, but code that looked up the returned name would then have to be changed to re-encode it to bytes to avoid the round-tripping issue when non-ASCII characters are returned. -- ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: > The result from gethostname likely comes out of machine-local > configuration. It may have non-ASCII in it, which is then likely > encoded in the local encoding. When looking it up in DNS, IDNA > should be applied. I would have thought that someone who intended a Unicode hostname to be looked up in its IDNA form would have encoded it using IDNA, rather than an 8-bit encoding - how many C programs would transcode the name that way, rather than just passing the char * from one interface to another? In fact, I would think that non-ASCII bytes in a hostname most probably indicated that a name resolution mechanism other than the DNS was in use, and that the byte string should be passed unaltered just as a typical C program would. > OTOH, output from gethostbyaddr likely comes out of the DNS itself. > Guessing what encoding it may have is futile - other than guessing > that it really ought to be ASCII. Sure, but that doesn't mean the result can't be made to round-trip if it turns out not to be ASCII. The guess that it will be ASCII is, after all, still a guess (as is the guess that it comes from the DNS). > Python's socket module is clearly focused on the internet, and > intends to support that well. So if you pass a non-ASCII > string, it will have to encode that using IDNA. If that's > not what you want to get, tough luck. I don't object to that, but it does force a choice between decoding an 8-bit name for display (e.g. by using the locale encoding), and decoding it to round-trip automatically (e.g. by using ASCII/surrogateescape, with support on the encoding side). 
Using PyUnicode_DecodeFSDefault() for the hostname or other returned names (thus decoding them for display) would make this issue solvable with programmer intervention - for instance, "socket.gethostbyaddr(socket.gethostname())" could be replaced by "socket.gethostbyaddr(os.fsencode(socket.gethostname()))", but programmers might well neglect to do this, given that no encoding was needed in Python 2. Also, even displaying a non-ASCII name decoded according to the locale creates potential for confusion, as when the user types the same characters into a Python program for lookup (again, barring programmer intervention), they will not represent the same byte sequence as the characters the user sees on the screen (as they will instead represent their IDNA ASCII-compatible equivalent). So overall, I do think it is better to decode names for automatic round-tripping rather than for display, but my main concern is simply that it should be possible to recover the original bytes so that round-tripping is at least possible. PyUnicode_DecodeFSDefault() would accomplish that much at least. -- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
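The difference between decoding for display and decoding for round-tripping can be seen with the standard codecs. This is an illustrative sketch only: it uses the "löwis" example from elsewhere in this thread and assumes the raw name is UTF-8, which a real hostname need not be.

```python
raw = b"l\xc3\xb6wis"  # bytes as a C gethostname() might return them

# Decoding for round-tripping: each non-ASCII byte becomes a lone surrogate.
escaped = raw.decode("ascii", "surrogateescape")
round_tripped = escaped.encode("ascii", "surrogateescape")

# Decoding for display (assuming UTF-8 here), then encoding as a str
# hostname argument would be encoded, i.e. with IDNA.
displayed = raw.decode("utf-8")
idna_form = displayed.encode("idna")
```

Here round_tripped equals raw, while idna_form is the ASCII-compatible name, so a lookup of the displayed string would not send the original bytes to the resolver.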
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: > > In fact, I would think that non-ASCII bytes in a hostname most > > probably indicated that a name resolution mechanism other than > > the DNS was in use, and that the byte string should be passed > > unaltered just as a typical C program would. > > I'm not talking about byte strings, but character strings. I mean that passing the str object from socket.gethostname() to the Python lookup function ought to result in the same byte string being passed to the C lookup function as was returned by the C gethostname() function (or else that the programmer must re-encode the str to ensure that that result is obtained). > > I don't object to that, but it does force a choice between > > decoding an 8-bit name for display (e.g. by using the locale > > encoding), and decoding it to round-trip automatically (e.g. by > > using ASCII/surrogateescape, with support on the encoding side). > > In the face of ambiguity, refuse the temptation to guess. Yes, I would interpret that to mean not using the locale encoding for data obtained from the network. That's another reason why the ASCII/surrogateescape scheme appeals to me more. > Well, Python is not C. In Python, you would pass a str, and > expect it to work, which means it will get automatically encoded > with IDNA. I think there might be a misunderstanding here - I've never proposed changing the interpretation of Unicode characters in hostname arguments. The ASCII/surrogateescape scheme I suggested only changes the interpretation of unpaired surrogate codes, as they do not occur in IDNs or any other genuine Unicode data; all IDNs, including those solely consisting of ASCII characters, would be encoded to the same byte sequence as before. 
ASCII/surrogateescape decoding could also be used without support on the encoding side - that would satisfy the requirement to "refuse the temptation to guess", would allow the original bytes to be recovered, and would mean that attempting to look up a non-ASCII result in str form would raise an exception rather than looking up the wrong name.

> Marc-Andre wants gethostname to use the Wide API on Windows, which,
> in theory, allows for cases where round-tripping to bytes is
> impossible.

Well, the name resolution APIs wrapped by Python are all byte-oriented, so if the computer name were to have no bytes equivalent then it wouldn't be possible to resolve it anyway, and an exception rightly ought to be raised at some point in the process of trying to do so.

-- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: I was looking at the MSDN pages linked to above, and these two pages seemed to suggest that Unicode characters appearing in DNS names represented UTF-8 sequences, and that Windows allowed such non-ASCII byte sequences in the DNS by default: http://msdn.microsoft.com/en-us/library/ms724220%28v=VS.85%29.aspx http://msdn.microsoft.com/en-us/library/ms682032%28v=VS.85%29.aspx (See the discussion of DNS_ERROR_NON_RFC_NAME in the latter.) Can anyone confirm if this is the case? The BSD-style gethostname() function can't be returning UTF-8, though, or else the "Nötkötti" example above would have been decoded successfully, given that Python currently uses PyUnicode_FromString(). Also, if GetComputerNameEx() only offers a choice of DNS names or NetBIOS names, and both are byte-oriented underneath (that was my reading of the "Computer Names" page), then presumably there shouldn't be a problem with mapping the result to a bytes equivalent when necessary? -- ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment: > > Also, if GetComputerNameEx() only offers a choice of DNS names or > > NetBIOS names, and both are byte-oriented underneath (that was my > > reading of the "Computer Names" page), then presumably there > > shouldn't be a problem with mapping the result to a bytes > > equivalent when necessary? > > They aren't byte-oriented underneath.It depends on whether use > GetComputerNameA or GetComputerNameW whether you get bytes or Unicode. > If bytes, they are converted as if by WideCharToMultiByte using > CP_ACP, which in turn will introduce question marks and the like > for unconvertable characters. Sorry, I didn't mean how Windows constructs the result for the "A" interface - I was talking about Python code being able to map the result from the Unicode interface to the form used in the protocol (e.g. DNS). I believe the proposal is to use the DNS name, so since the DNS is byte oriented, I would have thought that the Unicode "DNS name" result would always have a bytes equivalent that the DNS resolver code would use - perhaps its UTF-8 encoding? -- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment:

> On other platforms, I guess we'll just have to do some trial
> and error to see what works and what not. E.g. on Linux it is
> possible to set the hostname to a non-ASCII value, but then
> the resolver returns an error, so it's not very practical:
>
> # hostname l\303\266wis
> # hostname
> löwis
> # hostname -f
> hostname: Resolver Error 0 (no error)
>
> Using the IDNA version doesn't help either:
>
> # hostname xn--lwis-5qa
> # hostname
> xn--lwis-5qa
> # hostname -f
> hostname: Resolver Error 0 (no error)

I think what's happening here is simply that you're setting the hostname to something which doesn't exist in the relevant name databases - the man page for Linux's hostname(1) says that "The FQDN is the name gethostbyname(2) returns for the host name returned by gethostname(2).". If the computer's usual name is "newton", that may be why it works and the others don't.

It works for me if I add "127.0.0.9 löwis.egenix.com löwis" to /etc/hosts and then set the hostname to "löwis" (all UTF-8): hostname -f prints "löwis.egenix.com", and Python 2's socket.getfqdn() returns the corresponding bytes; non-UTF-8 names work too. (Note that the FQDN must appear before the bare hostname in the /etc/hosts entry, and I used the address 127.0.0.9 simply to avoid a collision with existing entries - by default, Ubuntu assigns the FQDN to 127.0.1.1.)

-- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names
David Watson added the comment:

> FWIW, you can do the same on a Linux box, i.e. setup the host name
> and domain to some completely bogus values. And as David pointed out,
> without also updating the /etc/hosts on the Linux, you always get the
> resolver error with hostname -f I mentioned earlier on (which does
> a DNS lookup), so there's no real connection to the DNS system on
> Linux either.

Just to clarify here: there isn't anything special about /etc/hosts; it's handled by a pluggable module which performs hostname lookups in it alongside a similar module for the DNS. glibc's Name Service Switch combines the views provided by the various modules into a single byte-oriented namespace for hostnames according to the settings in /etc/nsswitch.conf (this namespace allows non-ASCII bytes, as the /etc/hosts examples demonstrate).

http://www.kernel.org/doc/man-pages/online/pages/man5/nsswitch.conf.5.html
http://www.gnu.org/software/libc/manual/html_node/Name-Service-Switch.html

It's an extensible system, so people can write their own modules to handle whatever name services they have to deal with, and configure hostname lookup to query them before, after or instead of the DNS. A hostname that is not resolvable in the DNS may be resolvable in one of these.

-- title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names ___ Python tracker <http://bugs.python.org/issue9377> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
New submission from David Watson <[EMAIL PROTECTED]>: The error message has no newline at the end: $ LANG=en_GB.UTF-8 python3.0 test.py $'\xff' Could not convert argument 2 to string$ Seriously, though: is this the intended behaviour? If the interpreter just dies when it gets a non-UTF-8 (or whatever) argument, it creates an opportunity for a denial-of-service if some admin is running a Python script via find(1) or similar. And what if you want to run a Python script on some files named in a mixture of charsets (because, say, you just untarred an archive created in a foreign charset)? Could sys.argv not provide bytes objects for those arguments, like os.listdir()? Or (better IMHO) have a separate sys.argv_bytes interface? -- components: Unicode messages: 67608 nosy: baikie severity: normal status: open title: Problem with invalidly-encoded command-line arguments (Unix) type: behavior versions: Python 3.0 ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3023> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
David Watson <[EMAIL PROTECTED]> added the comment: Hmm, yes, I see that the open() builtin doesn't accept bytes filenames, though os.open() still does. When I saw that you could pass bytes filenames transparently from os.listdir() to os.open(), I assumed that this was intentional! So what *is* os.listdir() supposed to do when it finds an unconvertible filename? Raise an exception? Pretend the file isn't there? What if someone puts unconvertible strings in the password database? I think this is going to cause real problems for people. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3023> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue6560] socket sendmsg(), recvmsg() methods
David Watson added the comment: Thanks for your interest! I'm actually still working on the patch I posted, docs and a test suite, and I'll post something soon. Yes, you could just use b"".join() with sendmsg() (and get slightly annoyed because it doesn't accept buffers ;) ). I made sendmsg() take multiple buffers because that's the way the system call works, but also to match recvmsg_into(), which gives you the convenience of being able to receive part of the message into a bytearray and part into an array.array("i"), say, if that's how the data is formatted. As you might know, gather-write with sendmsg() can give a performance benefit by letting the kernel assemble the message while copying the data from userspace rather than having userspace copy the data once to form the message and then having the kernel copy it again when the system call is made. I suppose with Python you just need a larger message to see the benefit :) Since it can read from buffers, though, socket.sendmsg() can pull a large chunk of data straight out of an mmap object, say, and attach headers from a bytes object without the mmapped data being touched by Python at all (or even entering userspace, in this case). The patch is for 3.x, BTW - "y*" is valid there (and does take a buffer). As for a good reference, I haven't personally seen one. There's POSIX and RFC 3542, but they don't provide a huge amount of detail. Perhaps the (updated) W. Richard Stevens networking books? I've got the Stevens/Rago second edition of Advanced Programming in the Unix Environment, which discusses FD and credential passing with sendmsg/recvmsg, but not very well (it misuses CMSG_LEN, for one thing). The networking books were updated by different people though, so perhaps they do better. The question of whether to use CMSG_NXTHDR() to step to the next header when constructing the buffer for sendmsg() is a bit murky, in particular. 
I've assumed that this is the way to do it since the examples in RFC 3542 (and most of the code I've seen generally) use CMSG_FIRSTHDR() to get the initial pointer, but I've found that glibc's CMSG_NXTHDR() can (wrongly, I think) return NULL if the buffer hasn't been zero-filled beforehand (this causes segfaults with the patch I initially posted).

@Wim: Yes, the rfc3542 module from that package looks as if it would be usable with these patches - although it's Python 2-only, GPL-only and looks unmaintained. Those kinds of ancillary data constructors will actually be needed to make full portable use of sendmsg() and recvmsg() for things like IPv6, SCTP, Linux's socket error queues, etc. The same goes for data for the existing get/setsockopt() methods, in fact - the present suggestion to use the struct module is pretty inadequate when there are typedefs involved and implementations might add and reorder fields, etc.

The objects in that package seem a bit overcomplicated, though, messing about with setter methods instead of just subclassing "bytes" and having different constructors to create the object from individual arguments or received bytes (say, ucred(1, 2, 3) or ucred.from_bytes(...)).

Maybe the problem of testing patches well has been putting people off so far? Really exercising the system's CMSG_*HDR() macros in particular isn't entirely straightforward. I suppose there's also a reluctance to write tests while still uncertain about how to present the interface - that's another reason why I went for the most general multiple-buffer form of sendmsg()!

-- ___ Python tracker <http://bugs.python.org/issue6560> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
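The multiple-buffer interface discussed in this message can be illustrated with the sendmsg() and recvmsg_into() methods as they appear in the socket module of Python 3.3 and later; the patch under discussion at the time may have differed in detail. This is a Unix-only sketch, and the buffer contents are made up for the example.

```python
import socket

# A connected pair of datagram sockets, so one sendmsg() is one message.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

header = b"HDR1"                           # a plain bytes object
body = memoryview(bytearray(b"payload"))   # any buffer-like object works

# Gather-write: the kernel assembles header + body, so no b"".join() copy
# is needed in userspace.
sent = a.sendmsg([header, body])

# Scatter-read: the first 4 bytes land in one buffer, the rest in another.
head_buf = bytearray(4)
body_buf = bytearray(16)
nbytes, ancdata, flags, addr = b.recvmsg_into([head_buf, body_buf])

a.close()
b.close()
```

After the call, head_buf holds the header and the start of body_buf holds the payload, which is the "part into one buffer, part into another" convenience the message describes.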
[issue6560] socket sendmsg(), recvmsg() methods
David Watson added the comment:

OK, here's a new version as a work in progress. A lot of the new stuff is uncommented (particularly the support code for the tests), but there are proper docs this time and a fairly complete test suite (but see below).

There are a couple of changes to the interface (hopefully the last). The recvmsg() methods no longer receive ancillary data by default, since calling them on an AF_UNIX socket with the old default buffer could allow a malicious sender to send unwanted file descriptors up to the receiver's resource limit, and in a multi-threaded program, another thread could then be prevented from opening new file descriptors before the receiving thread had a chance to close the unwanted ones. Since the ancillary buffer size argument is now more likely to need a value, I've moved it to second place; the basic argument order is now the same as in Kalman Gergely's patch.

CMSG_LEN() and CMSG_SPACE() are now provided. I've also used socket.error instead of ValueError when rejecting some buffer object/array for being too big to handle, since the system call itself might cause socket.error to be raised for a smaller (oversized) object, failing with EMSGSIZE or whatever.

The code is now much more paranoid about checking the results of the CMSG_*() macros, and will raise RuntimeError if it finds its assumptions are not met.

I'd still like to add tests for receiving some of the RFC 3542 ancillary data items, especially since the SCM_RIGHTS tests can't always (ever?) test recvmsg() with multiple items (if you send two FD arrays, the OS can coalesce them into a single array before delivering them).

-- Added file: http://bugs.python.org/file16417/baikie-hwundram-v2.diff ___ Python tracker <http://bugs.python.org/issue6560> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue6560] socket sendmsg(), recvmsg() methods
David Watson added the comment: I just found that the IPv6 tests don't get skipped when IPv6 is available but disabled in the build - you can create IPv6 sockets, but not use them :/ This version fixes the problem. -- Added file: http://bugs.python.org/file16422/baikie-hwundram-v2.1.diff ___ Python tracker <http://bugs.python.org/issue6560> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1027206] unicode DNS names in socket, urllib, urlopen
David Watson added the comment: I was about to report this for the socket module - the gethostbyname(), gethostbyname_ex() and getnameinfo() functions are the only things currently affected in that module as far as I can see. 3.x is affected too - the functions will pass non-ASCII Unicode to the system as UTF-8 there. The attached patch fixes them in 2.x and 3.x. -- keywords: +patch nosy: +baikie versions: +Python 3.2, Python 3.3 Added file: http://bugs.python.org/file16624/idna.diff ___ Python tracker <http://bugs.python.org/issue1027206> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
New submission from David Watson :

The makesockaddr() function in the socket module assumes that AF_UNIX addresses have a null-terminated sun_path, but Linux actually allows unterminated addresses using all 108 bytes of sun_path (for normal filesystem sockets, that is, not just abstract addresses). When receiving such an address (e.g. in accept() from a connecting peer), makesockaddr() will run past the end and return extraneous bytes from the stack, or fail because they can't be decoded, or perhaps segfault in extreme cases.

This can't currently be tested from within Python as Python also refuses to accept address arguments which would fill the whole of sun_path, but the attached linux-pass-unterminated.diff (for 2.x and 3.x) enables them for Linux. With the patch applied:

Python 2.7a4+ (trunk, Apr 8 2010, 18:20:28)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_UNIX)
>>> s.bind("a" * 108)
>>> s.getsockname()
'\xfa\xbf\xa8)\xfa\xbf\xec\x15\n\x08l\xaaY\xb7\xb8CZ\xb7'
>>> len(_)
126

Also attached are some unit tests for use with the above patch, a couple of C programs for checking OS behaviour (you can also see the bug by doing accept() in Python and using the bindconn program), and patches aimed at fixing the problem.

Firstly, the return-unterminated-* patches make makesockaddr() scan sun_path for the first null byte as before (if it's not a Linux abstract address), but now stop at the end of the structure as indicated by the addrlen argument. However, there's one more catch before this will work on Linux, which is that Linux system calls return the length of the address they *would* have stored in the structure had there been room for it, which in this case is one byte longer than the official size of a sockaddr_un structure, due to the missing null terminator.
The addrlen-* patches handle this by always calling makesockaddr() with the actual buffer size if it is less than the returned length. This silently ignores any truncation, but I'm not sure how to do anything sensible about that, and some operating systems (e.g. FreeBSD) just silently truncate the address anyway and don't return the original length (POSIX doesn't make clear which, if either, behaviour is required). Once these patches are applied, the tests pass. There is one other issue: the patches for 3.x retain the assumption that socket paths are in UTF-8, but they should actually be handled according to PEP 383. I've got a patch for that, but I'll open a separate issue for it since the handling of the Linux abstract namespace isn't documented and there's some slightly unobvious behaviour that people might be depending on. -- components: Extension Modules files: linux-pass-unterminated.diff keywords: patch messages: 102861 nosy: baikie severity: normal status: open title: socket: Buffer overrun while reading unterminated AF_UNIX addresses type: behavior versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3 Added file: http://bugs.python.org/file16874/linux-pass-unterminated.diff ___ Python tracker <http://bugs.python.org/issue8372> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
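The bounded scan and the length clamping described in this report can be modelled in a few lines of Python. This is only a sketch of the idea, not the C code from the patches: `extract_sun_path` is a hypothetical helper, and the 2-byte offset of sun_path within sockaddr_un is an assumption about the Linux layout (the 108-byte size is the figure given in the report).

```python
SUN_PATH_OFFSET = 2    # assumed offset of sun_path in struct sockaddr_un (Linux)
SUN_PATH_SIZE = 108    # size of sun_path on Linux, as stated in the report


def extract_sun_path(buf: bytes, addrlen: int) -> bytes:
    """Return the path held in a raw sockaddr_un buffer (hypothetical helper)."""
    # The system call may report the length it *would* have stored, which can
    # exceed the buffer actually supplied, so clamp it to the buffer size.
    addrlen = min(addrlen, len(buf))
    path = buf[SUN_PATH_OFFSET:addrlen]
    if path[:1] == b"\0":
        # Linux abstract address: embedded nulls are significant, keep them all.
        return path
    # Filesystem address: stop at the first null byte, or at the end of the
    # structure if the path is unterminated.
    end = path.find(b"\0")
    return path if end < 0 else path[:end]


family = b"\x01\x00"                       # placeholder for the sun_family field
full = family + b"a" * SUN_PATH_SIZE       # unterminated, fills all of sun_path
print(len(extract_sun_path(full, len(full) + 1)))
```

The point of the clamp is that the scan never reads beyond the buffer, whatever length the kernel reports; any truncation is silently ignored, as in the addrlen-* patches.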
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
Changes by David Watson : Added file: http://bugs.python.org/file16875/return-unterminated-2.x.diff ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
Changes by David Watson : Added file: http://bugs.python.org/file16876/return-unterminated-3.x.diff ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
Changes by David Watson : Added file: http://bugs.python.org/file16877/addrlen-2.x.diff ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
Changes by David Watson : Added file: http://bugs.python.org/file16878/addrlen-3.x.diff ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
Changes by David Watson : Added file: http://bugs.python.org/file16879/test-2.x.diff ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
Changes by David Watson : Added file: http://bugs.python.org/file16880/test-3.x.diff ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
New submission from David Watson:

In 3.x, the socket module assumes that AF_UNIX addresses use UTF-8 encoding - this means, for example, that accept() will raise UnicodeDecodeError if the peer socket path is not valid UTF-8, which could crash an unwary server.

Python 3.1.2 (r312:79147, Mar 23 2010, 19:02:21)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s = socket(AF_UNIX, SOCK_STREAM)
>>> s.bind(b"\xff")
>>> s.getsockname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

I'm attaching a patch to handle socket paths according to PEP 383. Normally this would use PyUnicode_FSConverter, but there are a couple of ways in which the address handling currently differs from normal filename handling.

One is that embedded null bytes are passed through to the system instead of being rejected, which is needed for the Linux abstract namespace. These abstract addresses are returned as bytes objects, but they can currently be specified as strings with embedded null characters as well. The patch preserves this behaviour.

The current code also accepts read-only buffer objects (it uses the "s#" format), so in order to accept these as well as bytearray filenames (which the posix module accepts), the patch simply accepts any single-segment buffer, read-only or not.

This patch applies on top of the patches I submitted for issue #8372 (rather than knowingly running past the end of sun_path).

--
components: Extension Modules
files: af_unix-pep383.diff
keywords: patch
messages: 102865
nosy: baikie
severity: normal
status: open
title: socket: AF_UNIX socket paths not handled according to PEP 383
type: behavior
versions: Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file16881/af_unix-pep383.diff
___
Python tracker <http://bugs.python.org/issue8373>
___
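As a rough illustration of what PEP 383 handling means for socket paths, the sketch below round-trips an undecodable path through the "surrogateescape" error handler. The helper names are hypothetical, and UTF-8 is hard-coded as a stand-in for the filesystem encoding so that the example behaves the same everywhere; the real patch works at the C level and is not reproduced here.

```python
def decode_sun_path(raw: bytes, encoding: str = "utf-8"):
    """Turn raw sun_path bytes into the value handed back to the caller.

    Hypothetical helper. Abstract addresses (leading null byte) stay as
    bytes, matching the behaviour described in the report.
    """
    if raw[:1] == b"\0":
        return raw
    # Undecodable bytes become lone surrogates instead of raising.
    return raw.decode(encoding, "surrogateescape")


def encode_sun_path(path, encoding: str = "utf-8") -> bytes:
    """Turn a str, bytes, or other buffer object into raw sun_path bytes."""
    if isinstance(path, str):
        return path.encode(encoding, "surrogateescape")
    return bytes(path)


raw = b"/tmp/sock-\xff"
name = decode_sun_path(raw)
print(ascii(name))                    # the \xff byte shows up as \udcff
print(encode_sun_path(name) == raw)   # the original bytes are recovered
```

With this scheme getsockname() and accept() can always return something for a filesystem path, and passing the returned string back to bind() or connect() yields the same bytes.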
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
David Watson added the comment: This patch does the same thing without fixing issue #8372 (not that I'd recommend that, but it may be easier to review). -- Added file: http://bugs.python.org/file16882/af_unix-pep383-no-8372-fix.diff ___ Python tracker <http://bugs.python.org/issue8373> ___
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
Changes by David Watson : Added file: http://bugs.python.org/file16883/test-existing.diff ___ Python tracker <http://bugs.python.org/issue8373> ___
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
Changes by David Watson : Added file: http://bugs.python.org/file16884/test-new.diff ___ Python tracker <http://bugs.python.org/issue8373> ___
[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383
Changes by David Watson : Added file: http://bugs.python.org/file16885/af_unix-pep383-doc.diff ___ Python tracker <http://bugs.python.org/issue8373> ___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
David Watson added the comment:

Attaching the C test programs I forgot to attach yesterday - sorry about that.

I've also tried these programs, and the patches, on FreeBSD 5.3 (an old version from late 2004). I found that it accepted unterminated addresses as well, and unlike Linux it did not normally null-terminate addresses at all - the existing socket code only worked for addresses shorter than sun_path because it zero-filled the structure beforehand. The return-unterminated patches worked with or without the zero-filling.

Unlike Linux, FreeBSD also accepted oversized sockaddr_un structures (sun_path longer than its definition), so just allowing unterminated addresses wouldn't make the full range of addresses usable there. That said, I did get a kernel panic shortly after testing with oversized addresses, so perhaps it's not a good idea to actually use them :)

--
Added file: http://bugs.python.org/file16898/bindconn.c
___
Python tracker <http://bugs.python.org/issue8372>
___
[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses
Changes by David Watson : Added file: http://bugs.python.org/file16899/accept.c ___ Python tracker <http://bugs.python.org/issue8372> ___
[issue3023] Problem with invalidly-encoded command-line arguments (Unix)
David Watson added the comment:

@ Victor Stinner:

Yes, the behaviour of those functions is as you describe - it's been changed since I filed this issue. I do consider it an improvement.

By the password database, I mean /etc/passwd or replacements that are accessible via getpwnam() and friends. Users are often allowed to change things like the GECOS field, and can generally stick any old junk in there, regardless of encoding.

Now that I come to check, it seems that in the Python 3.0 release, the pwd.* functions do raise UnicodeDecodeError when the GECOS field can't be decoded (bizarrely, they try to interpret it as a Python string literal, and thus choke on invalid backslash escapes). Unfortunately, this allows a user to change their GECOS field so that system programs written in Python can't determine the username corresponding to that user's UID or vice versa.

___
Python tracker <http://bugs.python.org/issue3023>
___
[issue4859] pwd, spwd, grp functions vulnerable to denial of service
New submission from David Watson:

The pwd (and spwd and grp) modules deal with data from /etc/passwd (and/or other sources) that can be supplied by users on the system. Specifically, users can often change the data in their GECOS fields without the OS requiring that it conform to a specific encoding, and given some automated account signup system, it's conceivable that arbitrary data could even be placed in the username field.

This causes a problem since the functions in these modules try to decode the data into str objects, and if a user has placed data in /etc/passwd, say, that does not conform to the relevant encoding, the function will raise UnicodeDecodeError and thus prevent the program from learning the relevant mapping between username and UID, etc. (or crash the program if it wasn't expecting this). For a system program written in Python, this can amount to a denial of service attack, especially if the program uses the get*all() functions.

Currently, the pwd module tries to decode the string fields using the Unicode-escape codec, i.e. like a Python string literal, and this can fail when given an invalid backslash escape. You can see this by running chfn(1), entering something like "\ux" in one of the fields, and then calling pwd.getpwnam(yourname) or pwd.getpwall().

Perhaps the use of this codec is a mistake, given that spwd and grp decode the string fields as UTF-8, but chfn could also be used to enter non-UTF-8 data in the GECOS field. You can see similar failures in the grp and spwd modules after adding a user with a non-UTF-8 name (do something like "useradd $'\xff'" in bash).

A debug build of Python also reports a reference counting error in grp (count goes to -1) when its functions fail on non-UTF-8 data; what I think is going on is that in mkgrent(), PyStructSequence_SET_ITEM steals the reference to "w", meaning the second "Py_DECREF(w)" shouldn't be there. Also, getpwall() and getgrall() leave file descriptors open when they fail, since they don't call end*ent() in this case. The attached minor.diff fixes both of these problems, I think.

I've also written a patch (bytes.diff, attached) that would add new functions pwd.getpwnamb(), etc. (analogous to os.getcwdb()) to return bytes objects for the text fields, thus avoiding these problems - what do you think? The patch also makes pwd's original string functions use UTF-8 like the other modules.

Alternatively or in addition, a quick "fix" for the GECOS problem might be for the pwd module to decode the text fields as Latin-1, since in the absence of backslash escapes this is what the Unicode-escape encoding is equivalent to. This would at least block any DoS attempts using the GECOS field (or attempts to add extra commas with \x2c, etc.) without changing the behaviour much. The attached latin1.diff does this.

--
components: Extension Modules
files: bytes.diff
keywords: patch
messages: 79286
nosy: baikie
severity: normal
status: open
title: pwd, spwd, grp functions vulnerable to denial of service
type: security
versions: Python 3.0, Python 3.1
Added file: http://bugs.python.org/file12621/bytes.diff
___
Python tracker <http://bugs.python.org/issue4859>
___
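The codec behaviour this report relies on can be checked directly on byte strings, without touching /etc/passwd. The following is just a sketch: `try_decode` is a hypothetical helper and the sample field values are invented, but the codecs are the standard ones named in the report.

```python
def try_decode(data: bytes, codec: str):
    """Return the decoded string, or None if the codec rejects the data."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


bad_escape = b"\\ux"      # backslash, "u", "x": a truncated \uXXXX escape
comma_escape = b"\\x2c"   # a valid escape that decodes to an extra comma
non_utf8 = b"caf\xe9"     # Latin-1 text that is not valid UTF-8

for field in (bad_escape, comma_escape, non_utf8):
    print(field,
          try_decode(field, "unicode_escape"),
          try_decode(field, "utf-8"),
          try_decode(field, "latin-1"))
```

Latin-1 maps every byte value to a code point, so that column never shows None, whereas Unicode-escape fails on the truncated escape and UTF-8 fails on the stray 0xe9 byte.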
[issue4859] pwd, spwd, grp functions vulnerable to denial of service
Changes by David Watson : Added file: http://bugs.python.org/file12622/minor.diff ___ Python tracker <http://bugs.python.org/issue4859> ___
[issue4859] pwd, spwd, grp functions vulnerable to denial of service
Changes by David Watson : Added file: http://bugs.python.org/file12623/latin1.diff ___ Python tracker <http://bugs.python.org/issue4859> ___
[issue4859] pwd, spwd, grp functions vulnerable to denial of service
David Watson added the comment:

> baikie: Open a separated issue for the refcount error and fd leak.

OK. It does affect 2.x as well, come to think of it.

> On Ubuntu, it's not possible to create an user with a non-ASCII
> name:
>
> $ sudo adduser é --no-create-home
>
> adduser: To avoid problems, the username should consist only of...

Well, good for Ubuntu :) But you can still add one with the lower-level useradd command, and not everyone uses Ubuntu.

> Your patch latin1.diff is wrong

Yes, I know it's "wrong" - I just thought of it as a stopgap measure until some sort of bytes functionality is added (since pwd already decodes everything as Latin-1, but tries to interpret backslash escapes). But yeah, if it's going to be changed later, then I suppose there's not much point.

> I don't think that it can be called a "denial of service attack".

It depends on how the program uses these functions. Obviously Python itself is only vulnerable to a DoS if the interpreter crashes or something, but what I'm saying is that there should be a way for Python programs to access the password database that is not subject to denial of service attacks. If someone changes their GECOS field they can make pwd.getpwall() fail for another user's program, and if the program relies on pwd.getpwall() working, then that's a DoS.

___
Python tracker <http://bugs.python.org/issue4859>
___
[issue4873] Refcount error and file descriptor leaks in pwd, grp modules
New submission from David Watson:

When investigating issue #4859 I found that when pwd.getpwall() and grp.getgrall() fail due to decoding errors, they leave open file descriptors referring to the passwd and group files, since they don't call the end*ent() functions in this case.

Also, the grp.* functions have a reference counting error when they fail in this way - a debug build reports that an object's reference count goes to -1. What I think happens is that in mkgrent(), PyStructSequence_SET_ITEM steals the reference to "w", meaning that the "Py_DECREF(w)" call shouldn't be made afterwards.

The attached diff fixes both of these problems, I think, and applies to the 2.x and 3.x branches.

--
components: Extension Modules
files: minor.diff
keywords: patch
messages: 79378
nosy: baikie
severity: normal
status: open
title: Refcount error and file descriptor leaks in pwd, grp modules
type: resource usage
Added file: http://bugs.python.org/file12639/minor.diff
___
Python tracker <http://bugs.python.org/issue4873>
___