[issue12958] test_socket failures on Mac OS X

2011-09-12 Thread David Watson

Changes by David Watson :


--
nosy: +baikie

___
Python tracker 
<http://bugs.python.org/issue12958>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12981] rewrite multiprocessing (senfd|recvfd) in Python

2011-09-18 Thread David Watson

David Watson  added the comment:

I had a look at this patch, and the FD passing looked OK, except
that calculating the buffer size with CMSG_SPACE() may allow more
than one file descriptor to be received, with the extra one going
unnoticed - it should use CMSG_LEN() instead (the existing C
implementation has the same problem, I see).

CMSG_SPACE() exists to allow calculating the space required to
hold multiple control messages, so it essentially gives the
offset for the next cmsghdr struct such that any alignment
requirements are satisfied.

64-bit systems will probably want to ensure that all CMSG_DATA()
payloads are aligned on 8-byte boundaries, and so have
CMSG_SPACE(4) == CMSG_SPACE(8) == CMSG_LEN(8) (the Linux headers,
for instance, align to sizeof(size_t)).  So with a 32-bit int, a
buffer size of CMSG_SPACE(sizeof(int)) would allow *two* file
descriptors to be received.  CMSG_LEN() omits the padding, thus
allowing only one.
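
To make the size difference concrete, here is a small sketch using the CMSG_LEN() and CMSG_SPACE() wrappers that Python's socket module exposes on Unix. The helper fds_that_fit() is a name made up for this illustration, and it assumes descriptors are passed as C ints. The sizes it prints depend on the platform's alignment rules, so no output is shown.

```python
import socket
import struct

INT_SIZE = struct.calcsize("i")  # sizeof(int) on this platform


def fds_that_fit(bufsize):
    """Count the int-sized descriptors that fit in a single control
    message occupying at most bufsize bytes (hypothetical helper)."""
    header = socket.CMSG_LEN(0)  # cmsghdr plus padding before the data
    return max(0, (bufsize - header) // INT_SIZE)


len_size = socket.CMSG_LEN(INT_SIZE)      # no trailing padding
space_size = socket.CMSG_SPACE(INT_SIZE)  # includes trailing padding

print("CMSG_LEN:  ", len_size, "bytes, room for", fds_that_fit(len_size))
print("CMSG_SPACE:", space_size, "bytes, room for", fds_that_fit(space_size))
```

A buffer sized with CMSG_LEN() always has room for exactly one descriptor here, whereas the CMSG_SPACE() figure may have room for more wherever padding is added.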

I'm not familiar with how the FD-passing facility is used in
multiprocessing, but this seems as if it could be an avenue for
DoS attacks that exhaust the number of file descriptors allowed
for the receiving process.

--
nosy: +baikie

___
Python tracker 
<http://bugs.python.org/issue12981>
___



[issue8623] Aliasing warnings in socketmodule.c

2011-09-18 Thread David Watson

David Watson  added the comment:

For reference, the warnings are partially explained here:

http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html#index-fstrict_002daliasing-825

http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Warning-Options.html#index-Wstrict_002daliasing-337

I get these warnings with GCC (Ubuntu/Linaro 4.4.4-14ubuntu5)
4.4.5 [i386], plus an additional one from the new recvmsg() code.
I haven't tried GCC 4.5 or later, but as the docs imply, the
warnings will not appear in debugging builds.

I take it GCC is referring to C99 section 6.5, paragraphs 6 and 7
here, but I'm not sure exactly how much these are intended to
prohibit with regard to the (mis)use of unions, or how strictly
GCC actually enforces them.

The attached socket-aliasing-sas2sa.diff is enough to get rid of
the warnings with GCC 4.4.4 - it adds a "struct sockaddr"
member to the sock_addr_t union type, changes the SAS2SA() macro
to take the address of this member instead of using a cast, and
modifies socket_gethostbyaddr() and socket_gethostbyname_ex() to
use SAS2SA() (sock_recvmsg_guts() already uses it).

Changing SAS2SA() also gets rid of most of the additional
warnings produced by the "aggressive" warning setting
-Wstrict-aliasing=2.  However, the gethostby* functions still
point to the union object with a pointer variable not matching
the type actually stored in it, which the GCC docs warn against.

To be more conservative, socket-aliasing-union-3.2.diff applies
on top to get rid of these pointers, and instead directly access
the union for each use other than providing a pointer argument to
a function.  socket-aliasing-union-recvmsg-3.3.diff does the same
for 3.3, and makes the complained-about line in
sock_recvmsg_guts() access the union directly as well.

One other consideration here is that the different sockaddr_*
struct types used are likely to come under the "common initial
sequence" rule for unions (C99 6.5.2.3, paragraph 5, or section
A8.3 of K&R 2nd ed.), which might make some more questionable
uses valid.  That said, technically POSIX appears to require only
that the s*_family members of the various sockaddr struct types
have the same offset and type, not that they form part of a
common initial sequence (s*_family need not be the first
structure member - the BSDs for instance place it second,
although it can still be part of a common initial sequence).

--
keywords: +patch
nosy: +baikie
versions: +Python 3.3
Added file: http://bugs.python.org/file23186/socket-aliasing-sas2sa.diff

___
Python tracker 
<http://bugs.python.org/issue8623>
___



[issue8623] Aliasing warnings in socketmodule.c

2011-09-18 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file23187/socket-aliasing-union-3.2.diff

___
Python tracker 
<http://bugs.python.org/issue8623>
___



[issue8623] Aliasing warnings in socketmodule.c

2011-09-18 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file23188/socket-aliasing-union-3.3.diff

___
Python tracker 
<http://bugs.python.org/issue8623>
___



[issue13001] test_socket.testRecvmsgTrunc failure on FreeBSD 7.2 buildbot

2011-09-18 Thread David Watson

Changes by David Watson :


--
nosy: +baikie

___
Python tracker 
<http://bugs.python.org/issue13001>
___



[issue12981] rewrite multiprocessing (senfd|recvfd) in Python

2011-09-19 Thread David Watson

David Watson  added the comment:

On Sun 18 Sep 2011, Charles-François Natali wrote:
> > I had a look at this patch, and the FD passing looked OK, except
> > that calculating the buffer size with CMSG_SPACE() may allow more
> > than one file descriptor to be received, with the extra one going
> > unnoticed - it should use CMSG_LEN() instead
> 
> > (the existing C implementation has the same problem, I see).
> 
> I just checked, and the C version uses CMSG_SPACE() as the buffer size, but 
> passes CMSG_LEN() to cmsg->cmsg_len and msg.msg_controllen. Or am I missing 
> something?

Ah, no, you're right - that's fine.  Sorry for the false alarm.

--

___
Python tracker 
<http://bugs.python.org/issue12981>
___



[issue13022] _multiprocessing.recvfd() doesn't check that file descriptor was actually received

2011-09-20 Thread David Watson

New submission from David Watson :

The function _multiprocessing.recvfd() calls recvmsg() and
expects to receive a file descriptor in an SCM_RIGHTS control
message, but doesn't check that such a control message is
actually present.  So if the sender sends data without an
accompanying file descriptor, recvfd() will return the
integer value of the uninitialized CMSG_DATA() buffer.

The attached recvfd-check.diff checks for a complete control
message of the correct type, and raises RuntimeError if it isn't
there.  This matches the behaviour of the proposed pure-Python
implementation at issue #12981.
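
For illustration, the check can be sketched at the Python level with socket.recvmsg(). This is not the code from recvfd-check.diff or from the issue #12981 patch; recv_one_fd() is a made-up name, and the sketch assumes one byte of ordinary data accompanies the descriptor.

```python
import array
import os
import socket


def recv_one_fd(sock):
    """Receive one data byte plus one descriptor, or raise RuntimeError
    if no complete SCM_RIGHTS control message arrived (sketch only)."""
    fds = array.array("i")
    # CMSG_LEN() leaves room for a single descriptor only.
    data, ancdata, flags, addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if (level == socket.SOL_SOCKET
                and ctype == socket.SCM_RIGHTS
                and len(cdata) >= fds.itemsize):
            fds.frombytes(cdata[:fds.itemsize])
            return fds[0]
    raise RuntimeError("no file descriptor received")
```

If the peer sends only data, the loop finds no matching control message and the function raises instead of returning whatever happened to be in the buffer.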

The patch includes a test case, but like the other recently-added
tests for the function, it isn't guarded against
multiprocessing.reduction being unavailable.  Issue #12981 has a
patch "skip_reduction.diff" (already in 3.3) to fix this, and I'm
attaching recvfd-skip-reduction-fix.diff to apply on top of it
and guard the new test case as well.

--
components: Extension Modules
files: recvfd-check.diff
keywords: patch
messages: 144351
nosy: baikie
priority: normal
severity: normal
status: open
title: _multiprocessing.recvfd() doesn't check that file descriptor was 
actually received
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file23214/recvfd-check.diff

___
Python tracker 
<http://bugs.python.org/issue13022>
___



[issue13022] _multiprocessing.recvfd() doesn't check that file descriptor was actually received

2011-09-20 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file23215/recvfd-skip-reduction-fix.diff

___
Python tracker 
<http://bugs.python.org/issue13022>
___



[issue12981] rewrite multiprocessing (senfd|recvfd) in Python

2011-09-20 Thread David Watson

David Watson  added the comment:

On Tue 20 Sep 2011, Charles-François Natali wrote:

> I committed the patch to catch the ImportError in test_multiprocessing.

This should go in all branches, I think - see issue #13022.

--

___
Python tracker 
<http://bugs.python.org/issue12981>
___



[issue6560] socket sendmsg(), recvmsg() methods

2011-05-23 Thread David Watson

David Watson  added the comment:

On Mon 23 May 2011, Gergely Kálmán wrote:
> It's been a while since I had a look at that code. As far as I remember,
> though, the code is fairly decent, not taking the missing unit tests into
> account. There are a few TODOs, and also a pretty bad bug that I've fixed
> but not committed. The TODOs include better parsing of auxiliary data,
> support for scatter-gather, and addressed messages. If you wish I can send
> you the latest patch that has the bug fixed and applies to 3.2.

Erm, have you seen the separately-implemented patch I posted at
http://bugs.python.org/file19962/baikie-hwundram-v5.diff ?  It's
basically complete IIRC.

--

___
Python tracker 
<http://bugs.python.org/issue6560>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2011-06-16 Thread David Watson

David Watson  added the comment:

On Sun 12 Jun 2011, Charles-François Natali wrote:

> The patches look good to me, except that instead of passing
> (addrlen > buflen) ? buflen : addrlen
> as addrlen argument every time makesockaddr is called, I'd
> prefer if this min was done inside makesockaddr itself,
> i.e. perform min(addrlen, sizeof(struct sockaddr_un)) in the
> AF_UNIX switch case (especially since addrlen is only used for
> AF_UNIX).

Actually, I think it should be clamped at the top of the
function, since the branch for unknown address families ought to
use the length as well (it doesn't, but that's a separate issue).
I'm attaching new patches to do the check in makesockaddr(),
which also change the length parameters from int to socklen_t, in
case the OS returns a really huge value.

I'm also attaching new return-unterminated patches to handle the
possibility that addrlen is unsigned (socklen_t may be unsigned,
and addrlen *is* now unsigned in 3.x).  This entailed specifying
what to do if addrlen < offsetof(struct sockaddr_un, sun_path),
i.e. if the address is truncated at least one byte before the
start of sun_path.

This may well never happen (Python's existing code would raise
SystemError if it did, due to calling
PyString_FromStringAndSize() with a negative length), but I've
made the new patches return None if it does, as None is already
returned if addrlen is 0.  As another precedent of sorts, Linux
currently returns None (i.e. addrlen = 0) when receiving a
datagram from an unbound Unix socket, despite returning an empty
string (i.e. addrlen = offsetof(..., sun_path)) for the same
unbound address in other situations.
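
As a rough model of the rules described above (not the C code from the patches), the length handling can be written out in Python. SUN_PATH_OFFSET stands in for offsetof(struct sockaddr_un, sun_path) and is assumed to be 2, as on Linux; sun_path_from_buffer() is a made-up name, and treating the path as ending at the first NUL or at addrlen, whichever comes first, is my reading of the "unterminated" handling. Linux abstract addresses are ignored here.

```python
SUN_PATH_OFFSET = 2  # assumed offsetof(struct sockaddr_un, sun_path)


def sun_path_from_buffer(buf, addrlen):
    """Model of the proposed length rules for an AF_UNIX address.

    buf is the raw address buffer and addrlen the length reported by
    the OS, which may exceed len(buf) if the address was truncated.
    """
    addrlen = min(addrlen, len(buf))  # never read past the buffer
    if addrlen == 0 or addrlen < SUN_PATH_OFFSET:
        return None                   # no usable address
    path = buf[SUN_PATH_OFFSET:addrlen]
    nul = path.find(b"\0")
    return path if nul < 0 else path[:nul]
```

With addrlen equal to the offset the result is an empty path rather than None, matching the distinction drawn above.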

(I think the decoders for other address families should also
return None if addrlen is less than the size of the appropriate
struct, but again, that's a separate issue.)

Also, I noticed that on Linux, Python 3.x currently returns empty
addresses as bytes objects rather than strings, whereas the
patches I've provided make it return strings.  In case this
change isn't acceptable for the 3.x maintenance branches, I'm
attaching return-unterminated-3.x-maint-new.diff which still
returns them as bytes (on Linux only).

To sum up the patch order:

2.x:
linux-pass-unterminated-4spc.diff
test-2.x-new.diff
return-unterminated-2.x-new.diff
addrlen-makesockaddr-2.x.diff (or addrlen-2.x-4spc.diff)

3.2:
linux-pass-unterminated-4spc.diff
test-3.x-new.diff
return-unterminated-3.x-maint-new.diff
addrlen-makesockaddr-3.x.diff (or addrlen-3.x-4spc.diff)

default:
linux-pass-unterminated-4spc.diff
test-3.x-new.diff
return-unterminated-3.x-trunk-new.diff
addrlen-makesockaddr-3.x.diff (or addrlen-3.x-4spc.diff)

--
Added file: http://bugs.python.org/file22384/addrlen-makesockaddr-2.x.diff
Added file: http://bugs.python.org/file22385/addrlen-makesockaddr-3.x.diff
Added file: http://bugs.python.org/file22386/return-unterminated-2.x-new.diff
Added file: http://bugs.python.org/file22387/return-unterminated-3.x-maint-new.diff
Added file: http://bugs.python.org/file22388/return-unterminated-3.x-trunk-new.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___

If accept(), etc. return a larger addrlen than was supplied,
ignore it and use the original buffer length.

diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -969,13 +969,22 @@ makebdaddr(bdaddr_t *bdaddr)
 
 /*ARGSUSED*/
 static PyObject *
-makesockaddr(int sockfd, struct sockaddr *addr, int addrlen, int proto)
+makesockaddr(int sockfd, struct sockaddr *addr, socklen_t addrlen,
+ socklen_t buflen, int proto)
 {
 if (addrlen == 0) {
 /* No address -- may be recvfrom() from known socket */
 Py_INCREF(Py_None);
 return Py_None;
 }
+/* buflen is the length of the buffer containing the address, and
+   addrlen is either the same, or is the length returned by the OS
+   after writing an address into the buffer.  Some systems return
+   the length they would have written if there had been space
+   (e.g. when an oversized AF_UNIX address has its sun_path
+   truncated). */
+if (addrlen > buflen)
+addrlen = buflen;
 
 #ifdef __BEOS__
 /* XXX: BeOS version of accept() doesn't set family correctly */
@@ -1632,6 +1641,7 @@ sock_accept(PySocketSockObject *s)
 sock_addr_t addrbuf;
 SOCKET_T newfd;
 socklen_t addrlen;
+socklen_t buflen;
 PyObject *sock = NULL;
 PyObject *addr = NULL;
 PyObject *res = NULL;
@@ -1639,6 +1649,7 @@ sock_accept(PySocketSockObject *s)
 
 if (!getsockaddrlen(s, &addrlen))
 return NULL;
+buflen = addrlen;
 memset(&addrbuf, 0, addrlen);
 
 #ifdef MS_WINDOWS
@@ -1680,7 +1691,7 @@ sock_accept(PySocketSockObject *s)
 goto finally;
 

[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake

2011-08-24 Thread David Watson

New submission from David Watson :

Changeset fd10d042b41d removed the wrappers on ssl.SSLSocket for 
the new socket.send/recvmsg() methods (since I forgot to check 
for the existence of the underlying methods - see issue #6560), 
but this leaves SSLSocket with send/recvmsg() methods inherited 
from the underlying socket type; thus SSLSocket.sendmsg() will 
insert the given data into the stream without encrypting it (or 
wrapping it in SSL in any way). 
 
This immediately screws up the SSL connection, resulting in 
receive errors at both ends ("SSL3_GET_RECORD:wrong version 
number" and the like), but the data is clearly visible in a 
packet capture, so it's too late if it was actually something 
secret. 
 
Correspondingly, recvmsg() and recvmsg_into() return the 
encrypted data, and screw up the connection by removing it from 
the SSL stream. 
 
Of course, these methods don't make sense over SSL anyway, but if 
the programmer naively assumes they do, then ideally they should 
not expose any secret information. 
 
Attaching a patch implementing Antoine Pitrou's suggestion that 
the methods should simply raise NotImplementedError.  I don't 
know if these versions should also be added only if present on 
the underlying socket - they're Not Implemented either way :-)
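
The mechanism is easy to reproduce without SSL. In the toy sketch below, XorSocket is a made-up class whose send() obscures the data with a trivial XOR (standing in for encryption, and in no way secure), while sendmsg() is simply inherited from socket.socket. It assumes a Unix platform where socketpair() and sendmsg() are available.

```python
import socket


class XorSocket(socket.socket):
    """Toy wrapper: send() transforms the data, sendmsg() is inherited."""

    KEY = 0x5A

    def send(self, data, flags=0):
        return super().send(bytes(b ^ self.KEY for b in data), flags)


left, right = socket.socketpair()
wrapped = XorSocket(fileno=left.detach())

wrapped.send(b"secret")
via_send = right.recv(16)      # transformed bytes

wrapped.sendmsg([b"secret"])   # inherited method, bypasses send()
via_sendmsg = right.recv(16)   # original bytes, sent as-is

wrapped.close()
right.close()
```

The second message arrives untransformed, which is the same kind of bypass that the inherited SSLSocket.sendmsg() allows.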

--
components: Library (Lib)
files: ssl_sendrecvmsg_notimplemented.diff
keywords: patch
messages: 142900
nosy: baikie
priority: normal
severity: normal
status: open
title: Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted 
data by mistake
versions: Python 3.3
Added file: http://bugs.python.org/file23030/ssl_sendrecvmsg_notimplemented.diff

___
Python tracker 
<http://bugs.python.org/issue12835>
___



[issue6560] socket sendmsg(), recvmsg() methods

2011-08-24 Thread David Watson

David Watson  added the comment:

On Tue 23 Aug 2011, Nick Coghlan wrote:
> As you can see, I just pushed a change that removed the new
> methods from SSLSocket objects. If anyone wants to step up with
> a valid use case (not already covered by wrap_socket),
> preferably with a patch to add them back that includes proper
> tests and documentation changes, please open a new feature
> request and attach the new patch to that issue.

Hi, sorry about the trouble caused by the broken tests, but
SSLSocket should at least override sendmsg() to stop misguided
programs sending data in the clear:

http://bugs.python.org/issue12835

--

___
Python tracker 
<http://bugs.python.org/issue6560>
___



[issue12837] Patch for issue #12810 removed a valid check on socket ancillary data

2011-08-24 Thread David Watson

New submission from David Watson :

Changeset 4736e172fa61 for issue #12810 removed the test 
"msg->msg_controllen < 0" from socketmodule.c, where 
msg_controllen happened to be unsigned on the reporter's system. 
 
I included this test deliberately, because msg_controllen may be 
of signed type (POSIX allows socklen_t to be signed, as objects 
of that type historically were - as the Rationale says: "All 
socklen_t types were originally (in BSD UNIX) of type int."). 
 
Attaching a patch to replace the check and add an accompanying 
comment.

--
components: Extension Modules
files: restore_controllen_check.diff
keywords: patch
messages: 142934
nosy: baikie
priority: normal
severity: normal
status: open
title: Patch for issue #12810 removed a valid check on socket ancillary data
type: behavior
versions: Python 3.3
Added file: http://bugs.python.org/file23036/restore_controllen_check.diff

___
Python tracker 
<http://bugs.python.org/issue12837>
___



[issue12837] Patch for issue #12810 removed a valid check on socket ancillary data

2011-08-25 Thread David Watson

David Watson  added the comment:

On Wed 24 Aug 2011, Charles-François Natali wrote:
> > I included this test deliberately, because msg_controllen may be 
> > of signed type [...] POSIX allows socklen_t to be signed
> 
> http://pubs.opengroup.org/onlinepubs/007908799/xns/syssocket.h.html
> """
> <sys/socket.h> makes available a type, socklen_t, which is an unsigned opaque 
> integral type of length of at least 32 bits. To forestall portability 
> problems, it is recommended that applications should not use values larger 
> than 2**32 - 1.
> """

That has since been changed.  I'm reading from POSIX.1-2008,
which says:

   The <sys/socket.h> header shall define the socklen_t type,
   which is an integer type of width of at least 32 bits

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html

The warning against using values larger than 2**32 - 1 is still
there, I presume because they would not fit in a 32-bit signed
int.

> Also, I'm not convinced by this:
> 
>/* Check for empty ancillary data as old CMSG_FIRSTHDR()
>implementations didn't do so. */
> for (cmsgh = ((msg.msg_controllen > 0) ? CMSG_FIRSTHDR(&msg) : NULL);
>  cmsgh != NULL; cmsgh = CMSG_NXTHDR(&msg, cmsgh)) {
> 
> Did you really have reports of CMSG_NXTHDR not returning NULL upon empty 
> ancillary data (it's also required by POSIX)?

I take it you mean CMSG_FIRSTHDR here; RFC 3542 says that:

   One possible implementation could be

  #define CMSG_FIRSTHDR(mhdr) \
  ( (mhdr)->msg_controllen >= sizeof(struct cmsghdr) ? \
(struct cmsghdr *)(mhdr)->msg_control : \
(struct cmsghdr *)NULL )

   (Note: Most existing implementations do not test the value of
   msg_controllen, and just return the value of msg_control...

IIRC, I saw an implementation in old FreeBSD headers that did not
check msg_controllen, and hence did not return NULL as RFC 3542
requires.

Now you come to mention it though, this check in the for loop
does actually protect against the kernel returning a negative
msg_controllen, so the only remaining possibility would be that
the CMSG_* macros fiddle with it.

That said, the fact remains that the compiler warning is spurious
if msg_controllen can be signed on some systems, and I still
don't think decreasing the robustness of the code (particularly
against any future modifications to that code) just for the sake
of silencing a spurious warning is a good thing to do.  People
can read the comment above the "offending" line and see that the
compiler has got it wrong.

--

___
Python tracker 
<http://bugs.python.org/issue12837>
___



[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake

2011-08-25 Thread David Watson

David Watson  added the comment:

On Thu 25 Aug 2011, Antoine Pitrou wrote:
> Adding an explanation message to the NotImplementedError would be more 
> helpful. Otherwise, good catch.

OK, I've copied the messages from the ValueErrors the other
methods raise.

--
Added file: http://bugs.python.org/file23048/ssl_sendrecvmsg_notimplemented-2.diff

___
Python tracker 
<http://bugs.python.org/issue12835>
___

# HG changeset patch
# User David Watson 
# Date 1314305189 -3600
# Node ID 23cdc358bbfb0ad40607b1c54bda2f7b5abe39f0
# Parent  80f814dca274b5d848dbd306c1513263e69011ce
Make SSLSocket.sendmsg/recvmsg/recvmsg_into() raise NotImplementedError.

diff --git a/Lib/ssl.py b/Lib/ssl.py
--- a/Lib/ssl.py
+++ b/Lib/ssl.py
@@ -355,6 +355,12 @@ class SSLSocket(socket):
 else:
 return socket.sendto(self, data, flags_or_addr, addr)
 
+def sendmsg(self, *args, **kwargs):
+# Ensure programs don't send data unencrypted if they try to
+# use this method.
+raise NotImplementedError("sendmsg not allowed on instances of %s" %
+  self.__class__)
+
 def sendall(self, data, flags=0):
 self._checkClosed()
 if self._sslobj:
@@ -413,6 +419,14 @@ class SSLSocket(socket):
 else:
 return socket.recvfrom_into(self, buffer, nbytes, flags)
 
+def recvmsg(self, *args, **kwargs):
+raise NotImplementedError("recvmsg not allowed on instances of %s" %
+  self.__class__)
+
+def recvmsg_into(self, *args, **kwargs):
+raise NotImplementedError("recvmsg_into not allowed on instances of "
+  "%s" % self.__class__)
+
 def pending(self):
 self._checkClosed()
 if self._sslobj:
diff --git a/Lib/test/test_ssl.py b/Lib/test/test_ssl.py
--- a/Lib/test/test_ssl.py
+++ b/Lib/test/test_ssl.py
@@ -1651,6 +1651,11 @@ else:
 # consume data
 s.read()
 
+self.assertRaises(NotImplementedError, s.sendmsg, [b"data"])
+self.assertRaises(NotImplementedError, s.recvmsg, 100)
+self.assertRaises(NotImplementedError,
+  s.recvmsg_into, bytearray(100))
+
 s.write(b"over\n")
 s.close()
 finally:



[issue9569] Add tests for posix.mknod() and posix.mkfifo()

2010-08-11 Thread David Watson

New submission from David Watson :

Attaching simple tests for these functions, which aren't currently tested.
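
As a rough idea of the kind of test meant here (a sketch, not the contents of the attached diff), a minimal check for mkfifo() might look like the following. The class and method names are made up, and it assumes a platform where os.mkfifo() is available.

```python
import os
import stat
import tempfile
import unittest


class MkfifoSketchTest(unittest.TestCase):
    def test_mkfifo_creates_fifo(self):
        # Work in a throwaway directory so cleanup is automatic.
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "fifo")
            os.mkfifo(path, stat.S_IRUSR | stat.S_IWUSR)
            self.assertTrue(stat.S_ISFIFO(os.stat(path).st_mode))
```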

--
components: Extension Modules
files: test-mknod-mkfifo-3.x.diff
keywords: patch
messages: 113609
nosy: baikie
priority: normal
severity: normal
status: open
title: Add tests for posix.mknod() and posix.mkfifo()
type: feature request
Added file: http://bugs.python.org/file18478/test-mknod-mkfifo-3.x.diff

___
Python tracker 
<http://bugs.python.org/issue9569>
___



[issue9569] Add tests for posix.mknod() and posix.mkfifo()

2010-08-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file18479/test-mknod-mkfifo-2.x.diff

___
Python tracker 
<http://bugs.python.org/issue9569>
___



[issue9570] PEP 383: os.mknod() and os.mkfifo() do not accept surrogateescape arguments

2010-08-11 Thread David Watson

New submission from David Watson :

These functions still use the "s" format for their arguments; the attached 
patch fixes them to use PyUnicode_FSConverter() in 3.2.  Some simple tests for 
these functions (not for PEP 383 behaviour) are at issue #9569.

--
components: Extension Modules
files: mknod-mkfifo-pep383-3.2.diff
keywords: patch
messages: 113611
nosy: baikie
priority: normal
severity: normal
status: open
title: PEP 383: os.mknod() and os.mkfifo() do not accept surrogateescape 
arguments
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18480/mknod-mkfifo-pep383-3.2.diff

___
Python tracker 
<http://bugs.python.org/issue9570>
___



[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes

2010-08-12 Thread David Watson

New submission from David Watson :

It may be hard to find a configuration string this long, but you
can see the problem if you apply the attached
confstr-reduce-bufsize.diff to reduce the size of the local array
buffer that posix_confstr() uses.  With it applied:

>>> import os
>>> print(ascii(os.confstr("CS_PATH")))
'\x00\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb'

The problem arises because the code first tries to receive the
configuration string into the local buffer (char buffer[256],
reduced to char buffer[1] above), but then tries to receive it
directly into a string object if it doesn't fit.  You can see
what's gone wrong by comparing the working code in 2.x:

if ((unsigned int)len >= sizeof(buffer)) {
result = PyString_FromStringAndSize(NULL, len-1);
if (result != NULL)
confstr(name, PyString_AS_STRING(result), len);
}
else
result = PyString_FromStringAndSize(buffer, len-1);

with the code in 3.x:

if ((unsigned int)len >= sizeof(buffer)) {
result = PyUnicode_FromStringAndSize(NULL, len-1);
if (result != NULL)
confstr(name, _PyUnicode_AsString(result), len);
}
else
result = PyUnicode_FromStringAndSize(buffer, len-1);

Namely, that in 3.x it tries to receive the string into the bytes
object returned by _PyUnicode_AsString(), not the str object it
has just allocated (which has the wrong format anyway -
Py_UNICODE as opposed to char).

The attached confstr-long-result.diff fixes this by allocating a
separate buffer when necessary to receive the result, before
creating the string object from it.  By putting the confstr()
call and allocation in a loop, it also handles the possibility
that the value's length might change between calls.
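
The retry loop can be sketched from Python with ctypes, purely to show the control flow; the actual fix is in C. This assumes confstr() can be reached through the already-loaded C library via ctypes.CDLL(None), and confstr_bytes() is a made-up name.

```python
import ctypes
import os

# Assumption: the process's C library exports confstr().
libc = ctypes.CDLL(None, use_errno=True)
libc.confstr.restype = ctypes.c_size_t
libc.confstr.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_size_t]


def confstr_bytes(name, initial_size=256):
    """Call confstr() in a loop, growing the buffer until the value fits."""
    size = initial_size
    while True:
        buf = ctypes.create_string_buffer(size)
        needed = libc.confstr(name, buf, size)
        if needed == 0:
            return None                  # no value, or invalid name
        if needed <= size:
            return buf.raw[:needed - 1]  # drop the terminating NUL
        size = needed                    # too small: retry with reported size
```

For example, confstr_bytes(os.confstr_names["CS_PATH"], 1) starts with a one-byte buffer, as in confstr-reduce-bufsize.diff, and still returns the whole value after one retry.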

--
components: Extension Modules
files: confstr-reduce-bufsize.diff
keywords: patch
messages: 113699
nosy: baikie
priority: normal
severity: normal
status: open
title: In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18486/confstr-reduce-bufsize.diff

___
Python tracker 
<http://bugs.python.org/issue9579>
___



[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes

2010-08-12 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file18487/confstr-long-result.diff

___
Python tracker 
<http://bugs.python.org/issue9579>
___



[issue9580] os.confstr() doesn't decode result according to PEP 383

2010-08-12 Thread David Watson

New submission from David Watson :

The attached patch applies on top of the patch from issue #9579 to
make it use PyUnicode_DecodeFSDefaultAndSize().  (You could use
it in the existing code, but until that issue is fixed, there is
sometimes nothing to decode!)

--
components: Extension Modules
files: confstr-pep383.diff
keywords: patch
messages: 113700
nosy: baikie
priority: normal
severity: normal
status: open
title: os.confstr() doesn't decode result according to PEP 383
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18488/confstr-pep383.diff

___
Python tracker 
<http://bugs.python.org/issue9580>
___



[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes

2010-08-12 Thread David Watson

David Watson  added the comment:

The returned string should also be decoded with the file system
encoding and surrogateescape error handler, as per PEP 383 -
there's a patch at issue #9580 to do this.
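
For anyone unfamiliar with the PEP 383 behaviour, this is what the surrogateescape handler does to bytes that are not valid in the chosen encoding. The sketch fixes the encoding to UTF-8 so the result does not depend on the locale, and the path is an invented example.

```python
# A PATH-like value containing a byte that is not valid UTF-8.
raw = b"/usr/bin:/opt/caf\xe9/bin"

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", "surrogateescape")
```

On POSIX systems os.fsdecode() and os.fsencode() apply the same handler with the file system encoding, so such a value can be passed back to the OS unchanged.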

--

___
Python tracker 
<http://bugs.python.org/issue9579>
___



[issue9569] Add tests for posix.mknod() and posix.mkfifo()

2010-08-12 Thread David Watson

David Watson  added the comment:

I'm not quite sure what you mean, but the man page for FreeBSD 5.3 specifies 
EPERM for an unprivileged user and EINVAL for an attempt to create something 
other than a device node.  POSIX requires creating a FIFO to work for any user, 
and just says that EINVAL is for an "invalid argument".
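
A test that wants to tolerate those two outcomes could be structured roughly as below. This is only a sketch of the idea, not the attached patch; try_mknod_fifo() is a made-up name, and creating a FIFO with mknod() is assumed to be the operation under test.

```python
import errno
import os
import stat
import tempfile


def try_mknod_fifo(directory):
    """Return "created", or the errno name if mknod() was refused with
    EPERM or EINVAL; any other error is re-raised."""
    path = os.path.join(directory, "node")
    try:
        os.mknod(path, stat.S_IFIFO | stat.S_IRUSR | stat.S_IWUSR)
    except OSError as e:
        if e.errno in (errno.EPERM, errno.EINVAL):
            return errno.errorcode[e.errno]
        raise
    assert stat.S_ISFIFO(os.stat(path).st_mode)
    return "created"
```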

--

___
Python tracker 
<http://bugs.python.org/issue9569>
___



[issue9569] Add tests for posix.mknod() and posix.mkfifo()

2010-08-12 Thread David Watson

David Watson  added the comment:

OK, these patches work on FreeBSD 5.3 (root and non-root) if you want to check 
the errno.  I don't know what other systems might return though.  I did also 
find that the 2.x tests were failing on cleanup because the test class used 
os.unlink rather than support.unlink (which ignores missing files) as its 3.x 
counterpart does, so I've updated the patch to change that as well.

--
Added file: http://bugs.python.org/file18489/test-mknod-mkfifo-2.x-2.diff

___
Python tracker 
<http://bugs.python.org/issue9569>
___



[issue9569] Add tests for posix.mknod() and posix.mkfifo()

2010-08-12 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file18490/add-errno-check-2.x.diff

___
Python tracker 
<http://bugs.python.org/issue9569>
___



[issue9569] Add tests for posix.mknod() and posix.mkfifo()

2010-08-12 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file18491/add-errno-check-3.x.diff

___
Python tracker 
<http://bugs.python.org/issue9569>
___



[issue9580] os.confstr() doesn't decode result according to PEP 383

2010-08-13 Thread David Watson

David Watson  added the comment:

The CS_PATH variable is a colon-separated list of directories ("the value for 
the PATH environment variable that finds all standard utilities"), so the file 
system encoding is certainly correct there.

I don't see any reference to an encoding in the POSIX spec for confstr().
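
For reference, a small sketch of reading CS_PATH from Python and splitting it into directories.  The function name is hypothetical, and the result depends on the platform (it may be empty where confstr() or CS_PATH is unavailable).

```python
import os


def standard_utility_dirs():
    """Return the directories listed in CS_PATH, or [] if unavailable."""
    if not hasattr(os, "confstr") or "CS_PATH" not in os.confstr_names:
        return []
    value = os.confstr("CS_PATH")
    return value.split(":") if value else []


dirs = standard_utility_dirs()
print(dirs)
```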

--

___
Python tracker 
<http://bugs.python.org/issue9580>
___



[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes

2010-08-13 Thread David Watson

David Watson  added the comment:

I don't see why confstr() values shouldn't change.  sysconf() values can change 
between calls, IIRC.  Implementations can also define their own confstr 
variables - they don't have to stick to the POSIX stuff.

And using a loop means the confstr() call only appears once in the source, 
which is more elegant, right? :)

--

___
Python tracker 
<http://bugs.python.org/issue9579>
___



[issue9603] os.ttyname() and os.ctermid() don't decode result according to PEP 383

2010-08-14 Thread David Watson

New submission from David Watson :

These functions each return the path to a terminal, so they should use 
PyUnicode_DecodeFSDefault().  Patch attached.
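
At the Python level the same idea can be sketched with os.fsdecode() and os.fsencode(), which are roughly the counterparts of the C functions named above.  The path below is made up, and a POSIX system is assumed.

```python
import os

raw = b"/dev/pts/caf\xe9"      # hypothetical terminal path with a non-ASCII byte
text = os.fsdecode(raw)        # decode with the file system encoding
restored = os.fsencode(text)   # encode back; undecodable bytes survive

print(ascii(text), restored == raw)
```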

--
components: Extension Modules
files: ttyname-ctermid-pep383.diff
keywords: patch
messages: 113920
nosy: baikie
priority: normal
severity: normal
status: open
title: os.ttyname() and os.ctermid() don't decode result according to PEP 383
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18529/ttyname-ctermid-pep383.diff

___
Python tracker 
<http://bugs.python.org/issue9603>
___



[issue9604] os.initgroups() doesn't accept PEP 383 usernames returned by pwd module

2010-08-14 Thread David Watson

New submission from David Watson :

The pwd module decodes usernames using PyUnicode_DecodeFSDefault(), so 
initgroups() should use PyUnicode_FSConverter() for the username.  Patch 
attached.

--
components: Extension Modules
files: initgroups-pep383.diff
keywords: patch
messages: 113921
nosy: baikie
priority: normal
severity: normal
status: open
title: os.initgroups() doesn't accept PEP 383 usernames returned by pwd module
type: behavior
versions: Python 3.2
Added file: http://bugs.python.org/file18530/initgroups-pep383.diff

___
Python tracker 
<http://bugs.python.org/issue9604>
___



[issue9605] os.getlogin() should use PEP 383 decoding to match the pwd module

2010-08-14 Thread David Watson

New submission from David Watson :

The pwd module decodes usernames with PyUnicode_DecodeFSDefault(), and the 
LOGNAME environment variable (suggested as an alternative to getlogin()) is 
decoded the same way.  Attaching a patch to use PyUnicode_DecodeFSDefault() in 
getlogin().

--
components: Extension Modules
files: getlogin-pep383.diff
keywords: patch
messages: 113922
nosy: baikie
priority: normal
severity: normal
status: open
title: os.getlogin() should use PEP 383 decoding to match the pwd module
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18531/getlogin-pep383.diff

___
Python tracker 
<http://bugs.python.org/issue9605>
___



[issue9580] os.confstr() doesn't decode result according to PEP 383

2010-08-14 Thread David Watson

David Watson  added the comment:

> CS_PATH is hardcoded to "/bin:/usr/bin" in the GNU libc for UNIX. Do you know 
> another key for which the value can be controled by the user (or the system 
> administrator)?

No, not a specific example, but CS_PATH could conceivably refer
to some POSIX compatibility suite that's been installed in a
non-ASCII location, and implementations can add their own
variables for whatever they want.

> CS_PATH is just an example, there are other keys. I'm not sure that all
> values are encoded to the filesystem encodings, it might be another
> encoding?
>
> Well, if we really doesn't know the encoding, a solution is to use a bytes
> API (which may avoid the question of the usage of the PEP 383).

The other variables defined by POSIX refer to environment
variables and command-line options for the C compiler and the
getconf utility, all of which would use the FS encoding in
Python, but I agree there's no way to know the appropriate
encoding in general, or even whether anything cares about
encodings.

Personally, I have no objections to making it return bytes.

--

___
Python tracker 
<http://bugs.python.org/issue9580>
___



[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes

2010-08-14 Thread David Watson

David Watson  added the comment:

> I just fear that the loop is "endless". Imagine the worst case: confstr() 
> returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long.

The returned length is supposed to be determined by the length of
the variable, not the length of the buffer passed by the caller,
so I don't see why the OS would have a bug like that, and it
would probably be exposed by the test suite anyway (there's
currently a simple test using CS_PATH).

> I would prefer to see a condition to stop after 2 steps. It should maybe stop 
> when an error at the 3rd step.

That is, raise an exception?  Yeah, possibly, but I think it's
better to just believe what the OS tells you rather than have an
exception that's only raised once in a blue moon for something
that may just be a low-probability event, and not an error.
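
To make the two positions concrete, here is a sketch of the bounded variant in pure Python.  The fetch callable is a hypothetical stand-in for confstr(): it returns the full length of the value plus at most size bytes of it.  The attempt limit and the choice of RuntimeError are assumptions for illustration, not part of any patch.

```python
def read_config_string(fetch, initial_size=255, max_attempts=3):
    """Call fetch(size) until the value fits, giving up after max_attempts."""
    size = initial_size
    for _ in range(max_attempts):
        needed, data = fetch(size)
        if needed <= size:
            return data[:needed]
        size = needed              # retry with the size just reported
    raise RuntimeError("value kept growing after %d attempts" % max_attempts)


def make_counter_fetch():
    """Simulate the feared worst case: the length grows on every call."""
    state = {"length": 300}

    def fetch(size):
        state["length"] += 1
        value = b"x" * state["length"]
        return len(value), value[:size]

    return fetch


def fixed_fetch(size):
    """Simulate the normal case: a value that never changes."""
    value = b"/bin:/usr/bin"
    return len(value), value[:size]
```

With fixed_fetch the value is returned on the first attempt; with the counter simulation the function gives up on the third attempt instead of looping.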

--

___
Python tracker 
<http://bugs.python.org/issue9579>
___



[issue9644] PEP 383: os.statvfs() does not accept surrogateescape arguments

2010-08-19 Thread David Watson

New submission from David Watson :

The statvfs() function still converts its argument with the "s"
format; the attached patch (for 3.2) fixes it to use
PyUnicode_FSConverter().
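
A quick way to exercise this from Python is sketched below: the same directory is passed to os.statvfs() once as a str (possibly containing lone surrogates) and once as bytes.  The directory name is made up, and the sketch assumes a file system that accepts arbitrary bytes in names; where it does not, it falls back to the temporary directory itself.

```python
import os
import tempfile


def statvfs_str_and_bytes(raw_name=b"data-\xff"):
    """Call os.statvfs() on the same directory via str and bytes paths."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = os.path.join(os.fsencode(tmp), raw_name)
        try:
            os.mkdir(raw_path)
        except OSError:
            # Some file systems reject undecodable names; fall back to tmp.
            raw_path = os.fsencode(tmp)
        text_path = os.fsdecode(raw_path)   # may contain lone surrogates
        return os.statvfs(text_path).f_bsize, os.statvfs(raw_path).f_bsize


from_str, from_bytes = statvfs_str_and_bytes()
print(from_str, from_bytes)
```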

--
components: Extension Modules
files: statvfs-pep383-3.2.diff
keywords: patch
messages: 114392
nosy: baikie
priority: normal
severity: normal
status: open
title: PEP 383: os.statvfs() does not accept surrogateescape arguments
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18578/statvfs-pep383-3.2.diff

___
Python tracker 
<http://bugs.python.org/issue9644>
___



[issue9645] PEP 383: os.pathconf() does not accept surrogateescape arguments

2010-08-19 Thread David Watson

New submission from David Watson :

The pathconf() function still converts its argument with the "s"
format; the attached pathconf-pep383-3.2.diff fixes it to use
PyUnicode_FSConverter() (in 3.2).  Also attaching
pathconf-cleanup.diff to clean up the indentation, which
otherwise makes the code rather confusing to look at.

--
components: Extension Modules
files: pathconf-pep383-3.2.diff
keywords: patch
messages: 114393
nosy: baikie
priority: normal
severity: normal
status: open
title: PEP 383: os.pathconf() does not accept surrogateescape arguments
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18579/pathconf-pep383-3.2.diff

___
Python tracker 
<http://bugs.python.org/issue9645>
___



[issue9645] PEP 383: os.pathconf() does not accept surrogateescape arguments

2010-08-19 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file18580/pathconf-cleanup.diff

___
Python tracker 
<http://bugs.python.org/issue9645>
___



[issue9647] os.confstr() does not handle value changing length between calls

2010-08-19 Thread David Watson

New submission from David Watson :

This came up in relation to issue #9579; there is some discussion
of it there.  Basically, if os.confstr() has to call confstr()
twice because the buffer wasn't big enough the first time, the
existing code assumes the string is the same length that the OS
reported in the first call instead of using the length from the
second call and resizing the buffer if necessary.  This means the
returned value will be truncated or contain trailing garbage if
the string changed its length between calls.

I don't know of an actual environment where configuration strings
can change at runtime, but it's not forbidden by POSIX as far as
I can see (the strings are described as "variables", after all,
and sysconf() values such as CHILD_MAX can change at runtime).
Implementations can also provide additional confstr() variables
not specified by POSIX.

The patch confstr-long-result.diff at issue #9579 would fix this
(for 3.x), but Victor Stinner has expressed concern that a buggy
confstr() could create a near-infinite loop with that patch
applied.
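
The difference between the two approaches can be modelled in pure Python.  This is only a simulation: fetch is a hypothetical stand-in for confstr() that returns the full length of the current value plus at most size bytes of it, and the values are invented.

```python
def make_changing_fetch(values):
    """Return a fake confstr(): each call reports the next value in values."""
    pending = list(values)

    def fetch(size):
        value = pending.pop(0) if len(pending) > 1 else pending[0]
        return len(value), value[:size]

    return fetch


def read_assuming_first_length(fetch, initial_size=4):
    """Model of the existing code: trust the length from the first call."""
    needed, data = fetch(initial_size)
    if needed > initial_size:
        _, data = fetch(needed)       # the second length is ignored
    return data[:needed]


def read_with_loop(fetch, initial_size=4):
    """Model of the loop: always act on the most recently reported length."""
    size = initial_size
    while True:
        needed, data = fetch(size)
        if needed <= size:
            return data[:needed]
        size = needed


old, new = b"/bin:/usr/bin", b"/opt/posix/bin:/bin:/usr/bin"
truncated = read_assuming_first_length(make_changing_fetch([old, new]))
complete = read_with_loop(make_changing_fetch([old, new]))
print(truncated, complete)
```

In this simulation the first function returns only the first 13 bytes of the new value, while the loop returns all of it.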

--
components: Extension Modules
messages: 114396
nosy: baikie
priority: normal
severity: normal
status: open
title: os.confstr() does not handle value changing length between calls
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2

___
Python tracker 
<http://bugs.python.org/issue9647>
___



[issue9579] In 3.x, os.confstr() returns garbage if value is longer than 255 bytes

2010-08-19 Thread David Watson

David Watson  added the comment:

I've opened a separate issue for the changing-length problem
(issue #9647; it affects 2.x as well).  Here is a patch that
fixes the 255-byte issue only, and has similar results to the 2.x
code if the value changes length between calls (except that it
could raise a UnicodeError if the string is truncated inside a
multibyte character encoding).
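
The UnicodeError case is easy to reproduce in isolation; the value below is made up, and UTF-8 stands in for whatever encoding is in use.

```python
value = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9', five bytes
truncated = value[:4]                  # cuts the two-byte sequence in half

try:
    truncated.decode("utf-8")
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "UnicodeDecodeError"

print(outcome)
```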

--
Added file: http://bugs.python.org/file18581/confstr-minimal.diff

___
Python tracker 
<http://bugs.python.org/issue9579>
___



[issue9580] os.confstr() doesn't decode result according to PEP 383

2010-08-19 Thread David Watson

David Watson  added the comment:

I wrote this patch to make confstr() return bytes (with code
similar to 2.x), and document the change in "Porting to Python
3.2" and elsewhere, but it then occurred to me that you might
have been talking about making a separate bytes API like
os.environb.  Which did you have in mind?

There is another option for a str API, which is to decode the
value as ASCII with the surrogateescape error handler.  The
returned string will then round-trip correctly through
PyUnicode_FSConverter(), etc., as long as the file system
encoding is compatible with ASCII, which PEP 383 requires it to
be.  This is how undecodable command line arguments are currently
handled when mbrtowc() is unavailable.
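
A short sketch of that round trip, using an invented value and a few ASCII-compatible codecs chosen only as examples:

```python
raw = b"/opt/posix-\xe9\xff/bin"

# Decode as ASCII; each non-ASCII byte becomes a lone surrogate U+DC80..U+DCFF.
text = raw.decode("ascii", "surrogateescape")

# Re-encoding with any ASCII-compatible codec and the same error handler
# should give back the original bytes.
round_trips = {
    codec: text.encode(codec, "surrogateescape") == raw
    for codec in ("ascii", "utf-8", "iso-8859-1")
}

print(ascii(text), round_trips)
```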

--
Added file: http://bugs.python.org/file18582/confstr-bytes-3.2.diff

___
Python tracker 
<http://bugs.python.org/issue9580>
___



[issue9660] PEP 383: socket module doesn't handle undecodable protocol or service names

2010-08-22 Thread David Watson

New submission from David Watson :

The protocol and service/port number databases are typically
implemented as text files on Unix and can contain non-ASCII names
in any encoding (presumably for local services), but the socket
module tries to decode them as strict UTF-8.  In particular,
getservbyport() and getnameinfo() will raise UnicodeError when
this fails.

Attached is a patch for 3.2 to use the file system encoding and
surrogateescape handler instead, in line with PEP 383.  This is
what Python already does for the passwd and group databases, and
it will allow protocol and service names to be given correctly as
command line arguments.
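
To show what the decoding change means for a non-ASCII entry, here is a toy parser for one services-style line.  It is not how the socket module reads the database (that goes through the C library); the service name is invented, and UTF-8 stands in for the file system encoding.

```python
def parse_services_line(line):
    """Parse one /etc/services-style line given as bytes.

    Returns (name, port, protocol), or None for blank and comment lines.
    The name is decoded with surrogateescape so undecodable bytes survive.
    """
    line = line.split(b"#", 1)[0].strip()
    if not line:
        return None
    name, port_proto = line.split()[:2]
    port, proto = port_proto.split(b"/")
    return (name.decode("utf-8", "surrogateescape"), int(port),
            proto.decode("ascii"))


entry = parse_services_line(b"caf\xe9-sync  5000/tcp   # local service\n")
original_name = entry[0].encode("utf-8", "surrogateescape")
print(ascii(entry), original_name)
```

Strict UTF-8 decoding of the same name would raise UnicodeDecodeError instead.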

--
components: Extension Modules
files: proto-service-pep383-3.2.diff
keywords: patch
messages: 114687
nosy: baikie
priority: normal
severity: normal
status: open
title: PEP 383: socket module doesn't handle undecodable protocol or service names
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file18608/proto-service-pep383-3.2.diff

___
Python tracker 
<http://bugs.python.org/issue9660>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-22 Thread David Watson

David Watson  added the comment:

I noticed that try-surrogateescape-first.diff missed out one of
the string references that needed to be changed to point to the
bytes object, and also used PyBytes_AS_STRING() in an unlocked
section.  This version fixes these things by taking the generally
safer approach of setting the original char * variable to the
hostname immediately after using hostname_converter().

--
Added file: http://bugs.python.org/file18609/try-surrogateescape-first-3.diff

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue1027206] unicode DNS names in socket, urllib, urlopen

2010-08-22 Thread David Watson

David Watson  added the comment:

Updated the socket module patch to include gethostbyaddr() - it
happens to accept hostnames and is used this way in the standard
library.

--
Added file: http://bugs.python.org/file18610/socket-idna.diff

___
Python tracker 
<http://bugs.python.org/issue1027206>
___



[issue9660] PEP 383: socket module doesn't handle undecodable protocol or service names

2010-08-23 Thread David Watson

David Watson  added the comment:

Come to think of it, I'm not sure if the patch is correct for
Windows, as PyUnicode_DecodeFSDefault() appears to do strict MBCS
decoding by default (similarly with PyUnicode_FSConverter() for
encoding).  Can Windows return service names that won't decode
with MBCS?  Or does it use a different encoding?  I don't have a
system to experiment with.

--

___
Python tracker 
<http://bugs.python.org/issue9660>
___



[issue1027206] unicode DNS names in socket, urllib, urlopen

2010-08-23 Thread David Watson

David Watson  added the comment:

> Thanks for the patch. Committed as r84261.
> 
> I'm not sure what the point is of supporting IDNA in getnameinfo, so I have 
> removed that from the patch. If you think it's needed, please elaborate.

I don't see the point of it either, but if it's not supposed to
accept hostnames, it should use AI_NUMERICHOST in the call it
makes to getaddrinfo().  As it is, it does both forward and
reverse lookups when called with a hostname.

Attaching a patch to use AI_NUMERICHOST.
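
The effect of the flag can be seen from Python with socket.getaddrinfo().  This is a sketch of the flag's behaviour, not of the patched getnameinfo(); the helper name is hypothetical.

```python
import socket


def resolve_numeric_only(host, port=0):
    """Resolve host with AI_NUMERICHOST; return addresses, or None."""
    try:
        results = socket.getaddrinfo(
            host, port, type=socket.SOCK_DGRAM, flags=socket.AI_NUMERICHOST
        )
    except socket.gaierror:
        return None                 # host was not a numeric address string
    return [sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results]


numeric = resolve_numeric_only("127.0.0.1")
named = resolve_numeric_only("host.domain.invalid")
print(numeric, named)
```

With the flag set, a hostname is rejected without any forward lookup being attempted.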

Also, this issue # isn't really resolved yet as Python does not
support IRIs (AFAIK).

--
Added file: http://bugs.python.org/file18615/getnameinfo-numerichost.diff

___
Python tracker 
<http://bugs.python.org/issue1027206>
___
diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -3969,6 +3969,7 @@ socket_getnameinfo(PyObject *self, PyObj
     memset(&hints, 0, sizeof(hints));
     hints.ai_family = AF_UNSPEC;
     hints.ai_socktype = SOCK_DGRAM; /* make numeric port happy */
+    hints.ai_flags = AI_NUMERICHOST; /* don't do any name resolution */
     Py_BEGIN_ALLOW_THREADS
     ACQUIRE_GETADDRINFO_LOCK
     error = getaddrinfo(hostp, pbuf, &hints, &res);



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-23 Thread David Watson

David Watson  added the comment:

> Is this patch in response to an actual problem, or a theoretical problem?
> If "actual problem": what was the specific application, and what was the 
> specific host name?

It's about environments, not applications - the local network may
be configured with non-ASCII bytes in hostnames (either in the
local DNS *or* a different lookup mechanism - I mentioned
/etc/hosts as a simple example), or someone might deliberately
connect from a garbage hostname as a denial of service attack
against a server which tries to look it up with gethostbyaddr()
or whatever (this may require a "non-strict" resolver library, as
noted above).

> If theoretical, I recommend to close it as "won't fix". I find it perfectly 
> reasonable if Python's socket module gives an error if the hostname can't be 
> clearly decoded. Applications that run into it as a result of gethostbyaddr 
> should treat that as "no reverse name available".

There are two points here.  One is that the decoding can fail; I
do think that programmers will find this surprising, and the fact
that Python refuses to return what was actually received is a
regression compared to 2.x.

The other is that the encoding and decoding are not symmetric -
hostnames are being decoded with UTF-8 but encoded with IDNA.
That means that when a decoded hostname contains a non-ASCII
character which is not prohibited by IDNA/Nameprep, that string
will, when used in a subsequent call, not refer to the hostname
that was actually received, because it will be re-encoded using a
different codec.
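
A concrete instance of the asymmetry, using an invented hostname:

```python
received = "b\u00fccher.example".encode("utf-8")   # bytes a resolver might return

decoded = received.decode("utf-8")     # how returned hostnames are decoded
re_encoded = decoded.encode("idna")    # how str hostname arguments are encoded

same_name = (re_encoded == received)
print(received, re_encoded, same_name)
```

The re-encoded name is the IDNA form (xn--bcher-kva.example), not the byte string that was originally received.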

Attaching a refreshed version of try-surrogateescape-first.diff.
I've separated out the change to getnameinfo() as it may be
superfluous (issue #1027206).

--
Added file: http://bugs.python.org/file18616/try-surrogateescape-first-4.diff
Added file: http://bugs.python.org/file18617/try-surrogateescape-first-getnameinfo-4.diff

___
Python tracker 
<http://bugs.python.org/issue9377>
___
Accept ASCII/surrogateescape strings as hostname arguments.

diff --git a/Doc/library/socket.rst b/Doc/library/socket.rst
--- a/Doc/library/socket.rst
+++ b/Doc/library/socket.rst
@@ -49,6 +49,28 @@ supported. The address format required b
 automatically selected based on the address family specified when the socket
 object was created.
 
+When a hostname is returned by a system interface, it is decoded into
+a string using the ``'ascii'`` codec and the ``'surrogateescape'``
+error handler; this leaves ASCII bytes as ASCII, including IDNA
+ASCII-compatible encodings (see :mod:`encodings.idna`), but converts
+any non-ASCII bytes to the Unicode lone surrogate codes
+U+DC80...U+DCFF.
+
+Hostname arguments can be passed as strings or :class:`bytes` objects.
+The latter are passed to the system unchanged, while strings are
+encoded as follows: if a string contains only ASCII characters and/or
+the Unicode lone surrogate codes U+DC80...U+DCFF, it is encoded using
+the ``'ascii'`` codec and the ``'surrogateescape'`` error handler;
+otherwise it is converted to IDNA ASCII-compatible form using the
+``'idna'`` codec, and if this is not possible, :exc:`UnicodeError` is
+raised.
+
+.. versionchanged:: XXX
+   Previously, hostnames were decoded as UTF-8 and encoded using IDNA
+   or UTF-8; ``surrogateescape`` was not used; some interfaces
+   formerly accepted :class:`bytearray` objects, or did not accept
+   :class:`bytes` objects.
+
 For IPv4 addresses, two special forms are accepted instead of a host address:
 the empty string represents :const:`INADDR_ANY`, and the string
 ``'<broadcast>'`` represents :const:`INADDR_BROADCAST`. The behavior is not
diff --git a/Lib/test/test_socket.py b/Lib/test/test_socket.py
--- a/Lib/test/test_socket.py
+++ b/Lib/test/test_socket.py
@@ -322,6 +322,51 @@ class GeneralModuleTests(unittest.TestCa
 except socket.error:
 pass
 
+    def tryHostnameArgs(self, function, notfounderror):
+        # Call the given one-argument function with various valid and
+        # invalid representations of nonexistent hostnames.  Check
+        # that it raises notfounderror for valid representations, and
+        # UnicodeError for invalid ones.
+
+        # An RFC 1123-compliant host name (".invalid" TLD is reserved
+        # under RFC 2606).
+        self.assertRaises(notfounderror, function, "host.domain.invalid")
+        # Previous name as a bytes object.
+        self.assertRaises(notfounderror, function, b"host.domain.invalid")
+        # A domain name with a non-ASCII octet, as bytes.
+        self.assertRaises(notfounderror, function, b"\xff.domain.invalid")
+        # Previous domain name as ASCII/surrogateescape string representation.
+        self.assertRaises(notfounderror, f

[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-24 Thread David Watson

David Watson  added the comment:

> > It's about environments, not applications
> 
> Still, my question remains. Is it a theoretical problem (i.e. one
> of your imagination), or a real one (i.e. one you observed in real
> life, without explicitly triggering it)? If real: what was the
> specific environment, and what was the specific host name?

Yes, I did reproduce the problem on my own system (Ubuntu 8.04).
No, it is not from a real application, nor do I know anyone with
their network configured like this (except possibly Dan "djbdns"
Bernstein: http://cr.yp.to/djbdns/idn.html ).

I reported this bug to save anyone who *is* in such an
environment from crashing applications and erroneous name
resolution.

> > That means that when a decoded hostname contains a non-ASCII
> > character which is not prohibited by IDNA/Nameprep, that string
> > will, when used in a subsequent call, not refer to the hostname
> > that was actually received, because it will be re-encoded using a
> > different codec.
> 
> Again, I fail to see the problem in this. It won't happen in
> real life. However, if you worried that this could be abused,
> I think it should decode host names as ASCII, not as UTF-8.
> Then it will be symmetric again (IIUC).

That would be an improvement.  The idea of the patches I posted
is to combine this with the existing surrogateescape mechanism,
which handles situations like this perfectly well.  I don't see
how getting a UnicodeError is better than getting a string with
some lone surrogates in it.  In fact, it was my understanding of
PEP 383 that it is in fact better to get the lone surrogates.
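
As a Python-level sketch of the rule those patches document (the patches themselves implement it in C; the function names here are hypothetical):

```python
def encode_hostname(name):
    """Encode a hostname argument following the rule the patch documents."""
    if isinstance(name, bytes):
        return name                       # bytes are passed through unchanged
    if all(ord(c) < 0x80 or 0xDC80 <= ord(c) <= 0xDCFF for c in name):
        return name.encode("ascii", "surrogateescape")
    return name.encode("idna")            # may raise UnicodeError


def decode_hostname(raw):
    """Decode a hostname returned by the system, as the patch documents."""
    return raw.decode("ascii", "surrogateescape")


odd = b"\xff.domain.invalid"
round_trip = encode_hostname(decode_hostname(odd))
print(ascii(decode_hostname(odd)), round_trip == odd)
```

Under this rule a name containing non-ASCII bytes comes back as a string with lone surrogates and maps back to the same bytes when reused, while ordinary Unicode names still go through IDNA.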

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue1512163] mailbox (2.5b1): locking doesn't work (esp. on FreeBSD)

2010-08-26 Thread David Watson

David Watson  added the comment:

> Is this still an issue on later versions of Python and/or FreeBSD?

Yes, there is still an issue.  There is no longer a deadlock on
FreeBSD because the module has been changed to use only lockf() and
dot-locking (on all platforms), but the issue is now about how
users can enable other locking mechanisms that they need, such as
flock(), without causing a deadlock on platforms where they refer
to the same lock as lockf().

They can't just override the classes' .lock() and .unlock()
methods, because some parts of the code perform locking
operations directly without calling those methods.
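
For readers unfamiliar with the two mechanisms, here is a rough sketch of combining dot-locking with lockf() outside the mailbox module.  It is not the module's code; the locked() helper and the file names are hypothetical, and a POSIX system is assumed.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def locked(path):
    """Hold both a dot lock and a lockf() lock on path (illustrative only)."""
    dot_lock = path + ".lock"
    # O_EXCL makes creation fail if another process holds the dot lock.
    fd = os.open(dot_lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)
    try:
        with open(path, "r+b") as f:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            try:
                yield f
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)
    finally:
        os.unlink(dot_lock)


with tempfile.TemporaryDirectory() as tmp:
    mbox = os.path.join(tmp, "mbox")
    open(mbox, "wb").close()
    with locked(mbox):
        held = os.path.exists(mbox + ".lock")
    released = not os.path.exists(mbox + ".lock")
    print(held, released)
```

Someone who also needs flock() would have to add an fcntl.flock() call next to the lockf() call; whether that is safe depends on whether the platform treats the two as the same lock.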

--

___
Python tracker 
<http://bugs.python.org/issue1512163>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-26 Thread David Watson

David Watson  added the comment:

> The surrogateescape mechanism is a very hackish approach, and
> violates the principle that errors should never pass silently.

I don't see how a name resolution API returning non-ASCII bytes
would indicate an error.  If the host table contains a non-ASCII
byte sequence for a host, then that is the host's name - it works
just as well as an ASCII name, both forwards and backwards.

What is hackish is representing char * data as a Unicode string
when there is no native Unicode API to feed it to - there is no
issue here such as file names being bytes on Unix and Unicode on
Windows, so the clean thing to do would be to return a bytes
object.  I suggested the surrogateescape mechanism in order to
retain backwards compatibility.

> However, it solves a real problem - people do run into the problem
> with file names every day. With this problem, I'd say "if it hurts,
> don't do it, then".

But to be more explicit, that's like saying "if it hurts, get
your sysadmin to reconfigure the company network".

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-27 Thread David Watson

David Watson  added the comment:

> > I don't see how a name resolution API returning non-ASCII bytes
> > would indicate an error.
> 
> It's in violation of RFC 952 (slightly relaxed by RFC 1123).

That's bad if it's on the public Internet, but it's not an
error.  The OS is returning the name by which it knows the host.

If you look at POSIX, you'll see that what getaddrinfo() and
getnameinfo() look up and return is referred to as a "node name",
which can be an address string or a "descriptive name", and that
if used with Internet address families, descriptive names
"include" host names.  It doesn't say that the string can only be
an address string or a hostname (RFC 1123 compliant or
otherwise).

> > But to be more explicit, that's like saying "if it hurts, get
> > your sysadmin to reconfigure the company network".
> 
> Which I consider perfectly reasonable. The sysadmin should have
> known (and, in practice, *always* knows) not to do that in the first
> place (the larger the company, the more cautious the sysadmin).

It's not reasonable when addressed to a customer who might go
elsewhere.  And I still don't see a technical reason for making
such a demand.  Python 2.x seems to work just fine using 8-bit
strings.

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson

David Watson  added the comment:

OK, I still think this issue should be addressed, but here is a patch for the 
part we agree on: that decoding should not return any Unicode characters except 
ASCII.

--
Added file: http://bugs.python.org/file18674/decode-strict-ascii.diff

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson

David Watson  added the comment:

The rest of the issue could also be straightforwardly addressed by adding bytes 
versions of the name lookup APIs.  Attaching a patch which does that (applies 
on top of decode-strict-ascii.diff).

--
Added file: http://bugs.python.org/file18675/hostname-bytes-apis.diff

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson

Changes by David Watson :


Removed file: http://bugs.python.org/file18675/hostname-bytes-apis.diff

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-08-29 Thread David Watson

David Watson  added the comment:

Oops, forgot to refresh the last change into that patch.  This should fix it.

--
Added file: http://bugs.python.org/file18676/hostname-bytes-apis.diff

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9777] test_socket.GeneralModuleTests.test_idna should require the "network" resource

2010-09-04 Thread David Watson

New submission from David Watson :

This test requires network access as it tries to resolve a domain name at 
python.org.  Patch attached.

--
components: Tests
files: idna-test-resource.diff
keywords: patch
messages: 115593
nosy: baikie
priority: normal
severity: normal
status: open
title: test_socket.GeneralModuleTests.test_idna should require the "network" resource
type: behavior
versions: Python 3.2
Added file: http://bugs.python.org/file18751/idna-test-resource.diff

___
Python tracker 
<http://bugs.python.org/issue9777>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-09-05 Thread David Watson

David Watson  added the comment:

> > baikie: why did the test pass for you?
> 
> The test passes (I assume) if linux-pass-unterminated.diff is applied. The 
> latter patch is only meant to exhibit the issue, though, not to be checked in.

No, I meant for linux-pass-unterminated.diff to be checked in so
that applications could always send datagrams back to the address
they got them from, even when it was 108 bytes long.  As it is
run only on Linux, testMaxPathLen does not leave space for a null
terminator because Linux just ignores it (that is what makes it
possible to bind to a 108-byte address and thus trigger the bug).

--
title: socket: Buffer overrun while reading unterminated AF_UNIX
addresses -> socket: Buffer overrun while reading unterminated AF_UNIX addresses

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-09-06 Thread David Watson

David Watson  added the comment:

> baikie, coming back to your original message: what precisely makes you 
> believe that sun_path does not need to be null-terminated on Linux?

That's the way I demonstrated the bug - the only way to bind to a
108-byte path is to pass it without null termination, because
Linux will not accept an oversized sockaddr_un structure (e.g. a
108-byte path plus null terminator).  Also, unless it's on OS/2,
Python's existing code never includes the null terminator in the
address size it passes to the system call, so a correctly-
behaving OS should never see it.

However, it does now occur to me that a replacement libc
implementation for Linux could try to do something with sun_path
during the call and assume it's null-terminated even though the
kernel doesn't, so it may be best to keep the null termination
requirement after all.  In that case, there would be no way to
test for the bug from within Python, so the test problems would
be somewhat moot (although the test code could still be used by
changing UNIX_PATH_MAX from 108 to 107).
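
To make the two length limits concrete, here is a rough Python model of
the check made by linux-pass-unterminated.  It is only a sketch:
SUN_PATH_SIZE = 108 is the Linux value discussed above, and
pack_sun_path() and its allow_unterminated flag are made-up names for
illustration, not part of the socket module.

```python
SUN_PATH_SIZE = 108  # sizeof(sun_path) on Linux; other systems differ


def pack_sun_path(path, allow_unterminated=False):
    """Return the zero-padded sun_path array for a bytes path.

    Hypothetical helper mirroring the length check in the patch: with
    allow_unterminated (the Linux case) a path may fill the whole
    array; otherwise one byte must be left for the null terminator.
    """
    limit = SUN_PATH_SIZE if allow_unterminated else SUN_PATH_SIZE - 1
    if len(path) > limit:
        raise OSError("AF_UNIX path too long")
    # Equivalent of memcpy() followed by memset() of the remainder.
    return path + b"\0" * (SUN_PATH_SIZE - len(path))
```

With allow_unterminated left off, a 107-byte path is the longest that
is accepted, which corresponds to changing UNIX_PATH_MAX to 107 in the
tests.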

Attaching four-space-indent versions of the original patches (for
2.x and 3.x), and tests incorporating Antoine's use of
assertRaisesRegexp.

--
Added file: http://bugs.python.org/file18770/linux-pass-unterminated-4spc.diff
Added file: http://bugs.python.org/file18771/return-unterminated-2.x-4spc.diff
Added file: http://bugs.python.org/file18772/return-unterminated-3.x-4spc.diff
Added file: http://bugs.python.org/file18773/addrlen-2.x-4spc.diff
Added file: http://bugs.python.org/file18774/addrlen-3.x-4spc.diff
Added file: http://bugs.python.org/file18775/test-2.x-new.diff
Added file: http://bugs.python.org/file18776/test-3.x-new.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___

Allow AF_UNIX pathnames up to the maximum 108 bytes on Linux,
since it does not require sun_path to be null terminated.

diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -1187,27 +1187,16 @@ getsockaddrarg(PySocketSockObject *s, Py
 
         addr = (struct sockaddr_un*)addr_ret;
 #ifdef linux
-        if (len > 0 && path[0] == 0) {
-            /* Linux abstract namespace extension */
-            if (len > sizeof addr->sun_path) {
-                PyErr_SetString(socket_error,
-                                "AF_UNIX path too long");
-                return 0;
-            }
-        }
-        else
-#endif /* linux */
-        {
-            /* regular NULL-terminated string */
-            if (len >= sizeof addr->sun_path) {
-                PyErr_SetString(socket_error,
-                                "AF_UNIX path too long");
-                return 0;
-            }
-            addr->sun_path[len] = 0;
+        if (len > sizeof(addr->sun_path)) {
+#else
+        if (len >= sizeof(addr->sun_path)) {
+#endif
+            PyErr_SetString(socket_error, "AF_UNIX path too long");
+            return 0;
         }
         addr->sun_family = s->sock_family;
         memcpy(addr->sun_path, path, len);
+        memset(addr->sun_path + len, 0, sizeof(addr->sun_path) - len);
 #if defined(PYOS_OS2)
         *len_ret = sizeof(*addr);
 #else
When parsing sockaddr_un structures returned by accept(), etc.,
only examine bytes up to supplied addrlen and do not require null
termination.

diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c
--- a/Modules/socketmodule.c
+++ b/Modules/socketmodule.c
@@ -998,19 +998,22 @@ makesockaddr(int sockfd, struct sockaddr
 #if defined(AF_UNIX)
     case AF_UNIX:
     {
+        Py_ssize_t len, splen;
         struct sockaddr_un *a = (struct sockaddr_un *) addr;
+        splen = addrlen - offsetof(struct sockaddr_un, sun_path);
 #ifdef linux
-        if (a->sun_path[0] == 0) {  /* Linux abstract namespace */
-            addrlen -= offsetof(struct sockaddr_un, sun_path);
-            return PyString_FromStringAndSize(a->sun_path,
-                                              addrlen);
+        if (splen > 0 && a->sun_path[0] == 0) {
+            /* Linux abstract namespace */
+            len = splen;
         }
         else
 #endif /* linux */
         {
-            /* regular NULL-terminated string */
-            return PyString_FromString(a->sun_path);
+            /* String, up to null terminator if present */
+            for (len = 0; len < splen && a->sun_path[len] != 0; len++)
+                ;
         }
+        return PyString_FromStringAndSize(a->sun_path, len);
     }
 #endif /* AF_UNIX */
 

[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-09-11 Thread David Watson

David Watson  added the comment:

Updated the patches for Python 3.2 - these are now simpler as
they do not support bytearray arguments, as these are no longer
used for filenames (the existing code does not support bytearrays
either).

I've put the docs and tests in one patch, and made separate
patches for the code, one for if the linux-pass-unterminated
patch from issue #8372 is applied, and one for if it isn't.

One point I neglected to comment on before is the ability to
specify an address in the Linux abstract namespace as a
filesystem-encoded string prefixed with a null character.  This
may seem strange, but as well as simplifying the code, it does
support an actual use case, as on Linux systems the abstract
namespace is sometimes used to hold names based on real
filesystem paths such as "\x00/var/run/hald/dbus-XAbemUfDyQ", or
imaginary ones, such as "\x00/com/ubuntu/upstart".  In fact,
running "netstat" on my own system did not reveal any non-textual
abstract names in use (although they are of course allowed).
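
As an illustration of the point about abstract names, the following
sketch encodes a str address, including one with a leading null
character, to the bytes that would go into sun_path.  The helper name
encode_unix_address() is made up for this example, and os.fsencode() is
used here as a stand-in for whatever the patch does internally.

```python
import os


def encode_unix_address(address):
    """Encode an AF_UNIX address given as str or bytes to bytes.

    Hypothetical helper: str addresses are encoded with the filesystem
    encoding (os.fsencode() applies the surrogateescape handler on
    POSIX), and a leading null character, which marks a Linux
    abstract-namespace name, is passed through like any other.
    """
    if isinstance(address, str):
        return os.fsencode(address)
    return bytes(address)
```

For example, encode_unix_address("\x00/com/ubuntu/upstart") gives the
same name as bytes, with the null prefix intact.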

--
Added file: http://bugs.python.org/file18850/af_unix-pep383-docs-tests-3.2.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-09-11 Thread David Watson

Changes by David Watson :


Added file: 
http://bugs.python.org/file18851/af_unix-pep383-3.2-with-linux-unterminated.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-09-11 Thread David Watson

Changes by David Watson :


Added file: 
http://bugs.python.org/file18852/af_unix-pep383-3.2-without-linux-unterminated.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-09-11 Thread David Watson

David Watson  added the comment:

I've updated the PEP 383 patches at issue #8373 with separate
versions for if the linux-pass-unterminated patch is applied or
not.

If it's not essential to have unit tests for the overrun issue,
I'd suggest applying just the return-unterminated and addrlen
patches and omitting linux-pass-unterminated, for now at least.
This will leave Linux no worse off than a typical BSD-derived
platform.
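
For readers without the patches to hand, the behaviour of the
return-unterminated patch (examine only the bytes covered by the
returned address length, and stop at a null if one is present) might be
modelled in Python roughly as follows.  parse_sun_path() and its
parameters are hypothetical names; the real code is the C in
makesockaddr().

```python
def parse_sun_path(sun_path, splen, linux=True):
    """Extract the address from a received sun_path buffer.

    sun_path is the raw array as bytes and splen is the number of its
    bytes covered by the address length the system call reported
    (addrlen minus the offset of sun_path).  Hypothetical helper.
    """
    splen = max(0, min(splen, len(sun_path)))
    data = sun_path[:splen]
    if linux and splen > 0 and data[0] == 0:
        # Linux abstract namespace: every byte is significant.
        return data
    # Ordinary path: stop at a null terminator if one is present, but
    # never read past the bytes the kernel actually filled in.
    end = data.find(b"\0")
    return data if end < 0 else data[:end]
```

A full 108-byte path with no terminator is returned whole, instead of
the parser running off the end of the array looking for a null.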

--

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-09-11 Thread David Watson

David Watson  added the comment:

One of the tests got broken by the removal of sys.setfilesystemencoding().  
Replaced it.

--
Added file: 
http://bugs.python.org/file18853/af_unix-pep383-docs-tests-3.2-2.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-09-12 Thread David Watson

David Watson  added the comment:

> With all the effort that went into the patch, I recommend to get it right: if 
> there is space for the \0, include it. If the string size is exactly 108, and 
> it's linux, write it unterminated. Else fail.
> 
> As for testing: we should then definitely have a test that, if you can create 
> an 108 byte unix socket that its socket name is what we said it should be.

The attached patches do those things, if I understand you
correctly (the test patches add such a test for Linux, and
linux-pass-unterminated uses memset() to zero out the area
between the end of the actual path and the end of the sun_path
array).

If you're talking about including the null in the address passed
to the system call, that does no harm on Linux, but I think the
more common practice is not to include it.  The FreeBSD SUN_LEN
macro, for instance, is provided to calculate the address length
and does not include the null.
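
The length calculation described here can be sketched in Python as
below.  This is an illustration only: SUN_PATH_OFFSET = 2 is an
assumption about where sun_path starts inside struct sockaddr_un (it
holds on Linux), and sun_len() is a made-up name modelled on the
SUN_LEN macro.

```python
# Assumption: sun_path begins 2 bytes into struct sockaddr_un.
SUN_PATH_OFFSET = 2


def sun_len(path):
    """Address length in the style of SUN_LEN(): offset plus strlen()."""
    end = path.find(b"\0")
    strlen = len(path) if end < 0 else end
    return SUN_PATH_OFFSET + strlen
```

The result is the same whether or not the buffer holds a terminating
null, since the null is not counted as part of the address.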

--

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-09-12 Thread David Watson

David Watson  added the comment:

I meant to say that FreeBSD provides the SUN_LEN macro, but it
turns out that Linux does as well, and its version behaves the
same as FreeBSD's.  The FreeBSD man pages state that the
terminating null is not part of the address:

http://www.freebsd.org/cgi/man.cgi?query=unix&apropos=0&sektion=0&manpath=FreeBSD+8.1-RELEASE&format=html

The examples in Stevens/Rago's "Advanced Programming in the Unix
Environment" also pass address lengths to bind(), etc. that do
not include the null.

--

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue9647] os.confstr() does not handle value changing length between calls

2010-10-02 Thread David Watson

David Watson  added the comment:

> If I understood correctly, you don't want the value to be truncated if the 
> variable grows between the two calls to confstr(). Which behaviour would you 
> expect? A Python exception?

A return size larger than the buffer is *supposed* to indicate
that the current value is larger than the supplied buffer, so I
would just expect it to reallocate the buffer, call confstr()
again and return the new value, unless it was known that such a
situation indicated an actual problem.

In other words, I would not expect it to do anything special.  I
didn't write the original patch the way I did in order to fix
this (potential) bug - it just seemed like the most natural way
to write the code.
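
A minimal sketch of that retry logic, in Python rather than the C of
the patch: query() is a hypothetical stand-in for the confstr() call,
taking a buffer size and returning the size the value needs together
with the (possibly truncated) data.

```python
def read_resizing(query, initial_size=255):
    """Call query(size) until the buffer is large enough.

    query is a hypothetical stand-in for confstr(): it is given a
    buffer size and returns (needed, data), where needed is the size
    the full value requires and data is the possibly truncated value.
    """
    size = initial_size
    while True:
        needed, data = query(size)
        if needed <= size:
            return data
        # The value did not fit (or grew since the last call): retry
        # with the size just reported instead of truncating.
        size = needed
```

Because the loop only stops once the reported size fits, a value that
grows between calls simply costs one more iteration.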

--

___
Python tracker 
<http://bugs.python.org/issue9647>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-13 Thread David Watson

David Watson  added the comment:

> platform.system() fails with UnicodeEncodeError on systems that have their 
> computer name set to a name containing non-ascii characters. The 
> implementation of platform.system() uses at some point socket.gethostname() ( 
> see http://www.pasteall.org/16215 for a stacktrace of such usage)

This trace is from a Windows system, where the platform module
uses gethostname() in its cross-platform uname() function, which
platform.system() and various other functions in the module rely
on.  On a Unix system, platform.uname() depends on os.uname()
working, meaning that these functions still fail when the
hostname cannot be decoded, as it is part of os.uname()'s return
value.

Given that os.uname() is a primary source of information about
the platform on Unix systems, this sort of collateral damage from
an undecodable hostname is likely to occur in more places.

> It would be more than great if this error could be fixed. If another 3.1 
> release is planned, preferably for that.

If you'd like to try the surrogateescape patches, they ought to
fix this.  The relevant patches are ascii-surrogateescape-2.diff,
try-surrogateescape-first-4.diff and uname-surrogateescape.diff.

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-15 Thread David Watson

David Watson  added the comment:

> As a further note: I think socket.gethostname() is a special case, since this 
> is just about a local setting (i.e. not related to DNS).

But the hostname *is* commonly intended to be looked up in the
DNS or whatever name resolution mechanisms are used locally -
socket.getfqdn(), for instance, works by looking up the result
using gethostbyaddr() (actually the C function getaddrinfo(),
followed by gethostbyaddr()).  So I don't see the rationale for
treating it differently from the results of gethostbyaddr(),
getnameinfo(), etc.

POSIX says of the name lookup functions that "in many cases" they
are implemented by the Domain Name System, not that they always
are, so a name intended for lookup need not be ASCII-only either.

> We should then assume that it is encoded in the locale encoding (in 
> particular, that it is encoded in mbcs on Windows).

I can see the point of returning the characters that were
intended, but code that looked up the returned name would then
have to be changed to re-encode it to bytes to avoid the
round-tripping issue when non-ASCII characters are returned.

--

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-18 Thread David Watson

David Watson  added the comment:

> The result from gethostname likely comes out of machine-local
> configuration. It may have non-ASCII in it, which is then likely
> encoded in the local encoding. When looking it up in DNS, IDNA
> should be applied.

I would have thought that someone who intended a Unicode hostname
to be looked up in its IDNA form would have encoded it using
IDNA, rather than an 8-bit encoding - how many C programs would
transcode the name that way, rather than just passing the char *
from one interface to another?

In fact, I would think that non-ASCII bytes in a hostname most
probably indicated that a name resolution mechanism other than
the DNS was in use, and that the byte string should be passed
unaltered just as a typical C program would.

> OTOH, output from gethostbyaddr likely comes out of the DNS itself.
> Guessing what encoding it may have is futile - other than guessing
> that it really ought to be ASCII.

Sure, but that doesn't mean the result can't be made to
round-trip if it turns out not to be ASCII.  The guess that it
will be ASCII is, after all, still a guess (as is the guess that
it comes from the DNS).

> Python's socket module is clearly focused on the internet, and
> intends to support that well. So if you pass a non-ASCII
> string, it will have to encode that using IDNA. If that's
> not what you want to get, tough luck.

I don't object to that, but it does force a choice between
decoding an 8-bit name for display (e.g. by using the locale
encoding), and decoding it to round-trip automatically (e.g. by
using ASCII/surrogateescape, with support on the encoding side).

Using PyUnicode_DecodeFSDefault() for the hostname or other
returned names (thus decoding them for display) would make this
issue solvable with programmer intervention - for instance,
"socket.gethostbyaddr(socket.gethostname())" could be replaced by
"socket.gethostbyaddr(os.fsencode(socket.gethostname()))", but
programmers might well neglect to do this, given that no encoding
was needed in Python 2.

Also, even displaying a non-ASCII name decoded according to the
locale creates potential for confusion, as when the user types
the same characters into a Python program for lookup (again,
barring programmer intervention), they will not represent the
same byte sequence as the characters the user sees on the screen
(as they will instead represent their IDNA ASCII-compatible
equivalent).

So overall, I do think it is better to decode names for automatic
round-tripping rather than for display, but my main concern is
simply that it should be possible to recover the original bytes
so that round-tripping is at least possible.
PyUnicode_DecodeFSDefault() would accomplish that much at least.
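
The mismatch between decoding for display and looking the name up can
be shown with a few lines of Python.  This is only an illustration: the
name is assumed to be a single-byte ISO-8859-1 hostname, with that
codec standing in for the locale encoding, and the idna codec standing
in for what a lookup of a str argument does.

```python
raw = b"l\xf6wis"  # hostname bytes as a C program would see them

# Decoding for display (ISO-8859-1 stands in for the locale encoding).
shown = raw.decode("latin-1")

# A lookup of the str form goes through IDNA, which yields other bytes.
looked_up = shown.encode("idna")

# Programmer intervention: re-encode with the codec used for decoding.
recovered = shown.encode("latin-1")
```

Here looked_up is the ASCII-compatible form rather than the original
bytes, while recovered equals raw again.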

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-19 Thread David Watson

David Watson  added the comment:

> > In fact, I would think that non-ASCII bytes in a hostname most
> > probably indicated that a name resolution mechanism other than
> > the DNS was in use, and that the byte string should be passed
> > unaltered just as a typical C program would.
> 
> I'm not talking about byte strings, but character strings.

I mean that passing the str object from socket.gethostname() to
the Python lookup function ought to result in the same byte
string being passed to the C lookup function as was returned by
the C gethostname() function (or else that the programmer must
re-encode the str to ensure that that result is obtained).

> > I don't object to that, but it does force a choice between
> > decoding an 8-bit name for display (e.g. by using the locale
> > encoding), and decoding it to round-trip automatically (e.g. by
> > using ASCII/surrogateescape, with support on the encoding side).
> 
> In the face of ambiguity, refuse the temptation to guess.

Yes, I would interpret that to mean not using the locale encoding
for data obtained from the network.  That's another reason why
the ASCII/surrogateescape scheme appeals to me more.

> Well, Python is not C. In Python, you would pass a str, and
> expect it to work, which means it will get automatically encoded
> with IDNA.

I think there might be a misunderstanding here - I've never
proposed changing the interpretation of Unicode characters in
hostname arguments.  The ASCII/surrogateescape scheme I suggested
only changes the interpretation of unpaired surrogate codes, as
they do not occur in IDNs or any other genuine Unicode data; all
IDNs, including those solely consisting of ASCII characters,
would be encoded to the same byte sequence as before.

ASCII/surrogateescape decoding could also be used without support
on the encoding side - that would satisfy the requirement to
"refuse the temptation to guess", would allow the original bytes
to be recovered, and would mean that attempting to look up a
non-ASCII result in str form would raise an exception rather than
looking up the wrong name.
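
Both variants of the ASCII/surrogateescape scheme can be tried out with
the standard codecs.  In this sketch the idna codec is used as an
example of an encoder without surrogateescape support; the hostname
bytes are made up.

```python
raw = b"l\xf6wis"

# Decode: ASCII bytes map to themselves, and each other byte 0xNN
# becomes the lone surrogate U+DCNN.
name = raw.decode("ascii", "surrogateescape")

# With support on the encoding side, the original bytes come back.
round_tripped = name.encode("ascii", "surrogateescape")

# Without it, the escaped name is rejected rather than being looked up
# as some other name.
try:
    name.encode("idna")
    rejected = False
except UnicodeError:
    rejected = True
```

Pure-ASCII names are unaffected either way, since they decode and
encode to the same bytes as before.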

> Marc-Andre wants gethostname to use the Wide API on Windows, which,
> in theory, allows for cases where round-tripping to bytes is
> impossible.

Well, the name resolution APIs wrapped by Python are all
byte-oriented, so if the computer name were to have no bytes
equivalent then it wouldn't be possible to resolve it anyway, and
an exception rightly ought to be raised at some point in the process
of trying to do so.

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-20 Thread David Watson

David Watson  added the comment:

I was looking at the MSDN pages linked to above, and these two
pages seemed to suggest that Unicode characters appearing in DNS
names represented UTF-8 sequences, and that Windows allowed such
non-ASCII byte sequences in the DNS by default:

http://msdn.microsoft.com/en-us/library/ms724220%28v=VS.85%29.aspx
http://msdn.microsoft.com/en-us/library/ms682032%28v=VS.85%29.aspx

(See the discussion of DNS_ERROR_NON_RFC_NAME in the latter.)
Can anyone confirm if this is the case?

The BSD-style gethostname() function can't be returning UTF-8,
though, or else the "Nötkötti" example above would have been
decoded successfully, given that Python currently uses
PyUnicode_FromString().

Also, if GetComputerNameEx() only offers a choice of DNS names or
NetBIOS names, and both are byte-oriented underneath (that was my
reading of the "Computer Names" page), then presumably there
shouldn't be a problem with mapping the result to a bytes
equivalent when necessary?

--

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-20 Thread David Watson

David Watson  added the comment:

> > Also, if GetComputerNameEx() only offers a choice of DNS names or
> > NetBIOS names, and both are byte-oriented underneath (that was my
> > reading of the "Computer Names" page), then presumably there
> > shouldn't be a problem with mapping the result to a bytes
> > equivalent when necessary?
> 
> They aren't byte-oriented underneath. It depends on whether you use
> GetComputerNameA or GetComputerNameW whether you get bytes or Unicode.
> If bytes, they are converted as if by WideCharToMultiByte using
> CP_ACP, which in turn will introduce question marks and the like
> for unconvertible characters.

Sorry, I didn't mean how Windows constructs the result for the
"A" interface - I was talking about Python code being able to map
the result from the Unicode interface to the form used in the
protocol (e.g. DNS).  I believe the proposal is to use the DNS
name, so since the DNS is byte oriented, I would have thought
that the Unicode "DNS name" result would always have a bytes
equivalent that the DNS resolver code would use - perhaps its
UTF-8 encoding?

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-21 Thread David Watson

David Watson  added the comment:

> On other platforms, I guess we'll just have to do some trial
> and error to see what works and what not. E.g. on Linux it is
> possible to set the hostname to a non-ASCII value, but then
> the resolver returns an error, so it's not very practical:
> 
> # hostname l\303\266wis
> # hostname
> löwis
> # hostname -f
> hostname: Resolver Error 0 (no error)
> 
> Using the IDNA version doesn't help either:
> 
> # hostname xn--lwis-5qa
> # hostname
> xn--lwis-5qa
> # hostname -f
> hostname: Resolver Error 0 (no error)

I think what's happening here is simply that you're setting
the hostname to something which doesn't exist in the relevant
name databases - the man page for Linux's hostname(1) says that
"The FQDN is the name gethostbyname(2) returns for the host name
returned by gethostname(2).".  If the computer's usual name is
"newton", that may be why it works and the others don't.

It works for me if I add "127.0.0.9 löwis.egenix.com löwis" to
/etc/hosts and then set the hostname to "löwis" (all UTF-8):
hostname -f prints "löwis.egenix.com", and Python 2's
socket.getfqdn() returns the corresponding bytes; non-UTF-8 names
work too.  (Note that the FQDN must appear before the bare
hostname in the /etc/hosts entry, and I used the address
127.0.0.9 simply to avoid a collision with existing entries - by
default, Ubuntu assigns the FQDN to 127.0.1.1.)
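
Why the order of names in the /etc/hosts entry matters can be seen from
a much simplified model of the lookup, sketched here in Python.  The
function canonical_name() is made up for illustration and ignores
everything a real resolver does beyond reading the file format
(address, then canonical name, then aliases).

```python
def canonical_name(hosts_text, hostname):
    """Return the first name on the first hosts line listing hostname.

    A much simplified model of an /etc/hosts lookup: each line is an
    address followed by the canonical name and then any aliases.
    """
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return fields[1]
    return None
```

With the entry written as "127.0.0.9 löwis.egenix.com löwis", a lookup
of "löwis" yields the FQDN; with the two names swapped it would yield
the bare hostname instead.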

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

2010-10-31 Thread David Watson

David Watson  added the comment:

> FWIW, you can do the same on a Linux box, i.e. setup the host name
> and domain to some completely bogus values. And as David pointed out,
> without also updating the /etc/hosts on the Linux, you always get the
> resolver error with hostname -f I mentioned earlier on (which does
> a DNS lookup), so there's no real connection to the DNS system on
> Linux either.

Just to clarify here: there isn't anything special about
/etc/hosts; it's handled by a pluggable module which performs
hostname lookups in it alongside a similar module for the DNS.
glibc's Name Service Switch combines the views provided by the
various modules into a single byte-oriented namespace for
hostnames according to the settings in /etc/nsswitch.conf (this
namespace allows non-ASCII bytes, as the /etc/hosts examples
demonstrate).

http://www.kernel.org/doc/man-pages/online/pages/man5/nsswitch.conf.5.html
http://www.gnu.org/software/libc/manual/html_node/Name-Service-Switch.html

It's an extensible system, so people can write their own modules
to handle whatever name services they have to deal with, and
configure hostname lookup to query them before, after or instead
of the DNS.  A hostname that is not resolvable in the DNS may be
resolvable in one of these.

--
title: socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

___
Python tracker 
<http://bugs.python.org/issue9377>
___



[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-06-01 Thread David Watson

New submission from David Watson <[EMAIL PROTECTED]>:

The error message has no newline at the end:

$ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
Could not convert argument 2 to string$

Seriously, though: is this the intended behaviour?  If the
interpreter just dies when it gets a non-UTF-8 (or whatever)
argument, it creates an opportunity for a denial-of-service if
some admin is running a Python script via find(1) or similar.
And what if you want to run a Python script on some files named
in a mixture of charsets (because, say, you just untarred an
archive created in a foreign charset)?

Could sys.argv not provide bytes objects for those arguments,
like os.listdir()?  Or (better IMHO) have a separate
sys.argv_bytes interface?

--
components: Unicode
messages: 67608
nosy: baikie
severity: normal
status: open
title: Problem with invalidly-encoded command-line arguments (Unix)
type: behavior
versions: Python 3.0

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3023>
___



[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2008-06-02 Thread David Watson

David Watson <[EMAIL PROTECTED]> added the comment:

Hmm, yes, I see that the open() builtin doesn't accept bytes
filenames, though os.open() still does.  When I saw that you
could pass bytes filenames transparently from os.listdir() to
os.open(), I assumed that this was intentional!

So what *is* os.listdir() supposed to do when it finds an
unconvertible filename?  Raise an exception?  Pretend the file
isn't there?  What if someone puts unconvertible strings in the
password database?  I think this is going to cause real problems
for people.

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3023>
___



[issue6560] socket sendmsg(), recvmsg() methods

2010-02-28 Thread David Watson

David Watson  added the comment:

Thanks for your interest!  I'm actually still working on the
patch I posted, docs and a test suite, and I'll post something
soon.

Yes, you could just use b"".join() with sendmsg() (and get
slightly annoyed because it doesn't accept buffers ;) ).  I made
sendmsg() take multiple buffers because that's the way the system
call works, but also to match recvmsg_into(), which gives you the
convenience of being able to receive part of the message into a
bytearray and part into an array.array("i"), say, if that's how
the data is formatted.

As you might know, gather-write with sendmsg() can give a
performance benefit by letting the kernel assemble the message
while copying the data from userspace rather than having
userspace copy the data once to form the message and then having
the kernel copy it again when the system call is made.  I suppose
with Python you just need a larger message to see the benefit :)
Since it can read from buffers, though, socket.sendmsg() can pull
a large chunk of data straight out of an mmap object, say, and
attach headers from a bytes object without the mmapped data being
touched by Python at all (or even entering userspace, in this
case).
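
A short sketch of the multiple-buffer form, for Unix systems only.  It
assumes the interface as found in the socket module of current
Python 3 releases, i.e. sendmsg(buffers) and recvmsg_into(buffers)
returning (nbytes, ancdata, flags, address); the patch under
discussion may differ in detail.  The message layout (a 4-byte tag
followed by two ints) is made up.

```python
import array
import socket

a, b = socket.socketpair()
try:
    header = b"HDR:"
    values = array.array("i", [1, 2])

    # Gather-write: the kernel assembles the message from both buffers.
    sent = a.sendmsg([header, values])

    # Scatter-read: the first bytes land in a bytearray, the rest in an
    # array of ints.
    tag = bytearray(len(header))
    ints = array.array("i", [0, 0])
    nbytes, ancdata, flags, address = b.recvmsg_into([tag, ints])
finally:
    a.close()
    b.close()
```

No b"".join() is needed on the sending side, and the receiver gets the
ints directly in an array without a separate unpacking step.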

The patch is for 3.x, BTW - "y*" is valid there (and does take a
buffer).

As for a good reference, I haven't personally seen one.  There's
POSIX and RFC 3542, but they don't provide a huge amount of
detail.  Perhaps the (updated) W. Richard Stevens networking
books?  I've got the Stevens/Rago second edition of Advanced
Programming in the Unix Environment, which discusses FD and
credential passing with sendmsg/recvmsg, but not very well (it
misuses CMSG_LEN, for one thing).  The networking books were
updated by different people though, so perhaps they do better.

The question of whether to use CMSG_NXTHDR() to step to the next
header when constructing the buffer for sendmsg() is a bit murky,
in particular.  I've assumed that this is the way to do it since
the examples in RFC 3542 (and most of the code I've seen
generally) use CMSG_FIRSTHDR() to get the initial pointer, but
I've found that glibc's CMSG_NXTHDR() can (wrongly, I think)
return NULL if the buffer hasn't been zero-filled beforehand
(this causes segfaults with the patch I initially posted).

@Wim:

Yes, the rfc3542 module from that package looks as if it would be
usable with these patches - although it's Python 2-only, GPL-only
and looks unmaintained.  Those kinds of ancillary data
constructors will actually be needed to make full portable use of
sendmsg() and recvmsg() for things like IPv6, SCTP, Linux's
socket error queues, etc.  The same goes for data for the
existing get/setsockopt() methods, in fact - the present
suggestion to use the struct module is pretty inadequate when
there are typedefs involved and implementations might add and
reorder fields, etc.

The objects in that package seem a bit overcomplicated, though,
messing about with setter methods instead of just subclassing
"bytes" and having different constructors to create the object
from individual arguments or received bytes (say, ucred(1, 2, 3)
or ucred.from_bytes(...)).
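
Something along these lines is what I have in mind. This is only a sketch: the "iII" layout is an assumption modelled on Linux's struct ucred (pid, uid, gid), and a real implementation would have to get the layout from the C headers rather than from the struct module:

```python
import struct


class ucred(bytes):
    """Credentials ancillary data as a bytes subclass (illustrative only)."""

    # ASSUMPTION: Linux-style struct ucred { pid_t pid; uid_t uid; gid_t gid; }
    # i.e. one signed int followed by two unsigned ints, native layout.
    _layout = struct.Struct("iII")

    def __new__(cls, pid, uid, gid):
        # Construct from individual field values.
        return super().__new__(cls, cls._layout.pack(pid, uid, gid))

    @classmethod
    def from_bytes(cls, data):
        # Construct from bytes as received in ancillary data.
        if len(data) != cls._layout.size:
            raise ValueError("wrong size for ucred")
        return bytes.__new__(cls, data)

    @property
    def pid(self):
        return self._layout.unpack(self)[0]

    @property
    def uid(self):
        return self._layout.unpack(self)[1]

    @property
    def gid(self):
        return self._layout.unpack(self)[2]
```

Because the object is itself a bytes instance, it can be handed directly to anything that expects ancillary data as bytes, with no setter methods involved.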

Maybe the problem of testing patches well has been putting people
off so far?  Really exercising the system's CMSG_*HDR() macros in
particular isn't entirely straightforward.  I suppose there's
also a reluctance to write tests while still uncertain about how
to present the interface - that's another reason why I went for
the most general multiple-buffer form of sendmsg()!

--

___
Python tracker 
<http://bugs.python.org/issue6560>
___



[issue6560] socket sendmsg(), recvmsg() methods

2010-03-02 Thread David Watson

David Watson  added the comment:

OK, here's a new version as a work in progress.  A lot of the new
stuff is uncommented (particularly the support code for the
tests), but there are proper docs this time and a fairly complete
test suite (but see below).

There are a couple of changes to the interface (hopefully the
last).  The recvmsg() methods no longer receive ancillary data by
default, since calling them on an AF_UNIX socket with the old
default buffer could allow a malicious sender to send unwanted
file descriptors up to the receiver's resource limit, and in a
multi-threaded program, another thread could then be prevented
from opening new file descriptors before the receiving thread had
a chance to close the unwanted ones.

Since the ancillary buffer size argument is now more likely to
need a value, I've moved it to second place; the basic argument
order is now the same as in Kalman Gergely's patch.  CMSG_LEN()
and CMSG_SPACE() are now provided.
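
As a rough sketch of how the interface is meant to be used for FD passing, assuming the recvmsg(bufsize, ancbufsize) argument order described above and module-level socket.CMSG_LEN() (the helper names send_fd and recv_one_fd are hypothetical):

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor over an AF_UNIX socket."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_one_fd(sock):
    """Receive at most one file descriptor from an AF_UNIX socket."""
    int_size = array.array("i").itemsize
    # CMSG_LEN() omits trailing padding, so the ancillary buffer has room
    # for exactly one descriptor; CMSG_SPACE() could leave room for more.
    data, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(int_size))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds = array.array("i")
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(cdata[:len(cdata) - (len(cdata) % int_size)])
            return fds[0]
    raise RuntimeError("no file descriptor received")
```

Passing an ancillary buffer size of zero (the new default) means no descriptors can be received at all, which is the safe choice for callers that only want the ordinary data.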

I've also used socket.error instead of ValueError when rejecting
some buffer object/array for being too big to handle, since the
system call itself might cause socket.error to be raised for a
smaller (oversized) object, failing with EMSGSIZE or whatever.

The code is now much more paranoid about checking the results of
the CMSG_*() macros, and will raise RuntimeError if it finds its
assumptions are not met.

I'd still like to add tests for receiving some of the RFC 3542
ancillary data items, especially since the SCM_RIGHTS tests can't
always (ever?) test recvmsg() with multiple items (if you send
two FD arrays, the OS can coalesce them into a single array
before delivering them).

--
Added file: http://bugs.python.org/file16417/baikie-hwundram-v2.diff

___
Python tracker 
<http://bugs.python.org/issue6560>
___



[issue6560] socket sendmsg(), recvmsg() methods

2010-03-03 Thread David Watson

David Watson  added the comment:

I just found that the IPv6 tests don't get skipped when IPv6 is
available but disabled in the build - you can create IPv6
sockets, but not use them :/  This version fixes the problem.
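
The kind of check involved can be sketched as follows. This is not the code from the patch; ipv6_usable is a hypothetical helper that treats IPv6 as available only if a socket can actually be bound to the loopback address:

```python
import socket


def ipv6_usable():
    """Return True only if an IPv6 socket can be created and bound."""
    if not socket.has_ipv6:
        return False  # the build has no IPv6 support at all
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            # Creating the socket is not enough; binding is what fails
            # when IPv6 cannot really be used.
            s.bind(("::1", 0))
    except OSError:
        return False
    return True
```

A test class could then be skipped with something like unittest.skipUnless(ipv6_usable(), "IPv6 not usable").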

--
Added file: http://bugs.python.org/file16422/baikie-hwundram-v2.1.diff

___
Python tracker 
<http://bugs.python.org/issue6560>
___



[issue1027206] unicode DNS names in socket, urllib, urlopen

2010-03-22 Thread David Watson

David Watson  added the comment:

I was about to report this for the socket module - the gethostbyname(), 
gethostbyname_ex() and getnameinfo() functions are the only things currently 
affected in that module as far as I can see.  3.x is affected too - the 
functions will pass non-ASCII Unicode to the system as UTF-8 there.  The 
attached patch fixes them in 2.x and 3.x.
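
To illustrate the difference (this uses Python's built-in "idna" codec and is not taken from the patch; to_wire_hostname is a made-up name):

```python
def to_wire_hostname(name):
    """Return the ASCII (IDNA) form of a possibly non-ASCII host name."""
    return name.encode("idna")


# What should reach the resolver for a non-ASCII name...
idna_form = to_wire_hostname("b\u00fccher.example")
# ...versus what is passed if the name is simply encoded as UTF-8.
utf8_form = "b\u00fccher.example".encode("utf-8")
```

Here idna_form is the ASCII-compatible "xn--" form that DNS expects, while utf8_form contains raw non-ASCII bytes.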

--
keywords: +patch
nosy: +baikie
versions: +Python 3.2, Python 3.3
Added file: http://bugs.python.org/file16624/idna.diff

___
Python tracker 
<http://bugs.python.org/issue1027206>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-11 Thread David Watson

New submission from David Watson :

The makesockaddr() function in the socket module assumes that
AF_UNIX addresses have a null-terminated sun_path, but Linux
actually allows unterminated addresses using all 108 bytes of
sun_path (for normal filesystem sockets, that is, not just
abstract addresses).

When receiving such an address (e.g. in accept() from a
connecting peer), makesockaddr() will run past the end and return
extraneous bytes from the stack, or fail because they can't be
decoded, or perhaps segfault in extreme cases.

This can't currently be tested from within Python as Python also
refuses to accept address arguments which would fill the whole of
sun_path, but the attached linux-pass-unterminated.diff (for 2.x
and 3.x) enables them for Linux.  With the patch applied:

Python 2.7a4+ (trunk, Apr  8 2010, 18:20:28) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_UNIX)
>>> s.bind("a" * 108)
>>> s.getsockname()
'\xfa\xbf\xa8)\xfa\xbf\xec\x15\n\x08l\xaaY\xb7\xb8CZ\xb7'
>>> len(_)
126

Also attached are some unit tests for use with the above patch, a
couple of C programs for checking OS behaviour (you can also see
the bug by doing accept() in Python and using the bindconn
program), and patches aimed at fixing the problem.

Firstly, the return-unterminated-* patches make makesockaddr()
scan sun_path for the first null byte as before (if it's not a
Linux abstract address), but now stop at the end of the structure
as indicated by the addrlen argument.

However, there's one more catch before this will work on Linux,
which is that Linux system calls return the length of the address
they *would* have stored in the structure had there been room for
it, which in this case is one byte longer than the official size
of a sockaddr_un structure, due to the missing null terminator.

The addrlen-* patches handle this by always calling
makesockaddr() with the actual buffer size if it is less than the
returned length.  This silently ignores any truncation, but I'm
not sure how to do anything sensible about that, and some
operating systems (e.g. FreeBSD) just silently truncate the
address anyway and don't return the original length (POSIX
doesn't make clear which, if either, behaviour is required).
Once these patches are applied, the tests pass.
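
The combined logic of the two sets of patches can be modelled in Python roughly as below. The real code is C inside makesockaddr(); the 2-byte offset of sun_path is an assumption that matches Linux, and sun_path_from is a hypothetical name:

```python
SUN_PATH_OFFSET = 2  # ASSUMPTION: sizeof(sa_family_t), as on Linux


def sun_path_from(buf, returned_len):
    """Extract the socket path from a raw sockaddr_un buffer.

    buf          -- the bytes of the buffer handed to the system call
    returned_len -- the address length the system call reported
    """
    # The reported length may exceed the buffer (Linux reports the length
    # it would have stored), so never look past what was actually written.
    end = min(returned_len, len(buf))
    path = buf[SUN_PATH_OFFSET:end]
    if path[:1] == b"\0":
        return path  # abstract address: embedded nulls are significant
    nul = path.find(b"\0")
    # Stop at the first null byte if there is one, else at the end.
    return path if nul < 0 else path[:nul]
```

With a 110-byte buffer holding 108 bytes of unterminated path and a reported length of 111, this returns exactly the 108 path bytes and nothing from beyond the structure.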

There is one other issue: the patches for 3.x retain the
assumption that socket paths are in UTF-8, but they should
actually be handled according to PEP 383.  I've got a patch for
that, but I'll open a separate issue for it since the handling of
the Linux abstract namespace isn't documented and there's some
slightly unobvious behaviour that people might be depending on.

--
components: Extension Modules
files: linux-pass-unterminated.diff
keywords: patch
messages: 102861
nosy: baikie
severity: normal
status: open
title: socket: Buffer overrun while reading unterminated AF_UNIX addresses
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file16874/linux-pass-unterminated.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16875/return-unterminated-2.x.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16876/return-unterminated-3.x.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16877/addrlen-2.x.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16878/addrlen-3.x.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16879/test-2.x.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16880/test-3.x.diff

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-04-11 Thread David Watson

New submission from David Watson :

In 3.x, the socket module assumes that AF_UNIX addresses use
UTF-8 encoding - this means, for example, that accept() will
raise UnicodeDecodeError if the peer socket path is not valid
UTF-8, which could crash an unwary server.

Python 3.1.2 (r312:79147, Mar 23 2010, 19:02:21) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s = socket(AF_UNIX, SOCK_STREAM)
>>> s.bind(b"\xff")
>>> s.getsockname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

I'm attaching a patch to handle socket paths according to PEP
383.  Normally this would use PyUnicode_FSConverter, but there
are a couple of ways in which the address handling currently
differs from normal filename handling.

One is that embedded null bytes are passed through to the system
instead of being rejected, which is needed for the Linux abstract
namespace.  These abstract addresses are returned as bytes
objects, but they can currently be specified as strings with
embedded null characters as well.  The patch preserves this
behaviour.

The current code also accepts read-only buffer objects (it uses
the "s#" format), so in order to accept these as well as
bytearray filenames (which the posix module accepts), the patch
simply accepts any single-segment buffer, read-only or not.
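
For anyone unfamiliar with PEP 383, the round trip it provides looks like this in pure Python. The helper names are hypothetical, and UTF-8 is assumed here in place of the real filesystem encoding:

```python
def decode_socket_path(raw, encoding="utf-8"):
    """Decode a socket path so that undecodable bytes survive as surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_socket_path(text, encoding="utf-8"):
    """Reverse decode_socket_path(), restoring the original bytes."""
    return text.encode(encoding, "surrogateescape")
```

An invalid byte such as 0xff becomes the lone surrogate U+DCFF instead of raising UnicodeDecodeError, and encoding the resulting string gives back the original byte. Embedded null characters pass through unchanged in both directions.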

This patch applies on top of the patches I submitted for issue
#8372 (rather than knowingly running past the end of sun_path).

--
components: Extension Modules
files: af_unix-pep383.diff
keywords: patch
messages: 102865
nosy: baikie
severity: normal
status: open
title: socket: AF_UNIX socket paths not handled according to PEP 383
type: behavior
versions: Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file16881/af_unix-pep383.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-04-11 Thread David Watson

David Watson  added the comment:

This patch does the same thing without fixing issue #8372 (not
that I'd recommend that, but it may be easier to review).

--
Added file: http://bugs.python.org/file16882/af_unix-pep383-no-8372-fix.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16883/test-existing.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16884/test-new.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8373] socket: AF_UNIX socket paths not handled according to PEP 383

2010-04-11 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16885/af_unix-pep383-doc.diff

___
Python tracker 
<http://bugs.python.org/issue8373>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-12 Thread David Watson

David Watson  added the comment:

Attaching the C test programs I forgot to attach yesterday -
sorry about that.  I've also tried these programs, and the
patches, on FreeBSD 5.3 (an old version from late 2004).  I found
that it accepted unterminated addresses as well, and unlike Linux
it did not normally null-terminate addresses at all - the
existing socket code only worked for addresses shorter than
sun_path because it zero-filled the structure beforehand.  The
return-unterminated patches worked with or without the
zero-filling.

Unlike Linux, FreeBSD also accepted oversized sockaddr_un
structures (sun_path longer than its definition), so just
allowing unterminated addresses wouldn't make the full range of
addresses usable there.  That said, I did get a kernel panic
shortly after testing with oversized addresses, so perhaps it's
not a good idea to actually use them :)

--
Added file: http://bugs.python.org/file16898/bindconn.c

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue8372] socket: Buffer overrun while reading unterminated AF_UNIX addresses

2010-04-12 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file16899/accept.c

___
Python tracker 
<http://bugs.python.org/issue8372>
___



[issue3023] Problem with invalidly-encoded command-line arguments (Unix)

2009-01-02 Thread David Watson

David Watson  added the comment:

@ Victor Stinner: Yes, the behaviour of those functions is as you
describe - it's been changed since I filed this issue.  I do
consider it an improvement.

By the password database, I mean /etc/passwd or replacements that
are accessible via getpwnam() and friends.  Users are often
allowed to change things like the GECOS field, and can generally
stick any old junk in there, regardless of encoding.  Now that I
come to check, it seems that in the Python 3.0 release, the pwd.*
functions do raise UnicodeDecodeError when the GECOS field can't
be decoded (bizarrely, they try to interpret it as a Python
string literal, and thus choke on invalid backslash escapes).
Unfortunately, this allows a user to change their GECOS field so
that system programs written in Python can't determine the
username corresponding to that user's UID or vice versa.

___
Python tracker 
<http://bugs.python.org/issue3023>
___



[issue4859] pwd, spwd, grp functions vulnerable to denial of service

2009-01-06 Thread David Watson

New submission from David Watson :

The pwd (and spwd and grp) modules deal with data from
/etc/passwd (and/or other sources) that can be supplied by users
on the system.  Specifically, users can often change the data in
their GECOS fields without the OS requiring that it conform to a
specific encoding, and given some automated account signup
system, it's conceivable that arbitrary data could even be placed
in the username field.

This causes a problem since the functions in these modules try to
decode the data into str objects, and if a user has placed data
in /etc/passwd, say, that does not conform to the relevant
encoding, the function will raise UnicodeDecodeError and thus
prevent the program from learning the relevant mapping between
username and UID, etc. (or crash the program if it wasn't
expecting this).  For a system program written in Python, this
can amount to a denial of service attack, especially if the
program uses the get*all() functions.

Currently, the pwd module tries to decode the string fields using
the Unicode-escape codec, i.e. like a Python string literal, and
this can fail when given an invalid backslash escape.  You can
see this by running chfn(1), entering something like "\ux" in one
of the fields, and then calling pwd.getpwnam(yourname) or
pwd.getpwall().  Perhaps the use of this codec is a mistake,
given that spwd and grp decode the string fields as UTF-8, but
chfn could also be used to enter non-UTF-8 data in the GECOS
field.  You can see similar failures in the grp and spwd modules
after adding a user with a non-UTF-8 name (do something like
"useradd $'\xff'" in bash).
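
The decoding failures themselves can be reproduced without touching the password database. This sketch only exercises the codecs named above; try_decode is a hypothetical helper:

```python
def try_decode(raw, codec):
    """Return the decoded string, or None if the bytes cannot be decoded."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


# A GECOS field containing the three characters backslash, "u", "x":
bad_escape = try_decode(b"\\ux", "unicode_escape")
# A user name consisting of the single byte 0xff:
bad_utf8 = try_decode(b"\xff", "utf-8")
# Ordinary ASCII data decodes fine with either codec.
ok = try_decode(b"Jane Doe", "utf-8")
```

Both bad_escape and bad_utf8 come back as None, which corresponds to the UnicodeDecodeError that the module functions raise.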

A debug build of Python also reports a reference counting error
in grp (count goes to -1) when its functions fail on non-UTF-8
data; what I think is going on is that in mkgrent(),
PyStructSequence_SET_ITEM steals the reference to "w", meaning
the second "Py_DECREF(w)" shouldn't be there.  Also, getpwall()
and getgrall() leave file descriptors open when they fail, since
they don't call end*ent() in this case.  The attached minor.diff
fixes both of these problems, I think.

I've also written a patch (bytes.diff, attached) that would add
new functions pwd.getpwnamb(), etc. (analogous to os.getcwdb())
to return bytes objects for the text fields, thus avoiding these
problems - what do you think?  The patch also makes pwd's
original string functions use UTF-8 like the other modules.

Alternatively or in addition, a quick "fix" for the GECOS problem
might be for the pwd module to decode the text fields as Latin-1,
since in the absence of backslash escapes this is what the
Unicode-escape encoding is equivalent to.  This would at least
block any DoS attempts using the GECOS field (or attempts to add
extra commas with \x2c, etc.) without changing the behaviour
much.  The attached latin1.diff does this.
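
The equivalence claimed here is easy to check from Python. This is only an illustration of the codecs' behaviour, not of the patch, and decode_gecos is a made-up name:

```python
def decode_gecos(raw):
    """Decode a text field as Latin-1, which accepts every byte value."""
    return raw.decode("latin-1")


# Without backslashes, the two codecs agree byte for byte.
same = decode_gecos(b"caf\xe9") == b"caf\xe9".decode("unicode_escape")

# With a backslash escape, unicode_escape manufactures a comma that was
# never in the field, while Latin-1 leaves the four characters alone.
injected = b"a\\x2cb".decode("unicode_escape")
untouched = decode_gecos(b"a\\x2cb")
```

Since every one of the 256 byte values maps to a character in Latin-1, decode_gecos() cannot raise UnicodeDecodeError whatever a user puts in the field.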

--
components: Extension Modules
files: bytes.diff
keywords: patch
messages: 79286
nosy: baikie
severity: normal
status: open
title: pwd, spwd, grp functions vulnerable to denial of service
type: security
versions: Python 3.0, Python 3.1
Added file: http://bugs.python.org/file12621/bytes.diff

___
Python tracker 
<http://bugs.python.org/issue4859>
___



[issue4859] pwd, spwd, grp functions vulnerable to denial of service

2009-01-06 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file12622/minor.diff

___
Python tracker 
<http://bugs.python.org/issue4859>
___



[issue4859] pwd, spwd, grp functions vulnerable to denial of service

2009-01-06 Thread David Watson

Changes by David Watson :


Added file: http://bugs.python.org/file12623/latin1.diff

___
Python tracker 
<http://bugs.python.org/issue4859>
___



[issue4859] pwd, spwd, grp functions vulnerable to denial of service

2009-01-07 Thread David Watson

David Watson  added the comment:

> baikie: Open a separated issue for the refcount error and fd leak.

OK.  It does affect 2.x as well, come to think of it.

> On Ubuntu, it's not possible to create an user with a non-ASCII
> name:
>
> $ sudo adduser é --no-create-home
>
> adduser: To avoid problems, the username should consist only of...

Well, good for Ubuntu :)  But you can still add one with the
lower-level useradd command, and not everyone uses Ubuntu.

> Your patch latin1.diff is wrong

Yes, I know it's "wrong" - I just thought of it as a stopgap
measure until some sort of bytes functionality is added (since
pwd already decodes everything as Latin-1, but tries to interpret
backslash escapes).  But yeah, if it's going to be changed later,
then I suppose there's not much point.

> I don't think that it can be called a "denial of service attack".

It depends on how the program uses these functions.  Obviously
Python itself is only vulnerable to a DoS if the interpreter
crashes or something, but what I'm saying is that there should be
a way for Python programs to access the password database that is
not subject to denial of service attacks.  If someone changes
their GECOS field they can make pwd.getpwall() fail for another
user's program, and if the program relies on pwd.getpwall()
working, then that's a DoS.

___
Python tracker 
<http://bugs.python.org/issue4859>
___



[issue4873] Refcount error and file descriptor leaks in pwd, grp modules

2009-01-07 Thread David Watson

New submission from David Watson :

When investigating issue #4859 I found that when pwd.getpwall()
and grp.getgrall() fail due to decoding errors, they leave open
file descriptors referring to the passwd and group files, since
they don't call the end*ent() functions in this case.  Also, the
grp.* functions have a reference counting error when they fail in
this way - a debug build reports that an object's reference count
goes to -1.  What I think happens is that in mkgrent(),
PyStructSequence_SET_ITEM steals the reference to "w", meaning
that the "Py_DECREF(w)" call shouldn't be made afterwards.  The
attached diff fixes both of these problems, I think, and applies
to the 2.x and 3.x branches.

--
components: Extension Modules
files: minor.diff
keywords: patch
messages: 79378
nosy: baikie
severity: normal
status: open
title: Refcount error and file descriptor leaks in pwd, grp modules
type: resource usage
Added file: http://bugs.python.org/file12639/minor.diff

___
Python tracker 
<http://bugs.python.org/issue4873>
___


