[issue10467] io.BytesIO.readinto() segfaults when used on BytesIO object seeked beyond end.
New submission from Sebastian Hagen:

io.BytesIO().readinto() does not correctly handle the case of being called on a BytesIO object that has been seeked past the end of its data. It consequently ends up reading into unallocated memory, and (typically?) segfaulting if used in this manner.

I've confirmed that this bug exists in the same fashion in 2.6, 2.7, 3.0, 3.1 and 3.2; the following demonstration code works on all of these.

Demonstration:

>>> import io; b = io.BytesIO(b'bytes'); b.seek(42); b.readinto(bytearray(1))
42
Segmentation fault

I'm attaching a simple patch against r32a3:85355 that fixes this problem.

--
components: IO
files: bio_readinto_1.patch
keywords: patch
messages: 121618
nosy: sh
priority: normal
severity: normal
status: open
title: io.BytesIO.readinto() segfaults when used on BytesIO object seeked beyond end.
type: crash
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2

Added file: http://bugs.python.org/file19656/bio_readinto_1.patch

___
Python tracker <http://bugs.python.org/issue10467>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
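For readers who want to check the behaviour on a fixed interpreter, here is a minimal sketch. It assumes a current CPython in which this bug has been fixed, so that reading from a position past the end of the data simply reports zero bytes read. The helper name read_past_end is hypothetical and not part of the original report or patch.

```python
import io


def read_past_end(data, offset, size):
    """Seek to `offset` in a BytesIO over `data`, then readinto a fresh buffer.

    Returns the byte count reported by readinto() and the target buffer.
    Assumes an interpreter where the bug reported above is fixed.
    """
    stream = io.BytesIO(data)
    stream.seek(offset)          # seeking past the end is allowed
    target = bytearray(size)     # zero-filled destination buffer
    count = stream.readinto(target)
    return count, target


# Same situation as the demonstration: 5 bytes of data, position 42.
count, target = read_past_end(b"bytes", 42, 1)
print(count, target)
```

On a fixed interpreter the count should be 0 and the target buffer should be left untouched; a position inside the data returns only the bytes that remain.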
[issue7382] bytes.__getnewargs__ is broken; copy.copy() therefore doesn't work on bytes, and bytes subclasses can't be pickled by default
New submission from Sebastian Hagen:

In Python 3.0 and 3.1, bytes instances cannot be copied, and (even trivial) bytes subclasses cannot be unpickled unless they explicitly override __getnewargs__() or __reduce_ex__().

Copy problem:

>>> import copy; copy.copy(b'foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/copy.py", line 96, in copy
    return _reconstruct(x, rv, 0)
  File "/usr/lib/python3.1/copy.py", line 280, in _reconstruct
    y = callable(*args)
  File "/usr/lib/python3.1/copyreg.py", line 88, in __newobj__
    return cls.__new__(cls, *args)
TypeError: string argument without an encoding

Bytes subclass unpickle problem:

>>> class B(bytes):
...     pass
...
>>> import pickle; pickle.loads(pickle.dumps(B(b'foo')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/pickle.py", line 1373, in loads
    encoding=encoding, errors=errors).load()
TypeError: string argument without an encoding

AFAICT, the problem is that bytes.__getnewargs__() returns a tuple with a single argument - a string - and bytes.__new__() refuses to reconstruct the instance when called in that manner. That is, "bytes.__new__(bytes, *b'foo'.__getnewargs__())" fails with a TypeError.

This does not cause a problem for pickling bytes instances (as opposed to instances of a subclass of bytes), because both the Python and C versions of pickle shipped with Python 3.[01] have built-in magic (_Pickler.save_bytes() and save_bytes(), respectively) to deal with bytes instances, and therefore never call their __getnewargs__().

The pickle case, in particular, is highly irritating; the error message doesn't indicate which object is causing the problem, and until you actually try to load the pickle, there's nothing to indicate that there's anything problematic about pickling an instance of a subclass of bytes.

--
components: Library (Lib)
files: pickle_bytes_subclass.py
messages: 95632
nosy: sh
severity: normal
status: open
title: bytes.__getnewargs__ is broken; copy.copy() therefore doesn't work on bytes, and bytes subclasses can't be pickled by default
type: behavior
versions: Python 3.0, Python 3.1

Added file: http://bugs.python.org/file15387/pickle_bytes_subclass.py

___
Python tracker <http://bugs.python.org/issue7382>
___
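The reconstruction step described above can be sketched as follows. This assumes a current CPython, where bytes.__getnewargs__() returns a bytes argument and the round trip therefore succeeds; on the affected 3.0 and 3.1 releases the same calls raise the TypeError shown in the report. The helper name rebuild is hypothetical, and the class B mirrors the trivial subclass from the report.

```python
import copy


class B(bytes):
    """Trivial bytes subclass, as in the report."""


def rebuild(obj):
    """Recreate `obj` the way copyreg.__newobj__ does: cls.__new__(cls, *args).

    The arguments come from the object's own __getnewargs__().
    """
    cls = type(obj)
    return cls.__new__(cls, *obj.__getnewargs__())


plain = rebuild(b"foo")        # the expression from the report, generalised
sub = rebuild(B(b"foo"))       # the same step for a subclass instance
copied = copy.copy(B(b"foo"))  # copy.copy() takes the same reconstruction path

print(plain, type(sub).__name__, type(copied).__name__)
```

copy.copy() is used here instead of pickle so that the sketch does not depend on the class being importable by name, which pickle requires.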
[issue7560] Various filename-taking posix methods don't like bytes / buffer objects.
New submission from Sebastian Hagen:

Most of the functions in Python's stdlib that take filename parameters allow for those parameters to be buffer (such as bytes, bytearray, memoryview) objects. This is useful for various reasons, among them that on Posix-likes, file- and pathnames ultimately *are* sequences of bytes.

A number of functions in the posix (and thus, os) module break this convention:

mkfifo()
mknod()
statvfs()
pathconf()

E.g.:

>>> os.statvfs(b'/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: statvfs() argument 1 must be string, not bytes

I'm attaching a patch that modifies the above-mentioned functions to make them accept buffer-like objects in addition to string objects; I've never contributed code to the stdlib, so any general problems with that patch can be ascribed to my ignorance about established practice (or inability to program, in the case of downright bugs).

I'm a bit at a loss as to what to do about posix.system(). IMO, that one should also take bytes - at least on posix-like OSes - since it specifies a command line, and both the name and the arguments in such lines are, on posix-likes, sequences of bytes. I'm not sure how to best reconcile that with the MS Windows version of that function, however; advice would be welcome.

--
components: Library (Lib)
files: posix_fn_bytes_01.patch
keywords: patch
messages: 96792
nosy: sh
severity: normal
status: open
title: Various filename-taking posix methods don't like bytes / buffer objects.
versions: Python 3.1, Python 3.2

Added file: http://bugs.python.org/file15659/posix_fn_bytes_01.patch

___
Python tracker <http://bugs.python.org/issue7560>
___
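As an illustration of the convention being asked for, the following sketch passes bytes paths to a few os functions. It assumes a current CPython 3, where these functions accept bytes; it is not taken from the attached patch. The function name bytes_path_demo is hypothetical, and the statvfs() call is guarded because that function only exists on POSIX-like platforms.

```python
import os
import tempfile


def bytes_path_demo():
    """Create and list a directory using bytes paths only.

    Returns the directory listing (a list of bytes names) and, where
    os.statvfs() is available, the reported filesystem block size.
    """
    with tempfile.TemporaryDirectory() as tmp:
        base = os.fsencode(tmp)              # str path -> bytes path
        os.mkdir(os.path.join(base, b"sub"))
        names = os.listdir(base)             # bytes in, bytes out
        block_size = None
        if hasattr(os, "statvfs"):           # POSIX-only function
            block_size = os.statvfs(base).f_bsize
        return names, block_size


names, block_size = bytes_path_demo()
print(names, block_size)
```

Passing a bytes path to os.listdir() also yields bytes names, so a program can stay in the bytes domain end to end.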
[issue7560] Various filename-taking posix methods don't like bytes / buffer objects.
Changes by Sebastian Hagen:

--
type: -> behavior
[issue7560] Various filename-taking posix methods don't like bytes / buffer objects.
Changes by Sebastian Hagen:

Removed file: http://bugs.python.org/file15659/posix_fn_bytes_01.patch
[issue7560] Various filename-taking posix methods don't like bytes / buffer objects.
Sebastian Hagen added the comment:

I'm taking that patch back. More testing would have been in order before posting; sorry for that, will repost once I've got the obvious problems worked out.
[issue7560] Various filename-taking posix methods don't like bytes / buffer objects.
Sebastian Hagen added the comment:

And further testing reveals that all of this has in fact already been fixed in trunk. I assumed it hadn't been, because the code for at least some of the relevant functions in Modules/_posixmodule.c is the same as in 3.1.1; I didn't know that the semantics for the "s" format parameter to PyArg_ParseTuple() had been modified.

Apologies for the noise.
[issue7560] Various filename-taking posix methods don't like bytes / buffer objects.
Changes by Sebastian Hagen:

--
status: open -> closed
versions: -Python 3.2
[issue7561] Filename-taking functions in posix segfault when called with a bytearray arg.
New submission from Sebastian Hagen:

Various functions in the 'posix' module that take filename arguments accept bytearray values for those arguments, and mishandle those objects in a way that leads to segfaults.

Python 3.1 (r31:73572, Jul 23 2009, 23:41:26)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.mkdir(bytearray(b'/'))
Segmentation fault

There are at least two separate problems with the way posixmodule handles these objects. The first is that the code isn't set up to handle NULL retvals from PyByteArray_AS_STRING(), which occur for zero-byte bytearray objects. This causes a NULL-pointer dereference in PyUnicode_FSConverter() if you pass a zero-length bytearray.

The second issue is that release_bytes() calls bytearray_releasebuffer() with NULL for the first argument, which directly leads to a NULL-pointer dereference.

I'm attaching a patch against SVN 77001 which should fix both of these issues.

--
components: Library (Lib)
files: posixmodule_fn_bytearray_fix_01.patch
keywords: patch
messages: 96795
nosy: sh
severity: normal
status: open
title: Filename-taking functions in posix segfault when called with a bytearray arg.
type: crash
versions: Python 3.1, Python 3.2

Added file: http://bugs.python.org/file15660/posixmodule_fn_bytearray_fix_01.patch

___
Python tracker <http://bugs.python.org/issue7561>
___
[issue7561] Filename-taking functions in posix segfault when called with a bytearray arg.
Sebastian Hagen added the comment:

Not exactly. The last part fixes the second problem, which you get for non-zero-length bytearrays. But without the first fix, zero-length bytearrays still lead to a crash:

Python 3.2a0 (py3k:77001M, Dec 22 2009, 18:17:08)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import posix
>>> posix.mkdir(bytearray(0))
Segmentation fault

That's what the rest of the patch fixes.
[issue7561] Filename-taking functions in posix segfault when called with a bytearray arg.
Sebastian Hagen added the comment:

You're correct about PyUnicode_FSConverter(), which is why the very first part of my patch fixes that function. Only fixing that one will get rid of the segfaults, but also lead to incorrect error reporting for the zero-length bytearray case; the bytes2str() modification is to get the right exceptions.

I don't know which precise semantics PyByteArray_AS_STRING() is *supposed* to have. I assumed it returning NULL was normal for 0-byte-length arrays, and based my patch off of that.
[issue7561] Filename-taking functions in posix segfault when called with a bytearray arg.
Sebastian Hagen added the comment:

Correction: "Only fixing that one will get rid of the segfaults" ... well, for mkdir() on GNU/Linux, anyway. POSIX.1-2008 doesn't specify what happens if you call mkdir() with a NULL pointer, so I guess other conforming implementations might in fact still segfault at that point - it just happens that the one I tested it on is too nice to do that.

Either way, passing a NULL pointer to those functions is almost certainly not a good idea.
[issue7561] Filename-taking functions in posix segfault when called with a bytearray arg.
Sebastian Hagen added the comment:

I've glanced at some of the other PyByteArray_AS_STRING() (and PyByteArray_AsStr(), which inherits this behaviour) uses in the stdlib. By far the heaviest user is bytearrayobject.c; aside from that, there are by my count only 24 uses in current trunk. I haven't looked at all of them in detail, but the ones I have looked at all seem to ensure that the possible NULL retvals don't cause them problems.

Given that, and considering that bytearray itself uses it for all kinds of operations, I'd be rather reluctant to add any additional overhead to this macro absent some authoritative statement that the current behaviour is bad. We'd definitely get better performance by just having posixmodule.c pay attention to the retval it gets.

[Yes, this is probably premature optimization; but it's not as if fixing posixmodule.c takes any massive changes either, so I'm not too worried about additional code complexity in this particular case.]
[issue7561] Filename-taking functions in posix segfault when called with a bytearray arg.
Sebastian Hagen added the comment:

Well, it doesn't *need* to accept them ... but it would certainly be nice to have. If you've already got the filename in a bytearray object for some reason, being able to pass it through directly saves you both a copy and the explicit conversion code, which is a double-win.

From an interface POV, it'd be even better if memoryview was allowed, too ... is there a specific reason that it's not? If one kind of simple readable buffer works, I don't see any good reason not to support all such objects.
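To make the "copy and explicit conversion code" concrete, here is a minimal sketch of what a caller has to write when an interface accepts only str and bytes. The helper name to_fs_bytes is hypothetical, and the checks for empty names and embedded null bytes are illustrative choices rather than the behaviour of any stdlib function.

```python
import os


def to_fs_bytes(name):
    """Normalise a filename given as str, bytes, bytearray or memoryview.

    bytearray and memoryview inputs are copied into an immutable bytes
    object first; this is the extra copy the comment above refers to.
    """
    if isinstance(name, (bytearray, memoryview)):
        name = bytes(name)            # explicit copy out of the buffer
    raw = os.fsencode(name)           # str -> bytes; bytes pass through
    if not raw:
        raise ValueError("empty filename")
    if b"\x00" in raw:
        raise ValueError("embedded null byte in filename")
    return raw


print(to_fs_bytes(bytearray(b"/tmp")))
print(to_fs_bytes(memoryview(b"spam.txt")))
```

Rejecting the empty name up front also sidesteps the zero-length bytearray case discussed earlier in this thread.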
[issue7561] Filename-taking functions in posix segfault when called with a bytearray arg.
Sebastian Hagen added the comment:

Oh, and *forcing* use of the PEP 383 hack for such interfaces would really be the Wrong Thing. Byte sequences are the natural (and most efficient, and least prone to misunderstandings) way to store filenames on a posix-like.

Storing them as unicode-except-not-really is an acceptable hack for interfaces that need to standardize on strings for some reason, but that really doesn't apply to these functions, and I'd always store such filenames as bytes if I know I'm running on a posix-like.
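For context, the PEP 383 mechanism mentioned above can be sketched with the "surrogateescape" error handler. The example pins the codec to UTF-8 so that it behaves the same everywhere; real filename handling uses the platform's filesystem encoding. The function names smuggle and unsmuggle and the sample filename are hypothetical.

```python
def smuggle(raw):
    """Decode a bytes filename to str, mapping undecodable bytes to lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def unsmuggle(text):
    """Reverse smuggle(): turn the lone surrogates back into the original bytes."""
    return text.encode("utf-8", "surrogateescape")


raw_name = b"caf\xe9.txt"       # Latin-1 encoded name, not valid UTF-8
text_name = smuggle(raw_name)   # a str containing the lone surrogate U+DCE9

print(ascii(text_name))
print(unsmuggle(text_name) == raw_name)
```

The resulting str is the "unicode-except-not-really" form: it round-trips losslessly, but it contains a lone surrogate and cannot be encoded with the default strict error handler.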