[issue9971] Optimize BufferedReader.readinto

2011-05-12 Thread STINNER Victor
STINNER Victor added the comment: You don't want to backport the optimization to at least 3.2? -- ___ Python tracker ___ ___ Python-bu

[issue9971] Optimize BufferedReader.readinto

2011-05-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: I've committed a minimally modified version of the patch, thank you! -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed ___ Python tracker __

[issue9971] Optimize BufferedReader.readinto

2011-05-11 Thread Roundup Robot
Roundup Robot added the comment: New changeset a1d77c6f4ec1 by Antoine Pitrou in branch 'default': Issue #9971: Write an optimized implementation of BufferedReader.readinto(). http://hg.python.org/cpython/rev/a1d77c6f4ec1 -- nosy: +python-dev ___ Pyt

[issue9971] Optimize BufferedReader.readinto

2011-05-11 Thread John O'Connor
John O'Connor added the comment: I've attached the latest changes based on feedback (issue9971-v5.patch) for i in 1 4 128 256 1024 2048 4069 8192 16384; do echo -n "buffer_size=$i "; ./python -m timeit -s "f=open('LICENSE','rb');b=bytearray($i)" "f.seek(0)" "while f.readinto(b): pass"; done

[issue9971] Optimize BufferedReader.readinto

2011-05-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: > On Linux 32 bits, size_t is 32 bits, off_t is 64 bits. If the file size > is 4 GB, the downcast may truncate the size of 0 byte. We are not talking about the file size here. -- ___ Python tracker

[issue9971] Optimize BufferedReader.readinto

2011-05-11 Thread STINNER Victor
STINNER Victor added the comment: Le mardi 10 mai 2011 à 19:06 +, John O'Connor a écrit : > Victor: AFAIK its not actually downcasting. On Linux 32 bits, size_t is 32 bits, off_t is 64 bits. If the file size is 4 GB, the downcast may truncate the size of 0 byte. It would be safer to use off

[issue9971] Optimize BufferedReader.readinto

2011-05-10 Thread John O'Connor
Changes by John O'Connor : Removed file: http://bugs.python.org/file21957/unnamed ___ Python tracker ___ ___ Python-bugs-list mailing list Unsu

[issue9971] Optimize BufferedReader.readinto

2011-05-10 Thread John O'Connor
John O'Connor added the comment: No problem for me either way. I created issue12053 to track that. - John O'Connor On Tue, May 10, 2011 at 3:19 PM, Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > > > Also, while we're at it, would it be worthwhile for me to make a patch > > fo

[issue9971] Optimize BufferedReader.readinto

2011-05-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Also, while we're at it, would it be worthwhile for me to make a patch > for the prefech() method you proposed? Should a separate issue be > created for that? I know there was no definitive answer in the email > thread but it may be fun to experiment with as w

[issue9971] Optimize BufferedReader.readinto

2011-05-10 Thread John O'Connor
John O'Connor added the comment: Victor: AFAIK its not actually downcasting. The safe downcast just uses an assertion when debugging is enabled. I chose to use it because it seems to be a convention in the file. Antoine: You say quirky, I say elegant :) Though I have no problem changing it.

[issue9971] Optimize BufferedReader.readinto

2011-05-10 Thread STINNER Victor
STINNER Victor added the comment: +Py_SAFE_DOWNCAST(READAHEAD(self), Py_off_t, Py_ssize_t) Why downcasting the size? Can't you store the size into a Py_off_t? I suppose that sizeof(Py_off_t) >= sizeof(Py_ssize_t). -- ___ Python tracker

[issue9971] Optimize BufferedReader.readinto

2011-05-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: By the way, the Java-like version actually seems quite interesting. -- ___ Python tracker ___ ___ Py

[issue9971] Optimize BufferedReader.readinto

2011-05-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Based on the above I think you are right about using the internal buffer > regardless (revision attached). You pay a price with larger buffer sizes but > on > balance it seems to be a much better general purpose solution. The java-like > solution is decent a

[issue9971] Optimize BufferedReader.readinto

2011-05-09 Thread Nir Aides
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/ma

[issue9971] Optimize BufferedReader.readinto

2011-05-08 Thread Antoine Pitrou
Antoine Pitrou added the comment: > It would be nice if you can also patch _pyio. I read sometimes _pyio > to check how _io is implemented. Well, _pyio implements the same APIs but it doesn't necessarily use the same algorithms. -- ___ Python tracke

[issue9971] Optimize BufferedReader.readinto

2011-05-08 Thread STINNER Victor
STINNER Victor added the comment: It would be nice if you can also patch _pyio. I read sometimes _pyio to check how _io is implemented. -- nosy: +haypo ___ Python tracker ___ __

[issue9971] Optimize BufferedReader.readinto

2011-05-08 Thread John O'Connor
Changes by John O'Connor : Added file: http://bugs.python.org/file21941/issue9971-like-java.patch ___ Python tracker ___ ___ Python-bugs-list m

[issue9971] Optimize BufferedReader.readinto

2011-05-08 Thread John O'Connor
John O'Connor added the comment: I experimented with a bunch of different options. All benchmarks performed with: $ for i in 1 4 128 256 1024 2048 4069 8192; do echo -n "buffer_size=${i} "; ./python -m timeit -s "f=open('LICENSE','rb');b=bytearray(${i})" \ "f.seek(0)" "while f.readinto(b): p

[issue9971] Optimize BufferedReader.readinto

2011-05-08 Thread Antoine Pitrou
Antoine Pitrou added the comment: > The trade-off of accommodating a small buffer size (by buffering > behind the scenes anyways) would likely slow the more common cases > which use a decent buffer size. I am wondering if an effort to > accommodate both uses would be appropriate. Possibly by not

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread John O'Connor
Changes by John O'Connor : Removed file: http://bugs.python.org/file21900/buffered_readinto2.patch ___ Python tracker ___ ___ Python-bugs-list

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread John O'Connor
Changes by John O'Connor : Removed file: http://bugs.python.org/file21899/buffered_readinto.patch ___ Python tracker ___ ___ Python-bugs-list m

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread John O'Connor
John O'Connor added the comment: FWIW, It seems Java does something similar. They write directly into caller's buffer if outstanding bytes needed (after emptying internal buffer) is greater than internal buffer len. -- ___ Python tracker

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread John O'Connor
John O'Connor added the comment: It seems to me that the point of using readinto() is to avoid double-buffering (and the extra alloc/free that comes with read()). The slowness of using a small buffer size seems like only a natural and expected consequence. The trade-off of accommodating a sm

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- Removed message: http://bugs.python.org/msg135435 ___ Python tracker ___ ___ Python-bugs-list mailing li

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- Removed message: http://bugs.python.org/msg135425 ___ Python tracker ___ ___ Python-bugs-list mailing li

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: Uh, sorry for the last message (about overlapped I/O), wrong issue. -- ___ Python tracker ___ ___ Py

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: A solution could be to use overlapped I/O on the named pipe handles. The first poll() would call ReadFile() with a tiny value (1?) and store an object wrapping the OVERLAPPED structure. Subsequent poll() or recv() would re-use that structure until the overlapp

[issue9971] Optimize BufferedReader.readinto

2011-05-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: Oops... It hadn't jumped at me earlier, but the patch is actually problematic performance-wise. The reason is that it doesn't buffer data at all, so small readintos become slower (they have to go through raw I/O every time): $ ./python -m timeit -s "f=open('L

[issue9971] Optimize BufferedReader.readinto

2011-05-06 Thread John O'Connor
John O'Connor added the comment: Good catch. v3 attached. -- Added file: http://bugs.python.org/file21912/buffered_readinto3.patch ___ Python tracker ___

[issue9971] Optimize BufferedReader.readinto

2011-05-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: +if (n <= 0) { +if (written > 0 || n == 0) +break; +if (n == -2) { +Py_INCREF(Py_None); +res = Py_None; +} +goto end; +} I think the logic is flawed:

[issue9971] Optimize BufferedReader.readinto

2011-05-05 Thread John O'Connor
John O'Connor added the comment: Thanks for the feedback. I made the changes, PTAL. -- Added file: http://bugs.python.org/file21900/buffered_readinto2.patch ___ Python tracker __

[issue9971] Optimize BufferedReader.readinto

2011-05-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: Thank you for the patch. A couple of comments: - the whole slow path (which loops calling _bufferedreader_raw_read()) should be surrounded by calls to ENTER_BUFFERED() and LEAVE_BUFFERED() - other things: +if (!PyArg_ParseTuple(args, "O:readinto", &buffe

[issue9971] Optimize BufferedReader.readinto

2011-05-05 Thread John O'Connor
John O'Connor added the comment: Attached patch draft for buffered_readinto(). patchcheck removed some whitespace as well. Daniel, I agree. BufferedReader.readinto seemingly defeats the purpose of using a BufferedReader to begin with. Though, for the difference Antoine pointed out it makes s

[issue9971] Optimize BufferedReader.readinto

2011-05-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: > As I think about this more... I'm not sure how much performance there > is to gain here in practice. It seems like any time I'd want to use > readinto(), it's because I want to do my own buffering, in which case > why would I use a BufferedReader? The differ

[issue9971] Optimize BufferedReader.readinto

2011-05-04 Thread Benjamin Peterson
Benjamin Peterson added the comment: I don't see the problem. You're free to override readinto() and read() in subclasses. readinto() is just implemented in BufferedIOBase as a convenience. -- ___ Python tracker _

[issue9971] Optimize BufferedReader.readinto

2011-05-04 Thread Daniel Stutzbach
Daniel Stutzbach added the comment: Looking at this again, I agree with John. For BufferedIOBase, read() is abstract while readinto() is concrete. That seems backward, and, indeed, it's the opposite of RawIOBase, where readinto() is abstract and read() is concrete. Unfortunately, this code

[issue9971] Optimize BufferedReader.readinto

2011-05-04 Thread Benjamin Peterson
Benjamin Peterson added the comment: 2011/5/4 John O'Connor : > > John O'Connor added the comment: > > I am new to the community but hoping to start contributing or at least > following issues and learning :) > > I'm looking at bufferediobase_readinto(). What I haven't yet figured out is > wh

[issue9971] Optimize BufferedReader.readinto

2011-05-04 Thread John O'Connor
John O'Connor added the comment: I am new to the community but hoping to start contributing or at least following issues and learning :) I'm looking at bufferediobase_readinto(). What I haven't yet figured out is why .readinto() is (currently) implemented at this layer of the hierarchy. You

[issue9971] Optimize BufferedReader.readinto

2011-05-02 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- versions: +Python 3.3 -Python 3.2 ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue9971] Optimize BufferedReader.readinto

2010-09-29 Thread Daniel Urban
Changes by Daniel Urban : -- nosy: +durban ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.or

[issue9971] Optimize BufferedReader.readinto

2010-09-28 Thread Daniel Stutzbach
New submission from Daniel Stutzbach : The readinto() method is intended to offer better performance than read() by allowing the caller to read into a preallocated buffer rather than constantly allocate and deallocate buffers. However, bufferediobase_readinto() calls read(), so the extra alloc