Hello,

I have sent this message to the authors as well as to this list. If this is the wrong list, please let me know where I should be sending it... dev perhaps?
First, the simple questions: The two versions of io.BufferedReader.peek() behave differently; which one is going to stay long term? Is the C version of the reader incomplete, or is it deliberately changing the behavior? Lastly, will you consider my input on the API (see below)?

Now a full explanation. I am writing a multipart parser for HTML returns in Python 3.1. The email parser used by cgi does not currently work, and cgi is broken at the moment, especially when used with wsgiref.simple_server as it is currently implemented. This is what has pushed me to write my own implementation of _part_ of cgi.py. My thinking is that if it works well in the end I might submit a patch, as cgi needs one anyway.

My questions revolve around io.BufferedReader.peek(). There are two implementations, one written in Python and one in C. At least in Python 3.1, the C one is used by default. The version written in Python behaves as follows:

    want = min(n, self.buffer_size)
    have = len(self._read_buf) - self._read_pos
    if have < want or have <= 0:
        to_read = self.buffer_size - have
        current = self.raw.read(to_read)
        if current:
            self._read_buf = self._read_buf[self._read_pos:] + current
            self._read_pos = 0
    return self._read_buf[self._read_pos:]

This basically means it will always return the requested number of bytes, up to the buffer size, and will perform a read on the underlying stream to get extra data if the buffer holds less than requested (up to the full buffer size). It also will not return a longer buffer than the number of bytes requested. I have verified that this is how it behaves.

The C version works a little differently. It works as follows:

    Py_ssize_t have, r;

    have = Py_SAFE_DOWNCAST(READAHEAD(self), Py_off_t, Py_ssize_t);
    /* Constraints:
       1. we don't want to advance the file position.
       2. we don't want to lose block alignment, so we can't shift the buffer
          to make some place.
       Therefore, we either return `have` bytes (if > 0), or a full buffer.
    */
    if (have > 0) {
        return PyBytes_FromStringAndSize(self->buffer + self->pos, have);
    }

    /* Fill the buffer from the raw stream, and copy it to the result. */
    _BufferedReader_reset_buf(self);
    r = _BufferedReader_fill_buffer(self);
    if (r == -1)
        return NULL;
    if (r == -2)
        r = 0;
    self->pos = 0;
    return PyBytes_FromStringAndSize(self->buffer, r);

This basically means it returns whatever is in the buffer, period. It will not fill the buffer any further from the raw stream to let us peek up to one buffer size, as the Python version does, and it always returns what is in the buffer regardless of how much you request. The only exception is when the buffer is empty. In that case it fills the buffer from the raw stream and then returns it. So it can be said that this function is guaranteed to return at least 1 byte unless a raw read is not possible.

The author says they cannot shift the buffer. This is true if file alignment is to be retained. Double buffers may be a solution if the Python version's behavior is wanted. I have not yet fully checked how buffering is implemented.

In writing the parser I found that being able to peek a number of bytes was helpful, but I need to be able to peek more than 1 byte consistently (70 in my case) to meet the RFC I am implementing. This meant the C version of peek would not work. Fine, I wrote a wrapper class that adds a buffer... This seemed dumb, as I was already using a buffered reader, so I detach the stream and use my wrapper. But now the logic and buffer handling are in the slower Python, where I would rather not have them. This almost defeats the purpose of the C buffered reader implementation. The C version still has a valid use for reads of arbitrary size, but that is really all the buffered reader is doing, and I can do block-oriented reads and buffering in my wrapper since I have to buffer anyway. Unless I only need a guaranteed peek of 1 byte (barring EOF, etc.), the C version doesn't seem very useful other than for random read cases.
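For illustration, the kind of wrapper described above might look roughly like the following. This is only a minimal sketch under stated assumptions, not the actual wrapper: the class name PeekAheadReader, its block_size parameter and the sample data are all made up for the example, and the wrapped stream is assumed to be a blocking binary stream whose read(size) returns b"" at EOF. For reference, the io documentation only promises that BufferedReader.peek() may return fewer or more bytes than requested, which is why the buffering here is done by hand.

```python
import io


class PeekAheadReader:
    """Hypothetical wrapper: peek(n) returns n bytes unless EOF comes first."""

    def __init__(self, stream, block_size=8192):
        self._stream = stream          # assumed: read(size) returns bytes, b"" at EOF
        self._block_size = block_size  # hypothetical tuning knob for raw reads
        self._buf = b""                # bytes read ahead but not yet consumed

    def _fill(self, n):
        # Keep reading until n bytes are buffered or the stream runs dry.
        while len(self._buf) < n:
            missing = n - len(self._buf)
            chunk = self._stream.read(max(self._block_size, missing))
            if not chunk:              # b"" (EOF) or None (no data available)
                break
            self._buf += chunk

    def peek(self, n):
        # Return up to n bytes without consuming them.
        self._fill(n)
        return self._buf[:n]

    def read(self, n):
        # Return up to n bytes and drop them from the look-ahead buffer.
        self._fill(n)
        data, self._buf = self._buf[:n], self._buf[n:]
        return data


# Example: look 70 bytes ahead, then consume only the boundary line.
sample = io.BytesIO(b"--boundary\r\n" + b"x" * 100)
reader = PeekAheadReader(sample, block_size=16)
head = reader.peek(70)
line = reader.read(12)
```

The cost of this approach is the one noted above: the look-ahead buffer lives in Python, so any buffering done by the C reader underneath is duplicated.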
This is not a full explanation of course, but it may give you the picture as I see it. In light of the above and my questions, I would like to give my input, hopefully constructively. This is what I think the API, and the peek implementation in particular, _should_ be. I may have missed things of course, but nonetheless here it is:

---------------------
read(n):
    Current behavior.

read1(n):
    If n is greater than 0, return n bytes, or as many as the buffer currently holds, advancing the stream position. If n is less than 0 or None, return the buffer contents and advance the position. If the buffer is empty and EOF has not been reached, return None. If the buffer is empty and EOF has been reached, return b''.

peek(n):
    If n is less than 0 or None, return the buffer contents without advancing the stream position. Otherwise return n bytes, up to the _buffer size_ (not the buffer contents), without advancing the stream position. If the buffer holds fewer than n bytes, buffer an additional block from the "raw" stream beforehand. This may require a double buffer or similar. If EOF is encountered during the raw read, return as much as we can, up to n.

leftover():
    Return the number (an int) of bytes in the buffer. This is not strictly necessary if peek and read1 are implemented as above, but I thought it would still be useful. I could be wrong and am not tied to this idea personally.
---------------------

I feel that what I, and possibly others, would want from a _buffered reader_ is best-effort behavior: the functions give you what you ask for except when it is very bad or impossible to do so. Very bad means losing block alignment, and impossible in this case means reading past EOF (or a stream that is out of data). I'm sorry, I'm probably not very good at explaining, but I do try.
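To make the proposed semantics concrete, here is a toy model in pure Python. It is a sketch of one possible reading of the proposal above, not code from the io module: the class name ProposedReader is invented, raw is assumed to be a blocking stream whose read(size) returns b"" at EOF, and "EOF has been reached" is taken to mean that a raw read has already returned b"".

```python
import io


class ProposedReader:
    """Toy model of the proposed peek/read1/leftover semantics (hypothetical)."""

    def __init__(self, raw, buffer_size=8192):
        self.raw = raw                  # assumed: raw.read(size) returns b"" at EOF
        self.buffer_size = buffer_size
        self._buf = b""                 # may briefly hold almost two blocks
        self._eof = False               # set once a raw read returns b""

    def leftover(self):
        # Number of bytes currently buffered.
        return len(self._buf)

    def peek(self, n=None):
        if n is None or n < 0:
            return self._buf            # buffer contents, position unchanged
        want = min(n, self.buffer_size)
        if len(self._buf) < want and not self._eof:
            # Buffer one additional block from the raw stream.
            block = self.raw.read(self.buffer_size)
            if block == b"":
                self._eof = True
            elif block:
                self._buf += block
        return self._buf[:want]         # as much as we can, up to want

    def read1(self, n=None):
        if not self._buf:
            return b"" if self._eof else None
        if n is None or n < 0:
            n = len(self._buf)
        data, self._buf = self._buf[:n], self._buf[n:]
        return data


# Example with a tiny buffer so the block reads are easy to follow.
reader = ProposedReader(io.BytesIO(b"0123456789"), buffer_size=4)
empty = reader.read1(2)    # nothing buffered yet, EOF not seen
first = reader.peek(3)     # triggers one raw read of 4 bytes
count = reader.leftover()
```

In this model a peek never performs more than one raw read, and the extra block is appended rather than shifted in, which is where the "double buffer" in the proposal would come from.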
I would love to hear your input, and I would be willing to work on patches for the C version of the buffered reader to implement this _if_ these changes are supported by the authors and the community and _if_ the authors will not write the changes themselves but still support them. Regardless, I would need my questions answered if possible.

Thanks so much!
Frederick Reeve