Hi, I've managed to pinpoint the issue inside the code itself and attached a patch for 5.4.4 (I can make one for trunk as well, but at the time of writing I worked with what I had).
The bug manifests itself when the delimiter is longer than one byte AND the file pointer falls in the middle of a delimiter after the read buffer has been filled by php_stream_fill_read_buffer(). When this happens, the part of the delimiter to the left of the file pointer is skipped on the next iteration because it was already examined; that only makes sense for single-character delimiters, though.

My patch decrements the skip length (if non-zero) by at most <delimiter length - 1> bytes before performing the search, so that those buffered bytes are taken into consideration again.

On Tue, Oct 9, 2012 at 4:33 PM, Sherif Ramadan <theanomaly...@gmail.com> wrote:

> On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters <datib...@php.net>
> wrote:
> > On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer <sc...@planetavent.de>
> > wrote:
> >
> >> Hi!
> >>
> >> We switched from php 5.3.10 to 5.3.17 this weekend and stumbled upon a
> >> behaviour of stream_get_line that is most likely a bug and breaks a
> >> lot of our file processing code.
> >>
> >> The issue seems to have been introduced from 5.3.10 to 5.3.11.
> >>
> >> I opened a bug report: #63240.
> >>
> >
> > I've managed to reduce the code to this; it's very specific:
> >
> > $file = __DIR__ . '/input_dummy.txt';
> > $delimiter = 'MM';
> > file_put_contents($file, str_repeat('.', 8189) . $delimiter . $delimiter);
> >
> > $fh = fopen($file, "rb");
> >
> > stream_get_line($fh, 8192, $delimiter);
> > var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
> >
> > fclose($fh);
> > unlink($file);
> >
> > If the internal buffer length is 8192, after the first call to
> > stream_get_line() the read position (x) and physical file pointer (y)
> > should be positioned like so:
> >
> > .......MM(x)M(y)M
> >
> > The fact that (y) is in between the delimiter seems to cause an issue.
> >
>
> I'm not sure why this bug exists, and I haven't exactly been able to
> pinpoint where the bug manifests itself, but something I find
> incredibly unusual here is the fact that the size of the stream being
> exactly 8193 bytes long is the reason the bug exists.
>
> It has nothing to do with the file pointers position since all we have
> to do here is increase or decrease the size of the file by exactly 1
> byte and the bug will never show its face.
>
> Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)
>
> $file = __DIR__ . '/input_dummy.txt';
> $delimiter = 'MM';
> file_put_contents($file, str_repeat('.', 8188) . $delimiter . $delimiter);
>
> $fh = fopen($file, "rb");
>
> stream_get_line($fh, 8192, $delimiter);
> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>
> fclose($fh);
> unlink($file);
>
> /* bool(false) */
>
> ---------------------------------------
>
> Test 2: (we increase the file size from 8193 bytes to 8194 bytes)
>
> $file = __DIR__ . '/input_dummy.txt';
> $delimiter = 'MM';
> file_put_contents($file, str_repeat('.', 8190) . $delimiter . $delimiter);
>
> $fh = fopen($file, "rb");
>
> stream_get_line($fh, 8192, $delimiter);
> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>
> fclose($fh);
> unlink($file);
>
> /* bool(false) */
>
> ----------------------
>
> As long as the file size is not exactly equal to 8193 bytes you don't
> get this issue. In fact, you can test it with any multiple of 8192 + 1
> and the same issue appears. However, the bigger anomaly is that it
> also requires the length of the delimiter to be larger than 1 before
> the bug manifests itself.
>
> I suspect this has something to do with the way PHP streams are
> buffered internally. The internal stream is read up to a certain
> length and buffered in memory using the internal API functions, while
> your calls to PHP-facing functions like stream_get_line() read
> directly from the buffer instead.
> So it's possible somewhere in this function (line 1026 of
> main/streams/streams.c
> http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026) lies the
> bug.
>
> >
> >> The issue seems to be related to #44607, but that one got fixed years
> >> ago.
> >>
> >> Is anybody able to confirm this behaviour or has stumbled upon this?
> >>
> >> Furthermore the behaviour of stream_get_line on an empty file seems to
> >> have changed between php 5.3.10 and php 5.3.11:
> >>
> >> <?php
> >>
> >> $file = __DIR__ . 'empty.txt';
> >> file_put_contents( $file, '' );
> >> $fh = fopen( $file, 'rb' );
> >> $data = stream_get_line( $fh, 4096 );
> >> var_dump( $data );
> >>
> >> result in
> >>
> >> string(0) ""
> >>
> >> for php 5.3.10
> >>
> >> and in
> >>
> >> bool(false)
> >>
> >> for php > 5.3.10.
> >>
> >> I don't know if this should be considered a bug, but as far as I know
> >> such a behaviour should not change during minor releases...
> >>
> >> Any insight is appreciated!
> >>
> >> Greetings
> >>
> >> Nico
> >>
> >> --
> >> PHP Internals - PHP Runtime Development Mailing List
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >>
> >
> > --
> > --
> > Tjerk
>

--
--
Tjerk
*** main/streams/streams.c	2012-06-13 12:54:23.000000000 +0800
--- mystreams.c	2012-10-09 17:00:12.000000000 +0800
***************
*** 1017,1022 ****
--- 1017,1027 ----
  		return memchr(&stream->readbuf[stream->readpos + skiplen],
  			delim[0], seek_len - skiplen);
  	} else {
+ 		if (skiplen) {
+ 			/* left part of the delimiter may still remain in the buffer,
+ 			   rewind up to <delim_len - 1> */
+ 			skiplen -= MIN(skiplen, delim_len - 1);
+ 		}
  		return php_memnstr((char*)&stream->readbuf[stream->readpos + skiplen],
  				delim, delim_len,
  				(char*)&stream->readbuf[stream->readpos + seek_len]);