Hi,

I've managed to pinpoint the issue inside the code itself and attached a
patch for 5.4.4 (I can make one for trunk as well, but at the time of
writing I worked with what I had).

The bug manifests itself when the delimiter is longer than one character
AND the file pointer falls in the middle of a delimiter after the read
buffer has been filled with php_stream_fill_read_buffer().

When this happens, the part of the delimiter that falls on the left side of
the file pointer is skipped at the next iteration because it was examined
before; however, that only makes sense for single character delimiters.

My patch will decrement the skip length (if non-zero) by at most <delimiter
length - 1> bytes before performing the search. This will make sure any
buffered characters are taken into consideration (again).
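
To illustrate the idea outside of streams.c, here is a minimal standalone
sketch. It is a simplified model and not the PHP source: find_delim() and
its apply_fix flag are hypothetical names that exist only for this example,
and the search loop stands in for the real search call.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of the delimiter search (hypothetical helper, not the
 * PHP source). skiplen is the number of leading bytes that were already
 * examined by an earlier search. With apply_fix set, the start offset is
 * moved back by at most delim_len - 1 bytes, as the patch does.
 */
const char *find_delim(const char *buf, size_t buf_len, size_t skiplen,
                       const char *delim, size_t delim_len, int apply_fix)
{
    size_t i;

    if (delim_len == 0 || buf_len < delim_len) {
        return NULL;
    }
    if (apply_fix && skiplen > 0) {
        /* the left part of the delimiter may sit in the skipped bytes */
        size_t back = delim_len - 1;
        skiplen -= (skiplen < back) ? skiplen : back;
    }
    for (i = skiplen; i + delim_len <= buf_len; i++) {
        if (memcmp(buf + i, delim, delim_len) == 0) {
            return buf + i;
        }
    }
    return NULL;
}
```

With the buffer "MM", the delimiter "MM" and skiplen = 1 (the first 'M' was
examined before the buffer was refilled), find_delim() returns NULL without
the adjustment and a pointer to offset 0 with it. For a single-character
delimiter delim_len - 1 is 0, so nothing changes there.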


On Tue, Oct 9, 2012 at 4:33 PM, Sherif Ramadan <theanomaly...@gmail.com> wrote:

> On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters <datib...@php.net>
> wrote:
> > On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer <sc...@planetavent.de>
> > wrote:
> >
> >> Hi!
> >>
> >> We switched from php 5.3.10 to 5.3.17 this weekend and stumbled upon a
> >> behaviour of stream_get_line that is most likely a bug and breaks a
> >> lot of our file processing code.
> >>
> >> The issue seems to have been introduced from 5.3.10 to 5.3.11.
> >>
> >> I opened a bug report: #63240.
> >>
> >
> > I've managed to reduce the code to this; it's very specific:
> >
> > $file = __DIR__ . '/input_dummy.txt';
> > $delimiter = 'MM';
> > file_put_contents($file, str_repeat('.', 8189) . $delimiter . $delimiter);
> >
> > $fh = fopen($file, "rb");
> >
> > stream_get_line($fh, 8192, $delimiter);
> > var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
> >
> > fclose($fh);
> > unlink($file);
> >
> > If the internal buffer length is 8192, after the first call to
> > stream_get_line() the read position (x) and physical file pointer (y)
> > should be positioned like so:
> >
> > .......MM(x)M(y)M
> >
> > The fact that (y) falls in the middle of the delimiter seems to cause an
> > issue.
> >
> >
>
>
> I'm not sure why this bug exists, and I haven't exactly been able to
> pinpoint where the bug manifests itself, but something I find
> incredibly unusual here is the fact that the size of the stream being
> exactly 8193 bytes long is the reason the bug exists.
>
> It has nothing to do with the file pointer's position since all we have
> to do here is increase or decrease the size of the file by exactly 1
> byte and the bug will never show its face.
>
> Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)
>
> $file = __DIR__ . '/input_dummy.txt';
> $delimiter = 'MM';
> file_put_contents($file, str_repeat('.', 8188) . $delimiter . $delimiter);
>
> $fh = fopen($file, "rb");
>
> stream_get_line($fh, 8192, $delimiter);
> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>
> fclose($fh);
> unlink($file);
>
> /* bool(false) */
>
> ---------------------------------------
>
> Test case 2: (we increase the file size from 8193 bytes to 8194 bytes)
>
> $file = __DIR__ . '/input_dummy.txt';
> $delimiter = 'MM';
> file_put_contents($file, str_repeat('.', 8190) . $delimiter . $delimiter);
>
> $fh = fopen($file, "rb");
>
> stream_get_line($fh, 8192, $delimiter);
> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>
> fclose($fh);
> unlink($file);
>
> /* bool(false) */
>
>
> ----------------------
>
>
> As long as the file size is not exactly equal to 8193 bytes you don't
> get this issue. In fact, you can test it with any multiple of 8192 + 1
> and the same issue appears. However, the bigger anomaly is that it
> also requires the length of the delimiter to be larger than 1 before
> the bug manifests itself.
>
> I suspect this has something to do with the way PHP streams are
> buffered internally. The internal stream is read up to a certain
> length and buffered in memory using the internal API functions, while
> your calls to PHP-facing functions like stream_get_line() read
> directly from the buffer instead. So it's possible somewhere in this
> function (line 1026 of main/streams/streams.c
> http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026) lies the
> bug.
>
>
>
> >> The issue seems to be related to #44607, but that one got fixed years
> >> ago.
> >>
> >> Is anybody able to confirm this behaviour or has stumbled upon this?
> >>
> >> Furthermore the behaviour of stream_get_line on an empty file seems to
> >> have changed between php 5.3.10 and php 5.3.11:
> >>
> >> <?php
> >>
> >> $file = __DIR__ . '/empty.txt';
> >> file_put_contents( $file, '' );
> >> $fh = fopen( $file, 'rb' );
> >> $data = stream_get_line( $fh, 4096 );
> >> var_dump( $data );
> >>
> >> result in
> >>
> >> string(0) ""
> >>
> >> for php 5.3.10
> >>
> >> and in
> >>
> >> bool(false)
> >>
> >> for php > 5.3.10.
> >>
> >> I don't know if this should be considered a bug, but as far as I know
> >> such a behaviour should not change during minor releases...
> >>
> >> Any insight is appreciated!
> >>
> >> Greetings
> >>
> >> Nico
> >>
> >> --
> >> PHP Internals - PHP Runtime Development Mailing List
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >>
> >>
> >
> >
> > --
> > --
> > Tjerk
>



-- 
--
Tjerk
*** main/streams/streams.c      2012-06-13 12:54:23.000000000 +0800
--- mystreams.c 2012-10-09 17:00:12.000000000 +0800
***************
*** 1017,1022 ****
--- 1017,1027 ----
                return memchr(&stream->readbuf[stream->readpos + skiplen],
                        delim[0], seek_len - skiplen);
        } else {
+               if (skiplen) {
+                       /* left part of the delimiter may still remain in the buffer,
+                       rewind up to <delim_len - 1>*/
+                       skiplen -= MIN(skiplen, delim_len - 1);
+               }
                return php_memnstr((char*)&stream->readbuf[stream->readpos + skiplen],
                                delim, delim_len,
                                (char*)&stream->readbuf[stream->readpos + seek_len]);