Hi,

On Saturday 11 October 2008 19:45:09 Gregory Beaver wrote:
> Hi,
>
> I'm grappling with a design flaw I just uncovered in stream filters, and
> need some advice on how best to fix it. The problem has existed since the
> introduction of stream filters, and has 3 parts. Two of them can probably
> be fixed safely in PHP 5.2+, but I think the third may require an
> internal redesign of stream filters, and so would probably have to be
> PHP 5.3+, even though it is a clear bugfix (Ilia, your opinion is
> appreciated on this).
>
> The first part of the bug that I encountered is best described here:
> http://bugs.php.net/bug.php?id=46026. However, it is a deeper problem
> than this, as attempts to cache data are dangerous any time a stream
> filter is attached to a stream. I should also note that the patch in
> this bug contains feature additions that would have to wait for PHP 5.3.
>
> I ran into this problem because I was trying to use stream filters to
> read in a bz2-compressed file within a zip archive in the phar
> extension. This was failing, and I first tracked the problem down to an
> attempt by php_stream_filter_append to read in a bunch of data and cache
> it, which caused more stuff to be passed into the bz2 decompress filter
> than it could handle, making it barf. After fixing this problem, I ran
> into the problem described in the bug above because of
> php_stream_fill_read_buffer doing the same thing when I tried to read
> the data: I asked it to return 176 decompressed bytes, and so
> php_stream_read passed 176 bytes in to the decompress filter. Only 144
> of those bytes were actually bz2-compressed data, and so the filter
> barfed upon trying to decompress the remaining data (same as bug #46026,
> found differently).
>
> You can probably tell from my explanation that this is an
> extraordinarily complex problem.
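The failure above comes from the decoder being handed bytes that lie past the end of its own compressed stream. zlib's inflate() and libbzip2's BZ2_bzDecompress() already signal that point (Z_STREAM_END and BZ_STREAM_END) and leave the unused input in avail_in, so a filter built on them can stop there. The following is a minimal C sketch of that behaviour using a made-up run-length format (not bzip2); toy_decode and toy_result are hypothetical names. It reports how many input bytes belonged to the compressed stream so the caller can keep the trailing bytes instead of feeding them to the decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy format standing in for a bz2 stream: a sequence of
 * (count, byte) run-length pairs, terminated by a pair whose count is 0.
 * This is NOT the bzip2 format; it only models "compressed data carries
 * its own end marker". */
typedef struct {
    size_t consumed;  /* input bytes that belonged to the compressed stream */
    size_t produced;  /* decoded bytes written to out */
    int    at_end;    /* 1 once the end marker was seen */
} toy_result;

/* Decode from in[0..in_len) into out[0..out_cap). Stops at the end marker
 * and never touches the bytes that follow it, however many bytes the
 * caller handed in. */
static toy_result toy_decode(const unsigned char *in, size_t in_len,
                             unsigned char *out, size_t out_cap)
{
    toy_result r = {0, 0, 0};
    while (r.consumed + 2 <= in_len) {
        unsigned char count = in[r.consumed];
        unsigned char byte  = in[r.consumed + 1];
        if (count == 0) {              /* end marker: stop here */
            r.consumed += 2;
            r.at_end = 1;
            break;
        }
        if (r.produced + count > out_cap)
            break;                     /* no room: leave this pair for later */
        for (unsigned char i = 0; i < count; i++)
            out[r.produced++] = byte;
        r.consumed += 2;
    }
    assert(r.consumed <= in_len);
    return r;
}
```

With the 9-byte input {3,'a',2,'b',0,0,'x','y','z'}, the sketch decodes "aaabb", reports 6 bytes consumed, and leaves the trailing 'x','y','z' untouched, which mirrors the 144-of-176 case.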
> There are three interrelated problems here:
>
> 1) the bz2 (and zlib) stream filters should stop trying to decompress
>    when they reach the end of the compressed stream, regardless of how
>    many bytes they are told to decompress (easy to fix)
> 2) it is never safe to cache read data when a read stream filter is
>    appended, as there is no safe way to determine in advance how much of
>    the stream can be safely filtered (would be easy to fix if it weren't
>    for #3)
> 3) there is no clear way to request that a certain number of filtered
>    bytes be returned from a stream, versus how many unfiltered bytes
>    should be passed into the stream (very hard to fix without a design
>    change)
>
> I need some advice on #3 from the original designers of stream filters
> and streams, as well as any experts who have dealt with this kind of
> problem in other contexts. In this situation, should we expect stream
> filters to always stop filtering if they reach the end of valid input?
> Even then, less valid data may be available than was passed in. A clear
> example would be if we requested only 170 bytes. 144 of those bytes
> would be passed in as the complete compressed data, and
> bzip2.decompress would decompress all of it to 176 bytes. 170 of those
> bytes would be returned from php_stream_read, and 6 would have to be
> placed in a cache for future reads. Thus, there would need to be some
> way of marking the cache as valid because of this logic path:
>
> <?php
> $a = fopen('blah.zip', 'rb');
> fseek($a, 132); // fills read buffer with unfiltered data
> stream_filter_append($a, 'bzip2.decompress'); // clears read buffer cache
> $b = fread($a, 170); // fills read buffer cache with 6 bytes
> fseek($a, 3, SEEK_CUR); // should seek within the filtered read buffer cache
> stream_filter_append($a, 'zlib.inflate');
> ?>
>
> The question is what should happen when we append the second filter
> 'zlib.inflate' to filter the filtered data?
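For the single-filter case above (176 bytes decoded, 170 returned, 6 kept), the surplus has to live somewhere other than the unfiltered read buffer. A minimal C sketch, assuming a hypothetical filter callback that produces output in chunks; filter_cache, filter_fn, filtered_read and fake_filter_run are invented names, not PHP's streams API. Reads drain the cache before asking the filter for more, so the 6 surplus bytes are returned by the next read rather than lost.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_CAP 512

/* Hypothetical cache of filtered output the caller has not read yet. */
typedef struct {
    unsigned char buf[CACHE_CAP];
    size_t len;   /* filtered bytes currently held */
    size_t pos;   /* next unread byte in buf */
} filter_cache;

/* Hypothetical filter callback: writes up to out_cap filtered bytes into
 * out and returns how many it wrote; 0 means no more output. */
typedef size_t (*filter_fn)(void *state, unsigned char *out, size_t out_cap);

/* Return up to n filtered bytes, serving the cache first. When the cache
 * is empty the filter produces a whole chunk; whatever the caller did not
 * ask for stays in the cache for the next call instead of being dropped. */
static size_t filtered_read(filter_cache *c, filter_fn f, void *state,
                            unsigned char *dst, size_t n)
{
    size_t copied = 0;
    while (copied < n) {
        if (c->pos == c->len) {            /* cache drained: refill */
            c->len = f(state, c->buf, CACHE_CAP);
            c->pos = 0;
            if (c->len == 0)
                break;                     /* filter is exhausted */
        }
        size_t avail = c->len - c->pos;
        size_t take = (n - copied < avail) ? n - copied : avail;
        memcpy(dst + copied, c->buf + c->pos, take);
        c->pos += take;
        copied += take;
    }
    assert(c->pos <= c->len);
    return copied;
}

/* Stand-in for bzip2.decompress: emits `remaining` bytes of 'D', in chunks
 * of at most out_cap, then reports end of output. */
typedef struct { size_t remaining; } fake_filter;

static size_t fake_filter_run(void *state, unsigned char *out, size_t out_cap)
{
    fake_filter *ff = state;
    size_t k = ff->remaining < out_cap ? ff->remaining : out_cap;
    memset(out, 'D', k);
    ff->remaining -= k;
    return k;
}
```

Requesting 170 bytes from a filter that produces 176 returns 170 and leaves 6 in the cache; the next read returns those 6 and then finds the filter exhausted.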
> If we clear the read buffer as we did in the first case, it will result
> in lost data. So, let's assume we preserve the read buffer. Then, if we
> perform:
>
> <?php
> $c = fread($a, 7);
> ?>
>
> and assume the remaining 3 bytes expand to 8 bytes, how should the read
> buffer cache be handled? Should the first 3 bytes still be the filtered
> bzip2 decompressed data, and the last 3 replaced with the 8 bytes of
> decompressed zlib data?
>
> Basically, I am wondering if perhaps we need to implement a read buffer
> cache for each stream filter. This could solve our problem, I think.
> The data would be stored like so:
>
> stream: 170 bytes of unfiltered data, and a pointer to byte 145 as the
> next byte for php_stream_read()
> bzip2.decompress filter: 176 bytes of decompressed bzip2 data, and a
> pointer to byte 171 as the next byte for php_stream_read()
> zlib.inflate filter: 8 bytes of decompressed zlib data, and a pointer to
> byte 8 as the next byte for php_stream_read()
>
> This way, we would essentially have a stack of stream data. If the zlib
> filter were then removed, we could "back up" to the bzip2 filter and so
> on. This will allow proper read cache filling, and remove the weird
> ambiguities that are apparent in a filtered stream. I don't think we
> would need to worry about backwards compatibility here, as the most
> common use case would be unaffected by this change, and the use case it
> would fix has never actually worked.
>
> I haven't got a patch for this yet, but it would be easy to do if the
> logic is sound.
>

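The per-filter stack described above can be modelled as one (length, position) pair per layer, with the raw stream at the bottom. A minimal C sketch of the bookkeeping only, assuming each layer owns its own buffer; layer, layer_stack, push_layer, pop_layer, top_layer, unread and seek_cur are hypothetical names and the bytes themselves are omitted. Positions are 0-based, so "byte 145 as the next byte" becomes pos 144. A SEEK_CUR is resolved inside the top layer's already-filtered buffer, which is what the fseek($a, 3, SEEK_CUR) line in the quoted script expects.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS 8

/* Hypothetical bookkeeping for the proposed per-filter read buffers.
 * layers[0] is the raw stream; each appended filter pushes one layer.
 * Only lengths and positions are tracked; the bytes are omitted. */
typedef struct {
    const char *name;
    size_t len;   /* bytes buffered at this layer */
    size_t pos;   /* 0-based offset of the next unread byte */
} layer;

typedef struct {
    layer layers[MAX_LAYERS];
    size_t depth;
} layer_stack;

/* Append a filter (or the raw stream) as a new top layer. */
static int push_layer(layer_stack *s, const char *name, size_t len, size_t pos)
{
    if (s->depth == MAX_LAYERS || pos > len)
        return -1;
    layer *l = &s->layers[s->depth++];
    l->name = name;
    l->len = len;
    l->pos = pos;
    return 0;
}

/* Remove the top filter and "back up" to the layer below it.
 * The raw stream layer itself is never popped. */
static int pop_layer(layer_stack *s)
{
    if (s->depth <= 1)
        return -1;
    s->depth--;
    return 0;
}

static layer *top_layer(layer_stack *s)
{
    assert(s->depth > 0);
    return &s->layers[s->depth - 1];
}

static size_t unread(const layer *l)
{
    return l->len - l->pos;
}

/* SEEK_CUR within the top layer's buffered (already filtered) data.
 * Returns -1 if the target falls outside that buffer. */
static int seek_cur(layer_stack *s, long offset)
{
    layer *t = top_layer(s);
    long target = (long)t->pos + offset;
    if (target < 0 || (size_t)target > t->len)
        return -1;
    t->pos = (size_t)target;
    return 0;
}
```

With raw (170, 144) and bzip2.decompress (176, 170) pushed, seeking 3 forward leaves 3 unread decoded bytes; pushing zlib.inflate (8, 7) and then popping it backs up to the bzip2 layer with those 3 bytes still unread. Whether pushing a filter should first drain the layer below is exactly the open question in the message, so this sketch leaves lower layers untouched.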
The problem is mainly to be able to filter a given number of bytes,
starting at a given position, and to know when that number of bytes has
been passed to the filter.

I would propose a new argument to stream_filter_append:

stream_filter_append(stream, filter_name[, max_input_bytes])

so that only max_input_bytes bytes are passed to the filter.

To know when this number of bytes has been passed to the filter, I would
propose making the stream act as a slice of the original stream: it
returns EOF once max_input_bytes have been passed to the filter. Removing
the filter clears the EOF flag and allows reading from the stream again.

Your proposal of a read buffer cache for each filter would help with that,
and would make filters more robust in some use cases.

Regards,

Arnaud

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
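Arnaud's slice idea amounts to a byte budget on the raw input while a bounded filter is attached. The max_input_bytes argument is a proposal, not part of PHP's current stream_filter_append(). A minimal C sketch of the proposed semantics; sliced_stream, attach_bounded_filter, remove_bounded_filter, slice_eof and slice_read are hypothetical names. Reads stop at the budget and report EOF there; removing the filter lifts the limit, so the EOF clears and the rest of the stream becomes readable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the proposed max_input_bytes argument: the raw
 * stream behaves like a slice while a bounded filter is attached. */
typedef struct {
    const unsigned char *data;  /* underlying, unfiltered bytes */
    size_t len;
    size_t pos;                 /* next raw byte to hand to the filter */
    int    limited;             /* 1 while a bounded filter is attached */
    size_t budget;              /* raw bytes the filter may still receive */
} sliced_stream;

static void attach_bounded_filter(sliced_stream *s, size_t max_input_bytes)
{
    s->limited = 1;
    s->budget = max_input_bytes;
}

/* Removing the filter lifts the limit, which clears the slice's EOF. */
static void remove_bounded_filter(sliced_stream *s)
{
    s->limited = 0;
}

static int slice_eof(const sliced_stream *s)
{
    if (s->pos == s->len)
        return 1;                          /* real end of the stream */
    return s->limited && s->budget == 0;   /* end of the slice */
}

/* Copy up to n raw bytes destined for the filter, never past the budget. */
static size_t slice_read(sliced_stream *s, unsigned char *dst, size_t n)
{
    size_t avail = s->len - s->pos;
    if (s->limited && s->budget < avail)
        avail = s->budget;
    size_t take = n < avail ? n : avail;
    memcpy(dst, s->data + s->pos, take);
    s->pos += take;
    if (s->limited)
        s->budget -= take;
    assert(s->pos <= s->len);
    return take;
}
```

With 176 raw bytes and max_input_bytes = 144, the first read stops at 144 and the slice reports EOF; after the filter is removed, the remaining 32 bytes can be read again.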