Hi,

On Saturday 11 October 2008 19:45:09 Gregory Beaver wrote:
> Hi,
> 
> I'm grappling with a design flaw I just uncovered in stream filters, and
> need some advice on how best to fix it.  The problem has existed since the
> introduction of stream filters, and has 3 parts.  2 of them can probably
> be fixed safely in PHP 5.2+, but I think the third may require an
> internal redesign of stream filters, and so would probably have to be
> PHP 5.3+, even though it is a clear bugfix (Ilia, your opinion
> appreciated on this).
> 
> The first part of the bug that I encountered is best described here:
> http://bugs.php.net/bug.php?id=46026.  However, it is a deeper problem
> than this, as any attempt to cache data is dangerous whenever a stream
> filter is attached to a stream.  I should also note that the patch in
> this bug contains feature additions that would have to wait for PHP 5.3.
> 
> I ran into this problem because I was trying to use stream filters to
> read in a bz2-compressed file within a zip archive in the phar
> extension.  This was failing, and I first tracked the problem down to an
> attempt by php_stream_filter_append to read in a bunch of data and cache
> it, which caused more stuff to be passed into the bz2 decompress filter
> than it could handle, making it barf.  After fixing this problem, I ran
> into the problem described in the bug above because of
> php_stream_fill_read_buffer doing the same thing when I tried to read
> the data, because I requested it return 176 decompressed bytes, and so
> php_stream_read passed in 176 bytes to the decompress filter.  Only 144
> of those bytes were actually bz2-compressed data, and so the filter
> barfed upon trying to decompress the remaining data (same as bug #46026,
> found differently).
> 
> You can probably tell from my explanation that this is an
> extraordinarily complex problem.  There are 3 inter-related problems here:
> 
> 1) bz2 (and zlib) stream filter should stop trying to decompress when it
> reaches the stream end regardless of how many bytes it is told to
> decompress (easy to fix)
> 2) it is never safe to cache read data when a read stream filter is
> appended, as there is no safe way to determine in advance how much of
> the stream can be safely filtered. (would be easy to fix if it weren't
> for #3)
> 3) there is no clear way to request that a certain number of filtered
> bytes be returned from a stream, versus how many unfiltered bytes should
> be passed into the stream. (very hard to fix without design change)
> 
> I need some advice on #3 from the original designers of stream filters
> and streams, as well as any experts who have dealt with this kind of
> problem in other contexts.  In this situation, should we expect stream
> filters to always stop filtering if they reach the end of valid input? 
> Even in this situation, there is potential that less data is available
> than passed in.  A clear example would be if we requested only 170
> bytes.  144 of those bytes would be passed in as the complete compressed
> data, and bzip2.decompress would decompress all of it to 176 bytes.  170
> of those bytes would be returned from php_stream_read, and 6 would have
> to be placed in a cache for future reads.  Thus, there would need to be
> some way of marking the cache as valid because of this logic path:
> 
> <?php
> $a = fopen('blah.zip', 'rb'); // fopen() requires a mode argument
> fseek($a, 132); // fills read buffer with unfiltered data
> stream_filter_append($a, 'bzip2.decompress'); // clears read buffer cache
> $b = fread($a, 170); // fills read buffer cache with 6 bytes
> // this should seek within the filtered data read buffer cache
> fseek($a, 3, SEEK_CUR);
> stream_filter_append($a, 'zlib.inflate');
> ?>
> 
> The question is what should happen when we append the second filter
> 'zlib.inflate' to filter the filtered data?  If we clear the read buffer
> as we did in the first case, it will result in lost data.  So, let's
> assume we preserve the read buffer.  Then, if we perform:
> 
> <?php
> $c = fread($a, 7);
> ?>
> 
> and assume the remaining 3 bytes expand to 8 bytes, how should the read
> buffer cache be handled?  Should the first 3 bytes still be the filtered
> bzip2 decompressed data, and the last 3 replaced with the 8 bytes of
> decompressed zlib data?
> 
> Basically, I am wondering if perhaps we need to implement a read buffer
> cache for each stream filter.  This could solve our problem, I think. 
> The data would be stored like so:
> 
> stream: 170 bytes of unfiltered data, and a pointer to byte 145 as the
> next byte for php_stream_read()
> bzip2.decompress filter: 176 bytes of decompressed bzip2 data, and a
> pointer to byte 171 as the next byte for php_stream_read()
> zlib.inflate filter: 8 bytes of decompressed zlib data, and a pointer to
> byte 8 as the next byte for php_stream_read()
> 
> This way, we would essentially have a stack of stream data.  If the zlib
> filter were then removed, we could "back up" to the bzip2 filter and so
> on.  This will allow proper read cache filling, and remove the weird
> ambiguities that are apparent in a filtered stream.  I don't think we
> would need to worry about backwards compatibility here, as the most
> common use case would be unaffected by this change, and the use case it
> would fix has never actually worked.
> 
> I haven't got a patch for this yet, but it would be easy to do if the
> logic is sound.
> 

The problem is mainly to be able to filter a given number of bytes, starting
at a given position, and to know when that many bytes have been passed to
the filter.

I would propose a new argument to stream_filter_append:

stream_filter_append(stream, filter_name[, max_input_bytes])

So that only max_input_bytes will be passed to the filter. To know when that
many bytes have been passed to the filter, I would propose making the stream
act as a slice of the original stream: it returns EOF once max_input_bytes
have been passed to the filter. Removing the filter clears the EOF flag and
allows reading from the stream again.
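
To make the slice behaviour concrete, here is a rough userland sketch. It is
a toy model, not the streams code: the class SlicedFilterStream and its
methods are made up for this mail, a plain string stands in for the stream,
and strtoupper stands in for a real decompression filter.

```php
<?php
// Toy model only: none of these names exist in PHP itself.
final class SlicedFilterStream
{
    /** @var string raw bytes standing in for the underlying stream */
    private $data;
    /** @var int read position in the raw data */
    private $pos;
    /** @var Closure|null the attached read filter, if any */
    private $filter = null;
    /** @var int raw bytes the filter is still allowed to consume */
    private $remaining = 0;
    /** @var string filtered bytes not yet handed to the caller */
    private $filtered = '';

    public function __construct(string $data, int $pos = 0)
    {
        $this->data = $data;
        $this->pos = $pos;
    }

    // Hypothetical stream_filter_append(stream, filter_name, max_input_bytes).
    public function appendFilter(callable $filter, int $maxInputBytes): void
    {
        $this->filter = Closure::fromCallable($filter);
        $this->remaining = $maxInputBytes;
        $this->filtered = '';
    }

    // Removing the filter clears the slice EOF. Leftover filtered bytes are
    // simply dropped here; a per-filter read buffer would have to keep them.
    public function removeFilter(): void
    {
        $this->filter = null;
        $this->filtered = '';
    }

    public function read(int $length): string
    {
        if ($this->filter === null) {
            $out = (string) substr($this->data, $this->pos, $length);
            $this->pos += strlen($out);
            return $out;
        }
        if ($this->filtered === '' && $this->remaining > 0) {
            // Never hand the filter more than the slice allows.
            $chunk = (string) substr($this->data, $this->pos, $this->remaining);
            $this->pos += strlen($chunk);
            $this->remaining = 0;
            $this->filtered = ($this->filter)($chunk);
        }
        $out = (string) substr($this->filtered, 0, $length);
        $this->filtered = (string) substr($this->filtered, strlen($out));
        return $out;
    }

    public function eof(): bool
    {
        if ($this->filter !== null) {
            // EOF of the slice, not of the underlying stream.
            return $this->remaining === 0 && $this->filtered === '';
        }
        return $this->pos >= strlen($this->data);
    }
}
```

With the data 'HEADERabcTRAILER', a start position of 6 and a slice of 3
bytes, a read of 170 bytes only ever passes 'abc' to the filter. The stream
then reports EOF, and after removeFilter() the trailer can be read as plain
data.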

Your proposal of a read buffer cache for each filter would help with that,
and would make filters more robust in some use cases.
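
As I understand the per-filter cache, it could be modelled roughly like the
sketch below. Again this is only a userland toy under my own assumptions:
LayeredReadBuffer and its methods are hypothetical names, filtering is done
eagerly on a bounded number of input bytes, and strtoupper and strrev stand
in for bzip2.decompress and zlib.inflate.

```php
<?php
// Toy model only: the class and method names are made up for this mail.
final class LayeredReadBuffer
{
    /**
     * One entry per layer; index 0 is the raw stream, and each appended
     * filter adds a layer with its own buffer and read pointer.
     * @var array<int, array{buf: string, pos: int}>
     */
    private $layers;

    public function __construct(string $raw)
    {
        $this->layers = [['buf' => $raw, 'pos' => 0]];
    }

    // Feed up to $maxInputBytes unread bytes of the current top layer
    // through $filter and push the result as a new layer.
    public function appendFilter(callable $filter, int $maxInputBytes): void
    {
        $top = count($this->layers) - 1;
        $input = (string) substr(
            $this->layers[$top]['buf'],
            $this->layers[$top]['pos'],
            $maxInputBytes
        );
        $this->layers[$top]['pos'] += strlen($input);
        $this->layers[] = ['buf' => $filter($input), 'pos' => 0];
    }

    // "Back up" to the layer below. Bytes the removed filter already
    // consumed from that layer stay consumed.
    public function removeFilter(): void
    {
        if (count($this->layers) > 1) {
            array_pop($this->layers);
        }
    }

    // Reads always come from the top layer and only move its pointer.
    public function read(int $length): string
    {
        $top = count($this->layers) - 1;
        $out = (string) substr(
            $this->layers[$top]['buf'],
            $this->layers[$top]['pos'],
            $length
        );
        $this->layers[$top]['pos'] += strlen($out);
        return $out;
    }
}
```

Because every layer keeps its own pointer, appending a second filter never
has to clear or overwrite the first filter's buffer, and removing it simply
resumes reading where the lower layer left off.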

Regards,

Arnaud



-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
