On Thu, Jan 26, 2006 at 03:22:11PM +0800, Steve Gunnell wrote:
> 5) Seeking through an encoding filter could be highly problematic.
> Filters such as "utf8" that have a non-deterministic byte per character
> ratio should politely refuse seeks.

In theory it ought to be possible to seek back to any location you were
previously at (as returned by C<tell>).

For the specific case of UTF8, you can even tell whether a random
location in the stream is a valid point to seek to. That could be done
either with only a one character look ahead read (bad plan on anything
not-a-file, mind you, unless you like blocking) or by deferring the
error until the next read. I don't know other variable width encodings
well enough to know if any others have an equivalent ability to
self-synchronise the stream.

Clearly, as you say, fixed width encodings are fine when dealing with an
entire file. But if you push a UCS32 filter onto a stream after reading
an odd number of bytes, valid seek positions aren't going to be
multiples of 4. I guess a seek validator can be coded to know this, but
it starts getting fiddly.

The other alternative would be that seek/tell locations are always in
bytes in the underlying stream, and purposefully ignore any many-to-1
filters atop them.

Nicholas Clark
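
The two seek checks discussed above can be sketched roughly as follows.
This is only an illustration in Python, not code from any real I/O layer:
the function names and the filter_start parameter are hypothetical, and
the UTF8 check assumes the stream is well formed. It relies on the fact
that UTF8 continuation bytes always have the bit pattern 10xxxxxx, so a
byte offset is a character boundary exactly when the byte there is not a
continuation byte.

```python
def is_utf8_boundary(data: bytes, pos: int) -> bool:
    """Return True if pos does not fall in the middle of a UTF-8 sequence.

    Hypothetical helper; assumes data is well-formed UTF-8.
    """
    if pos < 0 or pos > len(data):
        return False
    if pos == len(data):
        return True  # end of stream is always a valid position
    # Continuation bytes look like 10xxxxxx; any other byte starts a character.
    return (data[pos] & 0xC0) != 0x80


def is_fixed_width_boundary(pos: int, filter_start: int, width: int = 4) -> bool:
    """Return True if pos is a whole number of code units past filter_start.

    filter_start is the (assumed) byte offset at which the fixed width
    filter was pushed, so valid positions need not be multiples of width.
    """
    return pos >= filter_start and (pos - filter_start) % width == 0


# 'a' is 1 byte, U+00E9 is 2 bytes, U+20AC is 3 bytes in UTF-8.
data = "a\u00e9\u20ac".encode("utf-8")
boundaries = [p for p in range(len(data) + 1) if is_utf8_boundary(data, p)]
print(boundaries)  # [0, 1, 3, 6]

# Filter pushed after 3 bytes were already read: 7 is valid, 8 is not.
print(is_fixed_width_boundary(7, filter_start=3))  # True
print(is_fixed_width_boundary(8, filter_start=3))  # False
```

On a real stream the first check needs the byte at the target offset,
which is the look ahead read mentioned above, with the same blocking
caveat on anything that is not a file.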