On Thu, Jan 26, 2006 at 03:22:11PM +0800, Steve Gunnell wrote:

> 5) Seeking through an encoding filter could be highly problematic.
> Filters such as "utf8" that have a non-deterministic byte per character
> ratio should politely refuse seeks.

In theory it ought to be possible to seek back to any location you were
previously at (i.e. any position returned by C<tell>).
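To make that concrete, here is a minimal sketch in Python rather than PerlIO, so none of it is the real layer API. A hypothetical RememberingReader wrapper decodes one character at a time, reports byte offsets from tell(), remembers every offset it hands out (plus the starting offset), and refuses to seek anywhere else. The class and method names are made up for illustration.

```python
import codecs
import io


class RememberingReader:
    """Hypothetical decoding layer: seek() only accepts offsets tell() handed out."""

    def __init__(self, raw: io.BufferedIOBase, encoding: str = "utf-8"):
        self._raw = raw
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._issued = {raw.tell()}  # the starting offset is always safe

    def read_char(self) -> str:
        # Feed the decoder one byte at a time until it yields a whole character,
        # so the decoder never holds a partial character between calls.
        while True:
            b = self._raw.read(1)
            if not b:
                return self._decoder.decode(b"", final=True)
            ch = self._decoder.decode(b)
            if ch:
                return ch

    def tell(self) -> int:
        # Report the byte offset in the underlying stream and remember it.
        pos = self._raw.tell()
        self._issued.add(pos)
        return pos

    def seek(self, pos: int) -> None:
        # Politely refuse anything we did not previously report.
        if pos not in self._issued:
            raise ValueError(f"refusing to seek to {pos}: never returned by tell()")
        self._raw.seek(pos)
        self._decoder.reset()
```

Seeking back to an offset obtained from tell() works; seeking to byte 2, which falls inside the two-byte "é" of "aé€b", is refused because tell() never returned it.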

For the specific case of UTF8, you can even tell whether an arbitrary
location in the stream is a valid point to seek to. That could be done
either with a one-character look-ahead read (a bad plan on anything that
isn't a file, mind you, unless you like blocking) or by deferring the error
until the next read.
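The UTF8 check really is that cheap, because continuation bytes always have the bit pattern 10xxxxxx and no other byte does, so looking at the single byte at the target offset is enough. A rough Python sketch over an in-memory buffer (so the blocking concern doesn't arise); the function names are hypothetical, and invalid lead bytes are left for the decoder to reject on the next read:

```python
def is_utf8_boundary(data: bytes, pos: int) -> bool:
    """True if byte offset pos in data could start a UTF-8 character (or is EOF)."""
    if pos == len(data):
        return True  # seeking to end of data is harmless
    if not 0 <= pos < len(data):
        return False
    # Continuation bytes look like 10xxxxxx; ASCII (0xxxxxxx) and lead bytes
    # (11xxxxxx) both begin a character.
    return data[pos] & 0xC0 != 0x80


def resync_forward(data: bytes, pos: int) -> int:
    """Self-synchronise: skip continuation bytes to the next boundary at or after pos."""
    while pos < len(data) and data[pos] & 0xC0 == 0x80:
        pos += 1
    return pos
```

For "aé€b" encoded as UTF-8 (7 bytes), offsets 0, 1, 3, 6 and 7 are boundaries; offset 4 sits inside "€" and resynchronises forward to 6.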

I don't know other variable-width encodings well enough to say whether any
of them have an equivalent ability to self-synchronise the stream.

Clearly, as you say, fixed-width encodings are fine when dealing with an
entire file. But if you push a UCS-4 filter onto a stream after reading an
odd number of bytes, valid seek positions aren't going to be multiples of 4.
I guess a seek validator could be coded to know this, but it starts getting
fiddly. The other alternative would be for seek/tell locations always to be
byte offsets in the underlying stream, purposefully ignoring any many-to-1
filters atop them.
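A seek validator for the fixed-width case mostly needs to remember the byte offset at which the layer was pushed. Sketched in Python with a hypothetical make_fixed_width_validator helper, and with the assumption (mine, not anything PerlIO does) that positions before the layer's origin are simply rejected:

```python
from typing import Callable


def make_fixed_width_validator(layer_origin: int, unit: int = 4) -> Callable[[int], bool]:
    """Build a seek check for a hypothetical fixed-width decoding layer.

    layer_origin: byte offset in the underlying stream where the layer was
                  pushed (odd if some bytes were read beforehand).
    unit:         bytes per character, 4 for UCS-4.
    """
    def is_valid_seek(pos: int) -> bool:
        # Bytes before the origin were never decoded by this layer; after it,
        # only whole characters measured from the origin are acceptable.
        return pos >= layer_origin and (pos - layer_origin) % unit == 0

    return is_valid_seek
```

With the layer pushed after 3 bytes, offsets 3, 7, 11, ... are accepted and 4 or 8 are not. The byte-offset alternative sidesteps this bookkeeping by never interpreting positions at the layer at all.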

Nicholas Clark
