On 3/5/2011 10:21 AM, tkp...@hotmail.com wrote:
> Question: how do I use f.tell() to
> identify if an offset is legal or illegal?

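   Read backwards in binary mode, byte by byte,
until you reach a byte whose bit pattern is either

        0xxxxxxx    (ASCII, a one-byte character)
        11xxxxxx    (lead byte of a multi-byte UTF-8 character)

You are then at the beginning of an ASCII or UTF-8
character.  A byte of the form 10xxxxxx is a continuation
byte, so an offset that lands on one is illegal.  You can
copy the bytes forward from the character start
into an array of bytes, then apply the appropriate
codec.  This is also what you do if skipping ahead
in a UTF-8 file, to get in sync.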
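
   A minimal sketch of that backward scan, assuming Python 3 and
a file opened in binary mode ("rb").  The names is_char_start and
sync_back are made up for illustration and are not part of any
library.

```python
import io

def is_char_start(byte):
    """True for 0xxxxxxx (ASCII) or 11xxxxxx (UTF-8 lead byte).

    Only 10xxxxxx (a continuation byte) has its top two bits equal to 10.
    """
    return (byte & 0xC0) != 0x80

def sync_back(f, offset):
    """Return the nearest offset <= `offset` that begins a character.

    `f` must be a seekable file object opened in binary mode.
    """
    while offset > 0:
        f.seek(offset)
        b = f.read(1)
        # An empty read means we are at end of file, which is a legal offset.
        if not b or is_char_start(b[0]):
            break
        offset -= 1
    return offset

# Hypothetical sample data: 'a', then a 2-byte and a 3-byte character, then 'b'.
f = io.BytesIO("a\u00e9\u20acb".encode("utf-8"))
start = sync_back(f, 5)          # offset 5 is inside the 3-byte character
f.seek(start)
text = f.read().decode("utf-8")  # decodes cleanly from the synced offset
```

An offset is legal exactly when sync_back(f, offset) == offset.
Note that this only finds character boundaries; it does not check
that the file is valid UTF-8 in the first place.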

   Reading the last line or lines is easier.  Read backwards
in binary until you hit an LF or CR, both of which
are encoded the same in ASCII and UTF-8 and never occur
inside a multi-byte character.  Copy the bytes
forward from that point into an array of bytes, then
apply the appropriate codec.
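
   The same idea for the last line, again as a rough sketch that
assumes Python 3, a binary-mode file, and an ASCII-compatible
encoding such as UTF-8 (it would not work for UTF-16).  The name
last_line is made up for illustration.

```python
import io
import os

def last_line(f, encoding="utf-8"):
    """Return the last non-empty line of binary file `f`, decoded.

    Trailing CR/LF bytes at the end of the file are skipped first,
    so the result carries no line terminator.
    """
    f.seek(0, os.SEEK_END)
    end = f.tell()
    # Step back over any line terminators at the very end of the file.
    while end > 0:
        f.seek(end - 1)
        if f.read(1) not in (b"\n", b"\r"):
            break
        end -= 1
    # Scan backwards, byte by byte, to the previous LF or CR.
    start = end
    while start > 0:
        f.seek(start - 1)
        if f.read(1) in (b"\n", b"\r"):
            break
        start -= 1
    # Copy the bytes forward from that point and apply the codec.
    f.seek(start)
    return f.read(end - start).decode(encoding)

# Hypothetical sample data.
sample = io.BytesIO("first\nsecond \u00e9\n".encode("utf-8"))
result = last_line(sample)
```

Reading one byte per seek is slow on large files; reading
backwards in blocks of a few kilobytes and searching each block
for the terminator follows the same logic.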

                                John Nagle
        
--
http://mail.python.org/mailman/listinfo/python-list
