New submission from Brian Warner:
I noticed that the UnicodeDecodeError exception produced by trying to do
open(fn).readlines() (i.e. using the default ASCII encoding) on a file that's
actually UTF-8 reports the wrong offset for the first undecodable character.
From what I can tel
Brian Warner added the comment:
> Using .readline() to locate an invalid byte is not the right algorithm. If
> you would like to do that, you should open the file in binary mode and
> decode the content yourself, chunk by chunk. Or if you manipulate small
> files, you can use .r
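
For reference, the binary-mode, chunk-by-chunk approach quoted above could look roughly like the sketch below. The helper name first_undecodable_offset and its chunk_size parameter are made up for illustration and are not part of any Python API. It assumes the codec's incremental decoder exposes its buffered bytes as the first item of getstate(), which holds for the ascii and utf-8 codecs as far as I can tell.

```python
import codecs
import io


def first_undecodable_offset(stream, encoding="ascii", chunk_size=4096):
    """Return the absolute offset of the first undecodable byte, or None.

    `stream` must be a binary file object, e.g. open(fn, "rb").
    Hypothetical helper, written only to illustrate the idea.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    consumed = 0  # bytes handed to the decoder in earlier iterations
    while True:
        chunk = stream.read(chunk_size)
        # Bytes the decoder kept back from the previous chunk (for example
        # the first half of a multi-byte sequence). exc.start is relative
        # to these pending bytes plus the new chunk.
        pending = len(decoder.getstate()[0])
        try:
            # An empty chunk means EOF, so ask the decoder to flush.
            decoder.decode(chunk, final=not chunk)
        except UnicodeDecodeError as exc:
            return consumed - pending + exc.start
        if not chunk:
            return None
        consumed += len(chunk)


# Ten ASCII bytes followed by a UTF-8 encoded "é", read four bytes at a time.
sample = io.BytesIO(b"a" * 10 + "é".encode("utf-8") + b"b" * 5)
print(first_undecodable_offset(sample, chunk_size=4))  # 10
```

With a real file this would be called as first_undecodable_offset(open(fn, "rb")). The returned offset counts from the start of the file, not from the start of whichever chunk happened to be decoded when the error was raised.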