from:"Brian Warner"

[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-08 Thread Brian Warner

New submission from Brian Warner: I noticed that the UnicodeDecodeError exception produced by trying to do open(fn).readlines() (i.e. using the default ASCII encoding) on a file that's actually UTF-8 reports the wrong offset for the first undecodable character. From what I can tel
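A rough way to observe what the report describes is to compare the offset carried by the exception with the position of the first non-ASCII byte in the raw file. This is only a sketch: the helper name reported_and_actual_offset is made up for illustration, encoding="ascii" is passed explicitly rather than relying on a locale default, and whether the two numbers differ depends on the Python version and the file size.

```python
import os
import tempfile


def reported_and_actual_offset(raw: bytes):
    """Return (offset reported by UnicodeDecodeError, true offset in the file).

    Hypothetical helper for illustration; not taken from the issue itself.
    """
    # Write the raw bytes to a temporary file so open() can read them back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        # True position of the first byte that cannot be ASCII.
        actual = next((i for i, b in enumerate(raw) if b > 0x7F), None)
        try:
            with open(path, encoding="ascii") as f:
                f.readlines()
        except UnicodeDecodeError as exc:
            # exc.start is the offset the decoder reports for the bad byte.
            return exc.start, actual
        return None, actual
    finally:
        os.remove(path)


raw = b"line one\n" + "caf\u00e9\n".encode("utf-8")
reported, actual = reported_and_actual_offset(raw)
print("reported:", reported, "actual:", actual)
```

If the two values disagree on a given interpreter, that is the symptom described in the submission.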

[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-09 Thread Brian Warner

Brian Warner added the comment:

> Use .readline() to locate an invalid byte is not the right algorithm. If you would like to do that, you should open the file in binary mode and decodes the content yourself, chunk by chunk. Or if you manipulate small files, you can use .r
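The binary-mode, chunk-by-chunk approach suggested in the quoted reply might look something like the following. It is a minimal sketch, not anyone's actual patch: the function name first_undecodable_offset and the chunk_size parameter are invented here, and it assumes a single-byte encoding such as ASCII, so a chunk boundary can never split a character. A multi-byte encoding would need an incremental decoder instead.

```python
import io


def first_undecodable_offset(stream, encoding="ascii", chunk_size=4096):
    """Return the absolute byte offset of the first undecodable byte, or None.

    Assumes a single-byte encoding (e.g. ASCII), so decoding each chunk
    independently is safe. Hypothetical helper for illustration.
    """
    base = 0  # number of bytes successfully decoded so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # reached end of stream without an error
        try:
            chunk.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is relative to this chunk; add the bytes before it.
            return base + exc.start
        base += len(chunk)


data = b"line one\n" + "caf\u00e9\n".encode("utf-8")
offset = first_undecodable_offset(io.BytesIO(data), chunk_size=4)
print(offset)
```

With a real file, the stream would come from open(fn, "rb"). Tracking the running byte count separately is what makes the reported position absolute rather than relative to the current chunk.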