Re: read from file with mixed encodings in Python3

Dave Angel Mon, 07 Nov 2011 06:37:12 -0800

On 11/07/2011 09:23 AM, Jaroslav Dobrek wrote:

Hello,


in Python3, I often have this problem: I want to do something with
every line of a file. Like Python3, I presuppose that every line is
encoded in utf-8. If this isn't the case, I would like Python3 to do
something specific (like skipping the line, writing the line to
standard error, ...)

Like so:

try:
    ...
except UnicodeDecodeError:
    ...

Yet, there is no place for this construction. If I simply do:

for line in f:
     print(line)

this will result in a UnicodeDecodeError if some line is not utf-8,
but I can't tell Python3 to stop:

This will not work:

for line in f:
     try:
         print(line)
     except UnicodeDecodeError:
         ...

because the UnicodeDecodeError is caused in the "for line in f"-part.

How can I catch such exceptions?

Note that recoding the file before opening it is not an option,
because often files contain many different strings in many different
encodings.

Jaroslav

A file with mixed encodings isn't a text file. So open it with 'rb' mode, and use read() on it. Find your own line-endings, since a given '\n' byte may or may not be a line-ending.
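A minimal sketch of that first step, under one loud assumption: every encoding in the file is ASCII-compatible, so a 0x0A byte always ends a line. That would not hold for UTF-16, for example. The name iter_raw_lines is made up for illustration.

```python
def iter_raw_lines(path):
    """Yield each line of the file as undecoded bytes."""
    with open(path, "rb") as f:
        data = f.read()
    # Assumption: byte 0x0A is always a line ending (ASCII-compatible encodings only).
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece left after a final newline
    for raw in lines:
        # Strip a trailing carriage return so Windows line endings also work.
        yield raw.rstrip(b"\r")
```

Nothing is decoded here, so no UnicodeDecodeError can be raised while iterating; each item is a bytes object that can be inspected or decoded later.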

Once you've got something that looks like a line, explicitly decode it using utf-8. Some invalid lines will give an exception and some will not. But perhaps you've got some other gimmick to tell the encoding for each line.
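The decoding step might look roughly like this. It is only a sketch: the function name decode_lines and the choice to report bad lines on standard error are assumptions taken from the original question, not a fixed recipe.

```python
import sys

def decode_lines(raw_lines, encoding="utf-8"):
    """Yield (line_number, text) for lines that decode; report the others."""
    for number, raw in enumerate(raw_lines, start=1):
        try:
            # The decode happens inside the try, so the exception is catchable here.
            yield number, raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # Hypothetical policy: skip the line and note it on standard error.
            print("line %d skipped: %s" % (number, exc), file=sys.stderr)
```

raw_lines can be any iterable of bytes objects. Keep in mind that a line written in another encoding can still happen to be valid utf-8, in which case it decodes without an exception but yields the wrong characters.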


--

DaveA

--
http://mail.python.org/mailman/listinfo/python-list
