Jaroslav Dobrek wrote:

> Hello,
>
> in Python3, I often have this problem: I want to do something with
> every line of a file. Like Python3, I presuppose that every line is
> encoded in utf-8. If this isn't the case, I would like Python3 to do
> something specific (like skipping the line, writing the line to
> standard error, ...)
>
> Like so:
>
> try:
>     ....
> except UnicodeDecodeError:
>     ...
>
> Yet, there is no place for this construction. If I simply do:
>
> for line in f:
>     print(line)
>
> this will result in a UnicodeDecodeError if some line is not utf-8,
> but I can't tell Python3 to stop:
>
> This will not work:
>
> for line in f:
>     try:
>         print(line)
>     except UnicodeDecodeError:
>         ...
>
> because the UnicodeDecodeError is caused in the "for line in f"-part.
>
> How can I catch such exceptions?
>
> Note that recoding the file before opening it is not an option,
> because often files contain many different strings in many different
> encodings.

I don't see those files often, but I think they are all seriously
broken. There's no way to recover the information from files with
unknown mixed encodings. However, here's an approach that may
sometimes work:

>>> with open("tmp.txt", "rb") as f:
...     for line in f:
...         try:
...             line = "UTF-8 " + line.decode("utf-8")
...         except UnicodeDecodeError:
...             line = "Latin-1 " + line.decode("latin-1")
...         print(line, end="")
...
UTF-8 äöü
Latin-1 äöü
UTF-8 äöü
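
If the goal is to skip or report the bad lines rather than guess a
second encoding, the same binary-mode trick can be wrapped in a small
generator. This is only a sketch: the names iter_utf8_lines and
report_to_stderr are made up for illustration, and it assumes that
splitting on b"\n" is meaningful for the file (true for UTF-8 and
Latin-1 data, not for something like UTF-16).

```python
import sys


def iter_utf8_lines(path, on_error=None):
    """Yield (line_number, text) for every line that is valid UTF-8.

    Hypothetical helper: lines that fail to decode are handed to
    on_error(line_number, raw_bytes, exc) if given, and then skipped.
    """
    # Binary mode: iterating over the file never decodes anything,
    # so the UnicodeDecodeError can only come from the explicit decode().
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                if on_error is not None:
                    on_error(lineno, raw, exc)
                continue  # skip the undecodable line
            yield lineno, text


def report_to_stderr(lineno, raw, exc):
    """Hypothetical error callback: write the raw bytes to standard error."""
    print(f"line {lineno}: not valid UTF-8: {raw!r}", file=sys.stderr)
```

It could then be used like this:

    for lineno, line in iter_utf8_lines("tmp.txt", on_error=report_to_stderr):
        print(line, end="")

Because the decoding happens inside the loop body instead of in the
"for line in f" part, the try/except has a place to live.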