> And the cool thing is: you can! :)
>
> In Python 2.6 and later, the new Py3 open() function is a bit more hidden,
> but it's still available:
>
>     from io import open
>
>     filename = "somefile.txt"
>     try:
>         with open(filename, encoding="utf-8") as f:
>             for line in f:
>                 process_line(line)  # actually, I'd use "process_file(f)"
>     except IOError, e:
>         print("Reading file %s failed: %s" % (filename, e))
>     except UnicodeDecodeError, e:
>         print("Some error occurred decoding file %s: %s" % (filename, e))

Thanks. I might use this in the future.

> > try:
> >     for line in f:  # here text is decoded implicitly
> >         do_something()
> > except UnicodeDecodeError():
> >     do_something_different()
>
> > This isn't possible for syntactic reasons.
>
> Well, you'd normally want to leave out the parentheses after the exception
> type, but otherwise, that's perfectly valid Python code. That's how these
> things work.

You are right. Of course this is syntactically possible. I was too rash,
sorry. I confused it with some other construction I once tried, which I
can't remember right now.

But the code above (without the parentheses) is semantically bad: the
exception is not caught.

> > The problem is that vast majority of the thousands of files that I
> > process are correctly encoded. But then, suddenly, there is a bad
> > character in a new file. (This is so because most files today are
> > generated by people who don't know that there is such a thing as
> > encodings.) And then I need to rewrite my very complex program just
> > because of one single character in one single file.
>
> Why would that be the case? The places to change should be very local in
> your code.

This is the case in a program that has many different functions which open
and parse different types of files. When I read and parse a directory
containing such different types of files, a program that uses

    for line in f:

gives no hint as to where the error occurred. It just exits with a
UnicodeDecodeError. That means I have to look at every function that
contains some variant of

    for line in f:

And it is not sufficient to replace the "for line in f" part. I would have
to transform many functions that work in terms of lines into functions
that work in terms of decoded bytes.

That is why I usually solve the problem by moving files around until I
find the bad file. Then I recode or repair the bad file manually.
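
The manual search for the bad file could perhaps be automated by a separate pre-check that leaves the parsing functions untouched. The following is only a sketch under assumptions: the helper name find_undecodable_files is hypothetical, every file is assumed to be UTF-8, and each file is assumed small enough to read into memory at once.

```python
import os


def find_undecodable_files(root, encoding="utf-8"):
    """Return (path, byte_offset, reason) for every file under root
    whose contents cannot be decoded with the given encoding.

    Hypothetical helper, not part of any library.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Read raw bytes so that decoding happens explicitly below.
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the byte offset of the first bad byte.
                problems.append((path, exc.start, exc.reason))
    return problems
```

Running something like this over the directory before the real parsers start would name the offending file and the byte offset of the first bad character, so only that file needs to be recoded or repaired.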