The default encoding is "UTF-8". It works if I do:

    with open("filename", errors="ignore") as f: ...

So I think Python 2, by default, ignores all errors, whereas Python 3 doesn't.

On 1 December 2014 at 01:49, Chris Angelico <ros...@gmail.com> wrote:
> On Sun, Nov 30, 2014 at 7:07 PM, balaji marisetti
> <balajimarise...@gmail.com> wrote:
>> Hi,
>
> Hi. This list is for the development *of* Python, not development
> *with* Python, so I'm sending this reply also to
> python-list@python.org where it can be better handled. You'll probably
> want to subscribe here:
>
> https://mail.python.org/mailman/listinfo/python-list
>
> or alternatively, point a news reader at comp.lang.python. Let's
> continue this conversation on python-list rather than python-dev.
>
>> When I try to iterate through the lines of a
>> file("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c"), I get a
>> UnicodeDecodeError (in python 3.4.0 on Ubuntu 14.04). But there is no
>> such error with python 2.7.6. What could be the problem?
>
> The difference between the two Python versions is that 2.7 lets you be
> a bit sloppy about Unicode vs bytes, but 3.4 requires that you keep
> them properly separate.
>
>> In [39]: with open("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
>>     for line in f:
>>         print (line)
>>
>> ---------------------------------------------------------------------------
>> UnicodeDecodeError                        Traceback (most recent call last)
>> <ipython-input-39-24a3ae32a691> in <module>()
>>       1 with open("../openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
>> ----> 2     for line in f:
>>       3         print (line)
>>       4
>>
>> /usr/lib/python3.4/codecs.py in decode(self, input, final)
>>     311         # decode input (taking the buffer into account)
>>     312         data = self.buffer + input
>> --> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
>>     314         # keep undecoded input until the next call
>>     315         self.buffer = data[consumed:]
>>
>>
>> --
>> :-)balaji
>
> Most likely, the line of input that you just reached has a non-ASCII
> character, and the default encoding is ASCII.
> (Though without the actual exception message, I can't be sure of
> that.) The best fix would be to know what the file's encoding is, and
> simply add that as a parameter to your open() call - perhaps this:
>
> with open("filename", encoding="utf-8") as f:
>
> If you use the right encoding, and the file is correctly encoded, you
> should have no errors. If you still have errors... welcome to data
> problems, life can be hard. :|
>
> ChrisA

--
:-)balaji
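
For reference, here is a minimal sketch of the options discussed in this thread. It does not use the OpenSSL file. Instead it writes a small temporary file whose contents are hypothetical and which contains one byte (0xA9) that is not valid UTF-8. The sample bytes, the helper name read_lines, and the choice of Latin-1 as the "real" encoding are all assumptions made for illustration. The last case is a reminder that Python 2's open() returns byte strings without decoding them, which is probably why no error shows up there: nothing is being decoded, so there are no decode errors to ignore.

```python
import locale
import os
import tempfile

# Hypothetical sample data: a clean line, then a line holding byte 0xA9,
# which is invalid as UTF-8 here (it would be "©" in Latin-1).
raw = b"/* x86_64 asm */\n/* Copyright \xa9 example */\n"

fd, path = tempfile.mkstemp(suffix=".c")
with os.fdopen(fd, "wb") as f:
    f.write(raw)


def read_lines(path, **kwargs):
    """Return every line of the file, opened with the given open() options."""
    with open(path, **kwargs) as f:
        return list(f)


# The encoding open() falls back to when none is passed (Python 3).
print("default encoding:", locale.getpreferredencoding(False))

# 1. Strict decoding (the default error handler) raises on the bad byte.
try:
    read_lines(path, encoding="utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc)

# 2. errors="ignore" silently drops the undecodable byte.
ignored = read_lines(path, encoding="utf-8", errors="ignore")

# 3. errors="replace" substitutes U+FFFD, so the damage stays visible.
replaced = read_lines(path, encoding="utf-8", errors="replace")

# 4. Passing the encoding the file really uses (assumed Latin-1 here).
latin1 = read_lines(path, encoding="latin-1")

# 5. Binary mode does no decoding at all and yields bytes objects.
raw_lines = read_lines(path, mode="rb")

os.remove(path)
```

If the real encoding of the file is known, case 4 is the one to prefer, since "ignore" loses data without any warning.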