On Friday, December 30, 2016 at 7:16:25 AM UTC+5:30, Steve D'Aprano wrote:
> On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote:
>
> > On 2016-12-22 at 22:38, wrote:
> >>I am getting the error:
> >>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> >>invalid start byte
> >
> > The following is a reflex of mine, whenever I encounter Python 2 Unicode
> > errors:
> >
> > import sys
> > reload(sys)
> > sys.setdefaultencoding('utf8')
>
> This is a BAD idea, and doing it by "reflex" without very careful thought is
> just cargo-cult programming. You should not thoughtlessly change the
> default encoding without knowing what you are doing -- and if you know what
> you are doing, you won't change it at all.
>
> The Python interpreter *intentionally* removes setdefaultencoding at startup
> for a reason. Changing the default encoding can break the interpreter, and
> it is NEVER what you actually need. If you think you want it because it
> fixes "Unicode errors", all you are doing is covering up bugs in your code.
>
> Here is some background on why setdefaultencoding exists, and why it is
> dangerous:
>
> https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/
>
> If you have set the Python 2 default encoding to anything but ASCII, you are
> now running a broken system with subtle bugs, including in data structures
> as fundamental as dicts.
>
> The standard behaviour:
>
> py> d = {u'café': 1}
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> False
>
> As we should expect: the key in the dict, u'café', is *not* the same as the
> byte-string 'caf\xc3\xa9'. But watch how we can break dictionaries by
> changing the default encoding:
>
> py> reload(sys)
> <module 'sys' (built-in)>
> py> sys.setdefaultencoding('utf-8')  # don't do this
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> True
>
> So Python now thinks that 'caf\xc3\xa9' is a key. Or does it?
>
> py> d['caf\xc3\xa9']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'caf\xc3\xa9'
>
> By changing the default encoding, we now have something which is both a key
> and not a key of the dict at the same time.
>
> > A relevant Stack Exchange thread awaits you here:
> >
> > http://stackoverflow.com/a/21190382/2230956
>
> And that's why I don't trust StackOverflow. It's not bad for answering
> simple questions, but once the question becomes more complex the quality of
> accepted answers goes down the toilet. The highest voted answer is *wrong*
> and *dangerous*.
>
> And then there's this comment:
>
>     Until this moment I was forced to include "# -- coding: utf-8 --" at
>     the begining of each document. This is way much easier and works as
>     charm
>
> I have no words for how wrong that is. And this comment:
>
>     ty, this worked for my problem with python throwing UnicodeDecodeError
>     on var = u"""vary large string"""
>
> No it did not. There is no possible way that Python will throw that
> exception on assignment to a Unicode string literal.
>
> It is posts like this that demonstrate how untrustworthy StackOverflow can
> be.
>
> --
> Steve
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.

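For reference, the usual alternative to touching the default encoding is to decode bytes explicitly at the point where they enter the program. Below is a minimal sketch of that idea. It rests on assumptions that are not from the thread: decode_bytes is a hypothetical helper, the sample bytes are made up, and the guess that the input is either UTF-8 or Windows-1252 (where byte 0x96 is an en dash) may not match the real data.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def decode_bytes(raw, encodings=("utf-8", "cp1252")):
    """Decode raw bytes by trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption about the
    input, not something the original post states.
    Returns a (text, encoding_used) tuple.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    raise ValueError("none of %r could decode the data" % (encodings,))


# Made-up sample: 0x96 cannot start a UTF-8 sequence, so the UTF-8
# attempt fails and the helper falls back to cp1252.
raw = b"price 10 \x96 20"
text, used = decode_bytes(raw)
print(used)

# Bytes that are valid UTF-8 are decoded as UTF-8 and never reach cp1252.
text2, used2 = decode_bytes(b"caf\xc3\xa9")
print(used2)
```

Decoding once at the boundary keeps everything inside the program as Unicode text, so comparisons and dict lookups never depend on an implicit byte-to-text conversion. It would also be consistent with code that works on some inputs and fails on others: only inputs containing bytes that are invalid in UTF-8 would trigger the error.
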
Thanks for your detailed comment. The code sometimes runs fine and sometimes raises errors. Could anyone take a look and tell me where I am going wrong?

-- 
https://mail.python.org/mailman/listinfo/python-list