On Sat, 28 Sep 2019 at 10:53, Peter Otten <__pete...@web.de> wrote:
>
> Hongyi Zhao wrote:
>
> > Hi,
> >
> > I have some code comes from python 2 like the following:
> >
> > str('a', encoding='utf-8')
>
> This fails in Python 2
>
> >>> str("a", encoding="utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: str() takes at most 1 argument (2 given)
>
> ...unless you have redefined str, e. g. with
>
> >>> str = unicode
> >>> str("a", encoding="utf-8")
> u'a'
>
> > But for python 3, this will fail as follows:
> >
> > >>> str('a', encoding='utf-8')
> > Traceback (most recent call last):
> >   File "<input>", line 1, in <module>
> > TypeError: decoding str is not supported
> >
> >
> > How to fix it?
>
> Don't try to decode an already decoded string; use it directly:
>
> "a"
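
For what it's worth, here is a minimal Python 3 sketch of Peter's point. The sample values are my own assumptions, chosen only for illustration: text is used directly, and only bytes get decoded.

```python
# Python 3: str is text, bytes is encoded data.
text = "a"              # already text: use it directly
raw = b"caf\xc3\xa9"    # UTF-8 encoded bytes, e.g. read from a file opened in binary mode

decoded = str(raw, encoding="utf-8")   # bytes -> str, equivalent to raw.decode("utf-8")
encoded = decoded.encode("utf-8")      # str -> bytes

try:
    str(text, encoding="utf-8")        # text has nothing left to decode
    error = None
except TypeError as exc:
    error = str(exc)                   # Python 3 raises instead of guessing
```

So `str(x, encoding=...)` is fine when `x` is bytes; it only fails when `x` is already text.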

To explain a little further, one of the biggest differences between
Python 2 and Python 3 is that in Python 3 you *have* to be clear about
which data is an encoded byte sequence (which needs a decode to turn it
into a text string, but cannot be encoded, because it already is) and
which is a text string (which doesn't need to be, and can't be, decoded,
but which can be encoded if you want to get a byte sequence).

If you're not clear whether some data is a byte string or a text string,
you will get in a muddle. Python 2 won't help you (it will sometimes
silently produce mojibake rather than raise an error), whereas Python 3
will tend to throw errors flagging the issue (though it may sometimes be
stricter than you are used to).

Thinking that saying `str = unicode` is a reasonable thing to do is a
pretty strong indication that you're not clear on whether your data is
text or bytes - either that or you're hoping to make a "quick fix". But
as you've found, quick fixes tend to result in a cascade of further
issues that *also* need quick fixes.

The right solution here (and by far the cleanest one) is to review your
code as a whole, and have a clear separation between bytes data and text
data. The usual approach is to decode bytes into text as soon as they
are read into your program, and only ever use genuine text data within
your program. That way you should only ever be using encode/decode in
the I/O portion of your application, where it's pretty clear when you
have encoded bytes coming in or going out.

Hope this helps,
Paul
--
https://mail.python.org/mailman/listinfo/python-list
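
P.S. A rough sketch of the "decode on the way in, encode on the way out" approach, in case it helps. The function names, the `ENCODING` constant and the sample data are all hypothetical, and UTF-8 is an assumption; use whatever encoding your data really has.

```python
import io

ENCODING = "utf-8"  # assumption: the real encoding depends on your data source


def read_text(stream):
    """Read bytes from a binary stream and decode them once, at the boundary."""
    return stream.read().decode(ENCODING)


def write_text(stream, text):
    """Encode text once, just before it leaves the program."""
    stream.write(text.encode(ENCODING))


def shout(text):
    """Core logic: takes str, returns str, never calls encode or decode."""
    return text.upper()


incoming = io.BytesIO("h\u00e9llo".encode(ENCODING))  # stands in for a file or socket
outgoing = io.BytesIO()

write_text(outgoing, shout(read_text(incoming)))
result = outgoing.getvalue()  # bytes again, ready to be sent or stored
```

For ordinary files you often don't need even this much: `open(path, encoding="utf-8")` in text mode does the decoding and encoding for you.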