On Fri, 16 Jan 2009 17:32:17 -0800, Giampaolo Rodola' wrote:

> On 17 Jan, 02:24, MRAB <goo...@mrabarnett.plus.com> wrote:
>
>> If you're truly working with strings of _characters_ then 'str' is what
>> you need, but if you're working with strings of _bytes_ then 'bytes' is
>> what you need.
>
> I work with string of characters but to convert bytes into string I need
> to specify an encoding and that's what confuses me. Before there was no
> need to deal with that.

In Python 2.x, str means "string of bytes". This has been renamed "bytes"
in Python 3.

In Python 2.x, unicode means "string of characters". This has been renamed
"str" in Python 3.

If you do this in Python 2.x:

    my_string = str(bytes_from_socket)

then you don't need to convert anything, because you are going from a
string of bytes to a string of bytes.

If you do this in Python 3:

    my_string = str(bytes_from_socket)

then you *do* have to convert, because you are going from a string of
bytes to a string of characters (unicode). Note that str() with no
encoding argument does not decode the bytes at all in Python 3: it
returns their repr, something like "b'abc'".

The Python 2.x equivalent code would be:

    my_string = unicode(bytes_from_socket)

and when you convert to unicode, you can get decoding errors: with no
encoding given, Python 2 assumes ASCII and raises UnicodeDecodeError on
any byte above 127.

A better way to do this would be some variation on:

    my_str = bytes_from_socket.decode('utf-8')

You should read this:

http://www.joelonsoftware.com/articles/Unicode.html

--
Steven
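
For illustration, here is a minimal Python 3 sketch of the points above. The sample bytes are made up and simply stand in for whatever arrives on the socket, and UTF-8 is an assumption about what the sender used. The names as_repr, ascii_failed and lenient are hypothetical; only bytes_from_socket and my_str come from the post.

```python
# Made-up sample data standing in for bytes read from a socket.
# Assumption: the sender encoded the text as UTF-8.
bytes_from_socket = "caff\u00e8".encode("utf-8")

# str() with no encoding does not decode: it returns the repr of the
# bytes object (a string that starts with b').
as_repr = str(bytes_from_socket)

# Decoding with the encoding the sender used recovers the characters.
my_str = bytes_from_socket.decode("utf-8")

# Decoding with the wrong encoding can fail. ASCII cannot represent
# the two bytes that encode the accented letter.
try:
    bytes_from_socket.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# If losing a few bad bytes is acceptable, errors="replace" substitutes
# U+FFFD for each undecodable byte instead of raising.
lenient = b"caf\xff".decode("utf-8", errors="replace")
```

Which encoding to pass to decode() depends on the protocol or on what the other end sends; Python cannot work it out from the bytes alone.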