Giampaolo Rodola' wrote:
> Hi, I'm sure the message I'm going to write will seem quite dumb to
> most people but I really don't understand the str/bytes/unicode
> differences introduced in Python 3.0 so be patient. What I'm trying
> to do is porting pyftpdlib to Python 3.x. I don't want to support
> Unicode. I don't want pyftpdlib for py 3k to do anything new or
> different. I just want it to behave exactly the same as in the 2.x
> version and I'd like to know if that's possible with Python 3.x.
>
> Now. The basic difference is that socket.recv() returns a bytes
> object instead of a string object and that's the thing which confuses
> me mainly. My question is: is there a way to convert that bytes
> object into exactly *the same thing* returned by socket.recv() in
> Python 2.x (a string)?
>
> I know I can do:
>
>     data = socket.recv(1024)
>     data = data.decode(encoding)
>
> ...to convert bytes into a string but that's not exactly the same
> thing. In Python 2.x I didn't have to care about the encoding. What
> socket.recv() returned was just a string. That was all. Now doing
> something like b''.decode(encoding) puts me in serious troubles since
> that can raise an exception in case client and server use a different
> encoding.
>
> As far as I've understood the basic difference I see now is that a
> Python 2.x based FTP server could handle a 3.x based FTP client using
> "latin1" encoding or "utf-8" or anything else while with Python 3.x
> I'm forced to tell my server which encoding to use and I don't know
> how to deal with that.
>

Originally Python had a single string type, 'str', with 8 bits per
character. That was a bit limiting for international use. Then a new
string type, 'unicode', was introduced.
Now, in Python 3.x, it's time to tidy things up. In effect, the old
8-bit 'str' type has been renamed 'bytes' and the 'unicode' type has
been renamed 'str'. If you're truly working with strings of
_characters_ then 'str' is what you need, but if you're working with
strings of _bytes_ then 'bytes' is what you need. socket.send() and
socket.recv() still behave the same; it's just that it's now clearer
that they work with bytes and not strings. What recv() hands you in
3.x is the same data a 2.x 'str' held, so you only need to decode it
if you actually want characters.
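
To make that concrete, here is a minimal sketch of two ways to get 2.x-like behaviour in 3.x. It is only an illustration under my own assumptions: parse_command is a hypothetical helper, not part of pyftpdlib, and the literal assigned to raw stands in for whatever socket.recv() would return. The first option never decodes at all; the second decodes as latin-1, which maps each of the 256 byte values to exactly one code point and therefore never raises and encodes back to the same bytes.

```python
# Sketch only: names here are illustrative, not pyftpdlib's API.

def parse_command(line: bytes):
    """Split a raw FTP command line into (verb, argument), staying in bytes."""
    verb, _, arg = line.rstrip(b"\r\n").partition(b" ")
    return verb.upper(), arg

# Option 1: never decode; handle what recv() returns as bytes throughout,
# much as 2.x code handled its 8-bit str.
raw = b"stor caf\xe9.txt\r\n"   # stands in for sock.recv(1024)
verb, arg = parse_command(raw)

# Option 2: decode as latin-1. Every byte 0-255 maps to one code point,
# so decoding cannot fail and encoding restores the identical bytes.
every_byte = bytes(range(256))
text = every_byte.decode("latin-1")
round_trip = text.encode("latin-1")
```

Note that the same raw data would raise UnicodeDecodeError with decode("utf-8"), which is the failure described in the question. Whether latin-1 yields the *right* characters depends on what the client actually sent; it only guarantees that no bytes are lost.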