Skip Montanaro wrote:
> I started to answer, then got confused when I read the docstrings for
> unicode.encode and unicode.decode:
[snip]

It certainly is confusing. When I first started Unicoding, I pretty much stuck to Aahz's rule of thumb without understanding the details, and I still do. But now I do understand it.

Although encodings are bijective (i.e., one-to-one mappings), they are not apolar. One side of the encoding is arbitrarily labeled the encoded form; the other is arbitrarily labeled the decoded form. (This is not a relativistic system, here.)

The encode method maps from the decoded set to the encoded set. The decode method does the inverse. That's it. The only real technical difference between encode and decode is the direction they map in.

By convention, the decoded form is a Python unicode string, and the encoded form is a byte string. I believe it's technically possible (but very rude) to write an "inverse encoding", where the "encoded" form is a unicode string and the decoded form is a UTF-8 byte string.

Also, note that there are some encodings unrelated to Unicode. For example, try this:

    >>> "abcd".encode("base64")

This is an encoding between two byte strings.

-- 
CARL BANKS
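
The direction of the two methods described above can be sketched as follows. This is only an illustration, and it assumes Python 3, where the unicode type is spelled str and byte strings are bytes; the post itself talks about the older unicode/str pair. The variable names are made up for the example. In Python 3 the string method no longer accepts "base64", so the sketch goes through codecs.encode and codecs.decode for the bytes-to-bytes case.

```python
import codecs

# Assumption: Python 3, where str is the decoded (Unicode) form
# and bytes is the encoded form.
text = "caf\u00e9"                  # decoded form: a Unicode string

data = text.encode("utf-8")         # encode: decoded -> encoded
round_trip = data.decode("utf-8")   # decode: encoded -> decoded

# A codec unrelated to Unicode: both sides are byte strings.
# str.encode("base64") is rejected in Python 3, so use the codecs module.
b64 = codecs.encode(b"abcd", "base64")
plain = codecs.decode(b64, "base64")

print(repr(data))        # the UTF-8 bytes for the text
print(repr(round_trip))  # the original text again
print(repr(b64))         # base64 bytes
print(repr(plain))       # the original bytes again
```

Either way, encode always goes from the side labeled "decoded" to the side labeled "encoded", and decode goes back.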