michele> BTW what's the difference between .encode and .decode ?

I started to answer, then got confused when I read the docstrings for
unicode.encode and unicode.decode:
    >>> help(u"\xe4".decode)
    Help on built-in function decode:

    decode(...)
        S.decode([encoding[,errors]]) -> string or unicode

        Decodes S using the codec registered for encoding. encoding defaults
        to the default encoding. errors may be given to set a different error
        handling scheme. Default is 'strict' meaning that encoding errors raise
        a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
        as well as any other name registerd with codecs.register_error that is
        able to handle UnicodeDecodeErrors.

    >>> help(u"\xe4".encode)
    Help on built-in function encode:

    encode(...)
        S.encode([encoding[,errors]]) -> string or unicode

        Encodes S using the codec registered for encoding. encoding defaults
        to the default encoding. errors may be given to set a different error
        handling scheme. Default is 'strict' meaning that encoding errors raise
        a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
        'xmlcharrefreplace' as well as any other name registered with
        codecs.register_error that can handle UnicodeEncodeErrors.

It probably makes sense to one who knows, but for the feeble-minded like
myself, they seem about the same. I'd be happy to add a couple of examples
to the string methods section of the docs if someone will produce something
simple that makes the distinction clear.

Skip
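For what it's worth, here is a minimal sketch of the distinction the two docstrings are trying to draw. It is an assumption-laden illustration, not the documented behavior of the Python 2 unicode type shown above: it uses Python 3 names, where str holds text and bytes holds encoded data, and all variable names are made up for the example. The direction is the whole point: encode turns text into bytes, decode turns bytes back into text.

```python
# Assumes Python 3: str is text (code points), bytes is encoded data.
# All variable names are illustrative only.

text = "\xe4"  # LATIN SMALL LETTER A WITH DIAERESIS

# encode: text -> bytes, using the named codec
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# decode: bytes -> text, using the codec the bytes were written in
roundtrip = utf8_bytes.decode("utf-8")

# The errors argument picks what happens when the codec cannot cope.
# 'strict' (the default) raises; the others substitute something.
try:
    text.encode("ascii")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

ascii_replaced = text.encode("ascii", errors="replace")
ascii_xmlref = text.encode("ascii", errors="xmlcharrefreplace")

# A lone 0xE4 byte is not valid UTF-8, so decoding needs a handler too.
try:
    latin1_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

decoded_replaced = latin1_bytes.decode("utf-8", errors="replace")
```

The same character comes out as different byte strings under different codecs, which is why decode has to be told which codec the bytes were written in.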