Francis Girard wrote:

> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
>
> What is the easiest way to automatically translate these codes into
> unicode characters ?

If the file really contains the escape sequences use "unicode-escape" as the
encoding:

>>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
u'Hello\r\nWorld'

If it contains the raw bytes use "iso-8859-1":

>>> "Hello\x0d\x0aWorld".decode("iso-8859-1")
u'Hello\r\nWorld'

Open the file with codecs.open(filename, encoding=encoding_as_determined_above)
instead of the builtin open().

Peter
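
As a rough sketch of the codecs.open() approach described above, covering both cases: read_text is a hypothetical helper name, and the temporary file only stands in for the real ISO-8859-1 file. Note that "unicode-escape" decodes any bytes that are not part of an escape sequence as Latin-1, which is why it should suit an ISO-8859-1 file.

```python
import codecs
import os
import tempfile


def read_text(filename, encoding):
    # Hypothetical helper: decode the whole file with the chosen codec.
    with codecs.open(filename, encoding=encoding) as f:
        return f.read()


# Stand-in file holding the literal escape sequences
# (a backslash, then 'u', then four hex digits).
fd, path = tempfile.mkstemp()
os.write(fd, b"Hello\\u000d\\u000aWorld")
os.close(fd)

escaped = read_text(path, "unicode-escape")

# Same path rewritten with the raw CR/LF bytes instead.
with open(path, "wb") as f:
    f.write(b"Hello\x0d\x0aWorld")

raw = read_text(path, "iso-8859-1")

os.remove(path)
```

Both escaped and raw should end up as the same unicode string; codecs.open() reads the file in binary mode, so the CR/LF pair is not translated on the way in.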