Re: Interpreting string containing \u000a

Peter Otten Wed, 18 Jun 2008 05:26:33 -0700

Francis Girard wrote:

> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
> 
> What is the easiest way to automatically translate these codes into
> unicode characters ?


If the file really contains the escape sequences, use "unicode-escape" as the
encoding:

>>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
u'Hello\r\nWorld'

If it contains the raw bytes, use "iso-8859-1":

>>> "Hello\x0d\x0aWorld".decode("iso-8859-1")
u'Hello\r\nWorld'

Open the file with

codecs.open(filename, encoding=encoding_as_determined_above)

instead of the builtin open().
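
For the escape-sequence case, here is a minimal sketch of the whole round
trip. It takes a slightly different route from codecs.open(): it reads the
file as bytes and decodes them in one call, so an escape sequence can never be
split across a read boundary. The names read_escaped() and demo() and the
sample text are made up for illustration. It relies on the fact that the
unicode-escape codec treats every non-escaped byte as Latin-1, which matches
an ISO-8859-1 file. Be aware that other backslash sequences in the file
(\n, \t, ...) get interpreted too.

```python
import io
import os
import tempfile


def read_escaped(path):
    # Hypothetical helper: read the raw bytes, then expand \uXXXX escapes.
    # Bytes that are not part of an escape are decoded as Latin-1.
    with io.open(path, "rb") as f:
        raw = f.read()
    return raw.decode("unicode_escape")


def demo():
    # Write a small ISO-8859-1 sample containing literal backslash-u
    # sequences (six characters each), then read it back decoded.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"caf\xe9 Hello\\u000d\\u000aWorld")
        return read_escaped(path)
    finally:
        os.remove(path)


print(repr(demo()))
```

The result is a unicode string in which the two escapes have become a real
carriage return and line feed, and the byte 0xE9 has become U+00E9.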

Peter
--
http://mail.python.org/mailman/listinfo/python-list
