This is likely an easy problem; however, I couldn't think of appropriate keywords to search for on Google:
Basically, I have some raw data that needs to be preprocessed before it is saved to the database, e.g.:

    In [1]: unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f \u3044\r\n'

I need to turn this into an ElementTree, but some of the data is Japanese whereas the rest is HTML. This string contains a <br />.

    In [2]: e = ET.fromstring('<data>%s</data>' % unicode_html)

    In [3]: e.text
    Out[3]: u'\u3055\u3080\u3044\uff0f\n\u3064\u3081\u305f\u3044\n'

    In [4]: len(e)
    Out[4]: 0

How can I decode the Unicode HTML <br /> into a string that ElementTree can understand?
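One possible reading of the session above is that the <br /> reaches ElementTree entity-escaped (as &lt;br /&gt;), which would explain why len(e) is 0 and the tag ends up as plain text. That is an assumption, not something the post confirms. Under that assumption, a minimal sketch is to unescape the entities before parsing. The helper name parse_fragment and the sample string are made up for illustration, and the code is written for Python 3, where html.unescape is available:

```python
import html
import xml.etree.ElementTree as ET


def parse_fragment(escaped_html):
    """Wrap a fragment in <data> and parse it (hypothetical helper).

    Assumption: the markup arrives entity-escaped, e.g. "&lt;br /&gt;".
    """
    # Turn "&lt;br /&gt;" back into a literal "<br />" tag.
    markup = html.unescape(escaped_html)
    # The result must be well-formed XML, or fromstring raises ParseError.
    return ET.fromstring('<data>%s</data>' % markup)


# Illustrative input: Japanese text with an escaped <br /> in the middle.
sample = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f&lt;br /&gt;\u3044\r\n'
e = parse_fragment(sample)

print(len(e))      # number of child elements
print(e[0].tag)    # tag name of the first child
```

Note that html.unescape also turns &amp; into a bare &, so text containing escaped ampersands would no longer be well-formed XML after this step. Real-world HTML that is not valid XML (for example an unclosed <br>) would likewise need an HTML parser rather than ET.fromstring.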