globophobe wrote:

> In [1]: unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'
>
> I need to turn this into an elementtree, but some of the data is
> japanese whereas the rest is html. This string contains a <br />.

Where? <br /> is an element, not a character. "\r" and "\n" are
characters, not elements.

If you want to build a tree where "\r\n" is replaced with a <br />
element, you can encode the string as UTF-8, use the replace method to
insert the element, and then call fromstring.

Alternatively, you can build the tree yourself:

    import xml.etree.ElementTree as ET

    unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'

    # one entry per line, with the line endings stripped
    parts = unicode_html.splitlines()

    elem = ET.Element("data")
    elem.text = parts[0]
    for part in parts[1:]:
        # each remaining line becomes the tail of a new <br /> element
        ET.SubElement(elem, "br").tail = part

    print(ET.tostring(elem))

</F>
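
For comparison, here is a minimal sketch of the first approach mentioned above (encode, replace, fromstring). It assumes Python 3 and assumes that the rest of the string is already well-formed XML, so nothing is escaped. The "data" wrapper element is also an assumption, borrowed from the tree-building example, since fromstring needs a single root element.

```python
import xml.etree.ElementTree as ET

unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'

# Replace every "\r\n" with a <br /> element and wrap the result in a
# root element ("data" is an assumed name) so it parses as one document.
markup = u"<data>%s</data>" % unicode_html.replace(u"\r\n", u"<br />")

# fromstring accepts UTF-8 encoded bytes.
elem = ET.fromstring(markup.encode("utf-8"))

# Serialize back to UTF-8 bytes for inspection.
print(ET.tostring(elem, encoding="utf-8"))
```

Note that this variant also turns the trailing "\r\n" into a <br /> element, so the tree ends up with two <br /> children, whereas the splitlines version produces only one.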