Gabriel Genellina wrote: > En Mon, 09 Mar 2009 15:30:31 -0200, Petr Muller <a...@afri.cz> escribió: > >> Thanks for response and sorry for I wasn't clear first time. I have a >> heap of data (logs), from which I build a XML document using >> xml.dom.minidom. In this data, some xml invalid characters may occur - >> form feed (\x0c) character is one example. >> >> I don't know what else is illegal in xml, so I've searched if there's >> some method how to prepare strings for insertion to a xml doc before I >> start research on a xml spec and write such function on my own. > > You don't have to; Python already comes with xml support. Using > ElementTree to build the document is usually easier and faster: > http://effbot.org/zone/element-index.htm
While I usually second that, this isn't the problem here. This thread is about unallowed characters in XML. The set of allowed characters is defined here: http://www.w3.org/TR/xml/#charsets And, as Terry Reedy pointed out, the "unicode.translate" method should get you where you want. Just define a dict that maps the characters that you want to remove to whatever character you want to use instead (or None) and pass that into .translate(). Stefan -- http://mail.python.org/mailman/listinfo/python-list