Gabriel Genellina wrote:
> En Mon, 09 Mar 2009 15:30:31 -0200, Petr Muller <a...@afri.cz> escribió:
> 
>> Thanks for response and sorry for I wasn't clear first time. I have a
>> heap of data (logs), from which I build a XML document using
>> xml.dom.minidom. In this data, some xml invalid characters may occur -
>> form feed (\x0c) character is one example.
>>
>> I don't know what else is illegal in xml, so I've searched if there's
>> some method how to prepare strings for insertion to a xml doc before I
>> start research on a xml spec and write such function on my own.
> 
> You don't have to; Python already comes with xml support. Using
> ElementTree to build the document is usually easier and faster:
> http://effbot.org/zone/element-index.htm

While I usually second that, this isn't the problem here. This thread is
about unallowed characters in XML. The set of allowed characters is defined
here:

http://www.w3.org/TR/xml/#charsets

And, as Terry Reedy pointed out, the "unicode.translate" method should get
you where you want. Just define a dict that maps the characters that you
want to remove to whatever character you want to use instead (or None) and
pass that into .translate().

Stefan
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to