Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

Marc 'BlackJack' Rintsch Fri, 25 May 2007 02:53:34 -0700

In <[EMAIL PROTECTED]>, sim.sim wrote:

> Below is code that tries to parse a well-formed XML document, but it fails
> with the error message:
> "not well-formed (invalid token): line 3, column 85"


How did you verify that it is well-formed?  `xmllint` barfs on it too.
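For comparison with `xmllint`, here is a minimal Python 3 sketch of the same check done with minidom, which parses through expat by default. The document is a made-up stand-in, not your data: a UTF-8 "é" is cut after its first byte inside a CDATA section. `check_well_formed` is a hypothetical helper.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def check_well_formed(data: bytes) -> str:
    """Return "ok" if expat accepts the document, else expat's error message."""
    try:
        minidom.parseString(data)
    except ExpatError as exc:
        # str(exc) carries expat's message plus line and column,
        # the same shape as the error quoted above.
        return str(exc)
    return "ok"


# "\u00e9" (é) is two bytes in UTF-8: b"\xc3\xa9". Keeping only the
# first byte simulates a multi-byte character split inside CDATA.
good = '<?xml version="1.0" encoding="utf-8"?><a><![CDATA[caf\u00e9]]></a>'.encode("utf-8")
bad = good.replace(b"\xc3\xa9", b"\xc3")

print(check_well_formed(good))  # ok
print(check_well_formed(bad))   # expat error with line and column
```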

> The "problem" is within the CDATA section: it contains part of a UTF-8
> encoded string which was split (a technique widely used on memory-limited
> devices).
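To illustrate what such a split does, here is a Python 3 sketch that cuts UTF-8 bytes into fixed-size chunks, as a memory-limited device might. The 3-byte chunk size and the sample text are arbitrary assumptions. A chunk that starts or ends in the middle of a multi-byte character is not valid UTF-8 on its own; only the reassembled byte stream decodes, for example through an incremental decoder from the standard `codecs` module.

```python
import codecs

text = "na\u00efve caf\u00e9"  # 'ï' and 'é' each take 2 bytes in UTF-8
data = text.encode("utf-8")
chunk_size = 3  # hypothetical device buffer size

chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def decodes(chunk: bytes) -> bool:
    """True if the chunk is valid UTF-8 by itself."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The first two chunks cut 'ï' in half, so neither decodes alone.
print([decodes(c) for c in chunks])  # [False, False, True, True]

# An incremental decoder buffers the dangling byte until the next chunk.
decoder = codecs.getincrementaldecoder("utf-8")()
rebuilt = "".join(decoder.decode(c) for c in chunks) + decoder.decode(b"", final=True)
print(rebuilt == text)  # True
```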
> 
> When minidom parses the XML string, it fails because it tries to convert
> the data within the CDATA section into Unicode, instead of just returning
> the value of the section "as is". This conversion contradicts the
> specification http://www.w3.org/TR/REC-xml/#sec-cdata-sect

An XML document contains Unicode characters, and so does the CDATA section.
CDATA is not meant for putting arbitrary bytes into a document.  It may
only contain valid characters as defined by
http://www.w3.org/TR/REC-xml/#NT-Char (linked from the grammar of CDATA in
your link above).  A lone byte from a split UTF-8 sequence is not a
character at all, so the parser has to reject it.
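For illustration, here is the `Char` production from that link transcribed into Python 3. The function names are hypothetical, and this is not how expat implements the check. The CDATA grammar additionally forbids the `]]>` terminator inside the section.

```python
def is_xml_char(ch: str) -> bool:
    """XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def valid_cdata_text(text: str) -> bool:
    """CData ::= (Char* - (Char* ']]>' Char*)): only Chars, and no ']]>'."""
    return all(is_xml_char(c) for c in text) and "]]>" not in text


print(valid_cdata_text("caf\u00e9\r\n"))  # True
print(valid_cdata_text("\x01binary"))     # False: control character
print(valid_cdata_text("a]]>b"))          # False: would end the section
```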

Ciao,
        Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
