On 22 May, 16:45, "sim.sim" <[EMAIL PROTECTED]> wrote:
> Hi all.
> I have run into trouble using minidom:
>
> # I have an XML string with a CDATA section, and the section includes
> # "\r\n":
> iInStr = '<?xml version="1.0"?>\n<Data><![CDATA[BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n]]></Data>\n'
>
> # After I create the DOM object, I get the value of "Data" without "\r":
>
> from xml.dom import minidom
> iDoc = minidom.parseString(iInStr)
> iDoc.childNodes[0].childNodes[0].data  # gives u'BEGIN:VCALENDAR\nEND:VCALENDAR\n'
>
> According to http://www.w3.org/TR/REC-xml/#sec-line-ends
> this looks normal, but another part of the specification says that
> "only the CDEnd string is recognized as markup":
> http://www.w3.org/TR/REC-xml/#sec-cdata-sect
>
> So the parser must (IMHO) return the value of the CDATA section "as is"
> (the two parts of the specification do not contradict each other).
>
> How can I get the value of a CDATA section with all characters inside
> it preserved? (Perhaps use another parser? Which one?)
>
> Many thanks for any help.
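
The behaviour quoted above can be reproduced with the short sketch below. It assumes Python 3 (the quoted snippet is Python 2), and the helper name data_text is hypothetical. It also shows that a CR written as the character reference &#13; outside a CDATA section survives parsing, because line-end normalization applies to literal line breaks in the input and not to character references.

```python
from xml.dom import minidom


def data_text(xml_string):
    """Return the concatenated character data of the first <Data> element."""
    doc = minidom.parseString(xml_string)
    element = doc.getElementsByTagName("Data")[0]
    # Join all child nodes in case the parser delivers the text in pieces.
    return "".join(node.data for node in element.childNodes)


# Literal CR LF inside a CDATA section: the parser normalizes it to LF.
in_cdata = (
    '<?xml version="1.0"?>\n'
    '<Data><![CDATA[BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n]]></Data>\n'
)

# CR written as a character reference (not possible inside CDATA, where
# "&#13;" would be taken as literal text), followed by a literal LF.
as_char_refs = (
    '<?xml version="1.0"?>\n'
    '<Data>BEGIN:VCALENDAR&#13;\nEND:VCALENDAR&#13;\n</Data>\n'
)

normalized = data_text(in_cdata)
preserved = data_text(as_char_refs)

print(repr(normalized))
print(repr(preserved))
```

This only helps when the producer of the XML can be changed to emit &#13; for each CR. If the input is fixed, the CR cannot be recovered after parsing.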

Hi all,

I have another problem with minidom, and this one is really critical.

The code below tries to parse a well-formed XML document, but it fails
with the error message "not well-formed (invalid token): line 3, column 85":

from xml.dom import minidom
iMessage = "3c3f786d6c2076657273696f6e3d22312e30223f3e0a3c6d657373616\
7653e0a202020203c446174613e3c215b43444154415bd094d0b0d0bdd0bdd18bd0b5\
20d0bfd0bed0bfd183d0bbd18fd180d0bdd18bd18520d0b7d0b0d0bfd180d0bed181d\
0bed0b220d0bcd0bed0b6d0bdd0be20d183d187d0b8d182d18bd0b2d0b0d182d18c20\
d0bfd180d0b820d181d0bed0b1d181d182d0b2d0b5d0bdd0bdd18bd18520d180d0b5d\
0bad0bbd0b0d0bcd0bdd15d5d3e3c2f446174613e0a3c2f6d6573736167653e0a0a".\
decode('hex')
iMsgDom = minidom.parseString(iMessage)

The "problem" is inside the CDATA section: it contains part of a UTF-8
encoded string that was split into chunks (a technique widely used on
memory-limited devices), so the section ends in the middle of a
multi-byte character.

When minidom parses the XML string, it fails because it tries to convert
the data inside the CDATA section to unicode, instead of just returning
the value of the section "as is". As far as I can see, this conversion
contradicts the specification:
http://www.w3.org/TR/REC-xml/#sec-cdata-sect

So my question is still open: how can I get the value of a CDATA section
with all characters inside it preserved? (Perhaps use another parser?
Which one?)

Thanks for any help.
Maksim
--
http://mail.python.org/mailman/listinfo/python-list
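
P.S. A smaller reproduction of the failure, as a sketch assuming Python 3 (the code above is Python 2). The sample text, the constants HEADER and FOOTER, and the helpers parse_fails, wrap_chunk and unwrap_chunk are all hypothetical names introduced for illustration. The second half shows one possible workaround, base64-encoding each chunk before embedding it, which is only an assumption about what might fit this use case and requires changing the sender as well.

```python
import base64
from xml.dom import minidom
from xml.parsers.expat import ExpatError

HEADER = b'<?xml version="1.0"?>\n<message><Data>'
FOOTER = b'</Data></message>'

# Hypothetical payload: UTF-8 text cut in the middle of its last character.
full_text = "Данные"
encoded = full_text.encode("utf-8")
first_chunk, rest = encoded[:-1], encoded[-1:]


def parse_fails(xml_bytes):
    """Return True if minidom (expat) rejects the document."""
    try:
        minidom.parseString(xml_bytes)
    except ExpatError:
        return True
    return False


# Raw chunk inside CDATA: the CDATA section ends with an incomplete
# UTF-8 sequence, so the parser reports an invalid token.
raw_document = HEADER + b"<![CDATA[" + first_chunk + b"]]>" + FOOTER
raw_rejected = parse_fails(raw_document)


def wrap_chunk(chunk):
    """Embed arbitrary bytes as base64 text (plain ASCII) in <Data>."""
    return HEADER + base64.b64encode(chunk) + FOOTER


def unwrap_chunk(xml_bytes):
    """Parse the document and return the original bytes of the chunk."""
    doc = minidom.parseString(xml_bytes)
    element = doc.getElementsByTagName("Data")[0]
    text = "".join(node.data for node in element.childNodes)
    return base64.b64decode(text)


# The receiver gets the exact bytes back and decodes only after all
# chunks have been joined.
recovered = unwrap_chunk(wrap_chunk(first_chunk))
reassembled = (recovered + rest).decode("utf-8")

print(raw_rejected, recovered == first_chunk, reassembled)
```

The idea is that the receiver never decodes an individual chunk: it collects the bytes of every chunk and calls decode("utf-8") once on the joined result.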