On May 22, 2:45 pm, "sim.sim" <[EMAIL PROTECTED]> wrote:
> Hi all.
> i'm faced to trouble using minidom:
>
> #i have a string (xml) within CDATA section, and the section includes
> "\r\n":
> iInStr = '<?xml version="1.0"?>\n<Data><![CDATA[BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n]]></Data>\n'
>
> #After i create DOM-object, i get the value of "Data" without "\r\n"
>
> from xml.dom import minidom
> iDoc = minidom.parseString(iInStr)
> iDoc.childNodes[0].childNodes[0].data # it gives u'BEGIN:VCALENDAR\nEND:VCALENDAR\n'
>
> according to http://www.w3.org/TR/REC-xml/#sec-line-ends
>
> it looks normal, but another part of the documentation says that "only
> the CDEnd string is recognized as
> markup": http://www.w3.org/TR/REC-xml/#sec-cdata-sect
>
> so parser must (IMHO) give the value of CDATA-section "as is" (neither
> both of parts of the document do not contradicts to each other).
>
> How to get the value of CDATA-section with preserved all symbols
> within? (perhaps use another parser - which one?)
>
> Many thanks for any help.

You will lose the \r characters: line ends are normalized before any
other processing, so the CDATA markup does not protect them. From the
document you referred to:

"""
This section defines some symbols used widely in the grammar.

S (white space) consists of one or more space (#x20) characters,
carriage returns, line feeds, or tabs.

White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+

Note: The presence of #xD in the above production is maintained purely
for backward compatibility with the First Edition. As explained in 2.11
End-of-Line Handling, all #xD characters literally present in an XML
document are either removed or replaced by #xA characters before any
other processing is done. The only way to get a #xD character to match
this production is to use a character reference in an entity value
literal.
"""
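
A minimal sketch of what this means in practice, assuming the producer of
the XML can be changed: character references are not recognized inside a
CDATA section, so the payload has to go in as ordinary character data with
each \r written as &#13;. The helper name element_text and the variable
names below are made up for illustration; minidom.parseString and
xml.sax.saxutils.escape are the standard library calls.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def element_text(xml_string):
    """Return the concatenated text/CDATA content of the root element."""
    doc = minidom.parseString(xml_string)
    return "".join(node.data for node in doc.documentElement.childNodes)


payload = "BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n"

# Literal \r\n inside CDATA: the parser normalizes it to \n.
cdata_doc = '<?xml version="1.0"?>\n<Data><![CDATA[' + payload + "]]></Data>\n"
cdata_value = element_text(cdata_doc)

# Same payload as plain character data, with each \r written as &#13;.
# escape() also takes care of &, < and > in the payload.
escaped_doc = (
    '<?xml version="1.0"?>\n<Data>'
    + escape(payload, {"\r": "&#13;"})
    + "</Data>\n"
)
escaped_value = element_text(escaped_doc)

print(repr(cdata_value))    # 'BEGIN:VCALENDAR\nEND:VCALENDAR\n'
print(repr(escaped_value))  # 'BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n'
```

If the XML comes from somewhere you do not control, another option is to
turn \n back into \r\n after parsing, which is only safe when the payload
format (iCalendar here) is known to use \r\n for every line end.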