On May 22, 8:45 am, "sim.sim" <[EMAIL PROTECTED]> wrote:
> Hi all.
> I'm having trouble using minidom:
>
> # I have a string (XML) with a CDATA section, and the section includes "\r\n":
> iInStr = '<?xml version="1.0"?>\n<Data><![CDATA[BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n]]></Data>\n'
>
> # After I create the DOM object, I get the value of "Data" without "\r\n":
>
> from xml.dom import minidom
> iDoc = minidom.parseString(iInStr)
> iDoc.childNodes[0].childNodes[0].data  # gives u'BEGIN:VCALENDAR\nEND:VCALENDAR\n'
>
> According to http://www.w3.org/TR/REC-xml/#sec-line-ends this looks
> normal, but another part of the documentation says that "only the CDEnd
> string is recognized as markup":
> http://www.w3.org/TR/REC-xml/#sec-cdata-sect
>
> So the parser must (IMHO) give the value of the CDATA section "as is"
> (the two parts of the document do not contradict each other).
>
> How can I get the value of the CDATA section with all characters
> preserved? (Perhaps by using another parser? Which one?)
>
> Many thanks for any help.

I think the end-of-line character "\n" is the *nix convention. If you're running this on Windows, Python translates "\n" to "\r\n" automatically when it writes a file in text mode. According to Lutz's book, Programming Python, 3rd Ed., this is for historical reasons. It says that most text editors handle text in Unix format, with the exception of Notepad, which is why some documents are displayed as one long line in Notepad (see p. 150 of the book).

The book goes on to show a script that checks the end-of-line characters and fixes them for the platform you're running on. The following recipe seems to do something along those lines as well:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/435882

Not exactly helpful, but maybe it'll give you some insight into the issue.

Mike
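
For what it's worth, here is a minimal sketch of the "fix the line endings afterwards" idea applied to the string from the original post. It parses the document with minidom and then converts the line endings of the CDATA text back to "\r\n". The helper name to_crlf is made up for this example, and the sketch assumes every line in the original CDATA text ended in "\r\n", since the parsed value no longer records which kind of line ending each line had.

```python
from xml.dom import minidom

# The document from the original post: a CDATA section with CRLF line endings.
xml_text = ('<?xml version="1.0"?>\n'
            '<Data><![CDATA[BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n]]></Data>\n')


def to_crlf(text):
    """Hypothetical helper: convert LF, CRLF and bare CR line endings to CRLF."""
    # Collapse everything to LF first so existing CRLF pairs are not doubled.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")


doc = minidom.parseString(xml_text)

# Join the data of all child nodes in case the text arrives in several pieces.
parsed = "".join(node.data for node in doc.documentElement.childNodes)

# Assumption: the original text used CRLF on every line.
restored = to_crlf(parsed)

print(repr(parsed))
print(repr(restored))
```

This only works if you know in advance that the payload is supposed to use "\r\n" throughout. If the CDATA text could mix line endings, this approach cannot recover the original.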