Michael Ekstrand wrote: > Hello all, > > In my current project, I am working with XML data in a protocol that has > checksum/signature verification of a portion of the document. There is > an envelope with a header element, containing signature data; following > the header is a body. The signatures are computed as cryptographic > checksums of the entire Body element, including start and end tags, > exactly as it appears in the data transmission. > > Therefore, I need to extract the entire text of an element of an XML > document. I have a function that scans an XML string and does this, but > it seems like a rather clumsy way to accomplish this task. I've been > playing with xml.dom.minidom and its toxml() method, but to no avail - > the server sends me XML with empty elements as full open/close tags, > but toxml() serializes them to the XML empty element (<Element/>), so > the checksum winds up not matching. > > Is there some parsing mechanism (using PyXML or any other freely usable > 3rd party library is an option) that will allow me to accomplish this? > Or am I best off sticking with my little string scanning function?
Read up on XML canonicalization (abrreviated as c14n). lxml implements this, also xml.dom.ext.c14n in PyXML. You'll need to canonicalize on both ends before hashing. To paraphrase an Old Master, if you are running a cryptographic hash over a non-canonical XML string representation, then you are living in a state of sin. -- Robert Kern [EMAIL PROTECTED] "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter -- http://mail.python.org/mailman/listinfo/python-list