[issue9521] xml.etree.ElementTree skips processing instructions when parsing

Nikolaus Rath Sun, 19 Jan 2014 19:14:10 -0800

Nikolaus Rath added the comment:

No, I really do mean XML processing instructions. I agree with you that the XML 
declaration is a non-issue, because no information is lost: you know that 
you're going to write XML, and you specify the encoding manually. It is 
therefore trivial to add the correct XML declaration if desired.
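
A minimal sketch of that point, assuming Python 3: ElementTree.write() takes 
`encoding` and `xml_declaration` arguments, so the declaration is produced on 
output from what the caller passes in rather than stored in the tree. An 
in-memory io.BytesIO buffer stands in for a real file here, purely for 
illustration.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.Element('body'))

# The declaration is not part of the tree; write() emits one on request,
# using the encoding passed in by the caller.
buf = io.BytesIO()
tree.write(buf, encoding='utf-8', xml_declaration=True)
data = buf.getvalue()

# Parsing the output back yields the same element. The parser consumes the
# declaration itself, and nothing about it is stored in the resulting tree.
copy = ET.parse(io.BytesIO(data))
```

Nothing is lost here, because the writer can always regenerate the declaration 
from the encoding it was told to use.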


The fact that PIs are not read, however, is a real problem. The XML spec 
requires that PIs MUST be passed through to the application 
(http://www.w3.org/TR/REC-xml/#sec-pi). Furthermore, ElementTree is designed to 
represent XML data, so writing an ElementTree out as XML and reading it back in 
must (in my opinion) not result in information loss. But currently it does:

>>> import xml.etree.ElementTree as ET
>>> import tempfile
>>> root = ET.Element('body', {'text': 'some text for the body'})
>>> root.insert(1, ET.ProcessingInstruction('do-something'))
>>> tree = ET.ElementTree(root)
>>> tmp = tempfile.NamedTemporaryFile()
>>> tree.write(tmp.name)
>>> tmp.seek(0)
0
>>> tree_copy = ET.parse(tmp.name)
>>> ET.dump(tree)
<body text="some text for the body"><?do-something?></body>
>>> ET.dump(tree_copy)
<body text="some text for the body" />

I think it is a bug that tree and tree_copy do not have the same contents.

Regarding comments: personally, I think that throwing them away is not a good 
idea either. But this is allowed by the XML spec 
(http://www.w3.org/TR/REC-xml/#dt-comment). This should probably go in a 
separate bug report if someone is interested in it.
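
For comparison, a sketch of the comment case. It assumes Python 3.8 or later, 
where xml.etree.ElementTree.TreeBuilder accepts an `insert_comments` keyword; 
the sample document is made up for illustration. With the default parser the 
comment disappears, while an opt-in TreeBuilder keeps it as a Comment element.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document containing one comment.
doc = b'<body><!-- a note --><p/></body>'

# Default parser: the comment is silently dropped.
dropped = ET.tostring(ET.parse(io.BytesIO(doc)).getroot(), encoding='unicode')

# Python 3.8+: ask the TreeBuilder to insert comments into the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.tostring(ET.parse(io.BytesIO(doc), parser).getroot(),
                   encoding='unicode')
```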

As for backwards compatibility: yes, this is a concern. A keyword argument 
would be a solution. On the other hand, I'm not sure that the default should be 
something that causes data loss...?
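
To make the keyword-argument idea concrete, here is a sketch that repeats the 
round trip from the session. It assumes Python 3.8 or later, where 
xml.etree.ElementTree.TreeBuilder accepts an opt-in `insert_pis` keyword and the 
default parser still drops PIs. It writes to an io.BytesIO buffer instead of a 
temporary file, which also sidesteps reopening an open NamedTemporaryFile by 
name on Windows.

```python
import io
import xml.etree.ElementTree as ET

# Same tree as in the session: <body> with one processing-instruction child.
root = ET.Element('body', {'text': 'some text for the body'})
root.append(ET.ProcessingInstruction('do-something'))
original = ET.tostring(root, encoding='unicode')

# Write to an in-memory buffer instead of a temporary file.
buf = io.BytesIO()
ET.ElementTree(root).write(buf)

# Default parser: the PI is lost on the way back in.
buf.seek(0)
default_copy = ET.tostring(ET.parse(buf).getroot(), encoding='unicode')

# Opt-in (Python 3.8+): a TreeBuilder that inserts PIs into the tree.
buf.seek(0)
parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
pi_copy = ET.tostring(ET.parse(buf, parser).getroot(), encoding='unicode')
```

Under these assumptions `default_copy` lacks the `<?do-something?>` child while 
`pi_copy` matches `original`. Because the old behaviour stays the default, 
existing callers are unaffected, which is the trade-off discussed above.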

lxml sounds like it's doing the right thing. Is there some connection between 
lxml and etree that I'm not aware of, or did you just give it as an example of 
how a different library behaves?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9521>
_______________________________________
_______________________________________________
Python-bugs-list mailing list