Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:

My thoughts:

1. Whitespace is significant in XML. To an XML parser, pretty-printed XML is a 
different document from the original. For some applications, whitespace around 
tags is not significant, but this depends on the application, and whitespace 
can have different meanings in different parts of a document. For example, a 
document can contain metadata with insignificant whitespace and marked-up 
text with significant whitespace. The special attribute xml:space can signal 
how whitespace should be treated in a part of a document.

https://www.w3.org/TR/xml/#sec-white-space
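
To illustrate point 1, here is a minimal sketch using the stdlib 
xml.etree.ElementTree. The two sample documents are made up for this example. 
It shows that the parser keeps whitespace-only character data, so the 
pretty-printed form yields a different tree:

```python
import xml.etree.ElementTree as ET

# Two made-up documents that differ only in whitespace around the tags.
compact = ET.fromstring("<root><item>text</item></root>")
pretty = ET.fromstring("<root>\n  <item>text</item>\n</root>")

# The parser reports whitespace-only character data as text/tail,
# so the trees are not identical.
print(repr(compact.text))    # None
print(repr(pretty.text))     # '\n  '
print(repr(pretty[0].tail))  # '\n'
```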

2. In HTML, whitespace around <P> is insignificant, but whitespace around <I> 
is significant. Whitespace inside <PRE>...</PRE> is significant.

3. If we strip whitespace around tags and insert newlines and indentation, 
shouldn't we also strip whitespace inside the text content? Or preserve 
newlines but update the indentation?

4. If we modify whitespace on output, it may be worth adding an option to 
ignore insignificant whitespace on input.
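
One possible shape for such an input option, sketched as a post-parse step 
rather than a parser flag. The helper name drop_blank_text and the heuristic 
are assumptions made for this example, not an existing ElementTree API: 
whitespace-only text is dropped only in elements that have children, and 
only outside xml:space="preserve".

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def drop_blank_text(elem, preserve=False):
    """Hypothetical helper: drop whitespace-only text/tail in element content."""
    mode = elem.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False
    # Assumption: only elements with children hold "insignificant" whitespace.
    if not preserve and len(elem) and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        drop_blank_text(child, preserve)
        # A child's tail is part of this element's content, so this
        # element's mode decides whether it is kept.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None

doc = ET.fromstring(
    "<r>\n  <meta>\n    <k>v</k>\n  </meta>\n"
    '  <body xml:space="preserve"> <i>a</i> <b>b</b> </body>\n</r>'
)
drop_blank_text(doc)
```

After the call, the indentation around <meta> and <k> is gone, while the 
single spaces inside <body> are kept.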

5. Serialization of ElementTree in the stdlib is much slower than in lxml (see 
issue25881). Perhaps it should be implemented in C, but it should be kept 
simple for that. Pretty-printing can be implemented as an outer preprocessing 
operation (for example, Eli's original code indents the tree in place: 
http://effbot.org/zone/element-lib.htm#prettyprint) or as a proxy that indents 
elements on the fly.
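
A rough sketch of the preprocessing approach, in the spirit of the linked 
recipe but not a copy of it. The name indent_in_place and its parameters are 
made up here. It rewrites only whitespace-only text and tail, leaves real 
text alone, and keeps the serializer itself untouched:

```python
import xml.etree.ElementTree as ET

def indent_in_place(elem, level=0, space="  "):
    """Hypothetical helper: set whitespace-only text/tail to newline + indent."""
    children = list(elem)
    if not children:
        return  # leaf elements keep their text as-is
    inner = "\n" + (level + 1) * space  # indentation before each child
    outer = "\n" + level * space        # indentation before the closing tag
    if not (elem.text and elem.text.strip()):
        elem.text = inner
    for i, child in enumerate(children):
        indent_in_place(child, level + 1, space)
        if not (child.tail and child.tail.strip()):
            # The last child's tail lines up the parent's closing tag.
            child.tail = outer if i == len(children) - 1 else inner

root = ET.fromstring("<a><b><c/></b><d>x</d></a>")
indent_in_place(root)
print(ET.tostring(root, encoding="unicode"))
```

Because the tree is modified before serialization, this changes the data as 
discussed in point 1; it is only safe where the application treats that 
whitespace as insignificant.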

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue14465>
_______________________________________