[issue18902] Make ElementTree event handling more modular to allow custom targets for the non-blocking parser

Stefan Behnel Sat, 28 Sep 2013 10:43:55 -0700

Stefan Behnel added the comment:

Copying a relevant comment by Eli from 
http://bugs.python.org/issue18990#msg198145 and replying inline.


"""
The way the APIs are currently defined, XMLParser and XMLPullParser are 
different animals. XMLParser can be considered to only have one "front" in the 
API - feed() and close(). You feed() until the document is done and then you 
close() and get the parsed tree. There's no other way to get the parsed tree 
(unless you use a custom builder, I guess).

On the other hand XMLPullParser has two clear "fronts" - an input front with 
feed() and close() and an output front with read_events(). For XMLPullParser, 
close() is just an input signal. The canonical way to get output from 
XMLPullParser is read_events(). close() has no better reason to return output 
than feed(). When we decided to change the method names (recall that Antoine's 
originals were completely different), we perhaps forgot this detail.
"""

No, we didn't.


"""
Even though XMLPullParser's method is named close(), it's *not* like 
XMLParser's close(). If someone is using XMLPullParser for its close() he's 
likely using the class incorrectly.

Just as an example: consider that in a lot of use cases the programmer will 
want to discard parts of the tree that's parsed iteratively (similarly to the 
main use case of iterparse()), because the XML itself is too huge. It's a 
convenient streaming API, in other words. Now, if the reader discards parts of 
the tree (by deleting subtrees), then returning the root from close() becomes 
even more meaningless, because it's no longer the root and we have no idea what 
it actually is.
"""

Let me repeat that this was already the case before the new class was added and 
that it's a feature. If the target decides to discard parts of the tree, or not 
build a tree at all and (say) instead count elements and return their total 
number on close(), then that's what the user asked for by selecting that target.
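
To make the counting case concrete, here is a minimal sketch. The class name 
ElementCounter and the sample document are made up for illustration; the only 
API it relies on is XMLParser(target=...) with feed() and close(), where 
close() passes on whatever the target's own close() returns.

```python
from xml.etree.ElementTree import XMLParser


class ElementCounter:
    """Hypothetical target: builds no tree, only counts start tags."""

    def __init__(self):
        self.count = 0

    def start(self, tag, attrib):
        # Called once for every opening tag the parser sees.
        self.count += 1

    def close(self):
        # Whatever is returned here is what parser.close() returns.
        return self.count


parser = XMLParser(target=ElementCounter())
parser.feed("<root><a/><b><c/></b></root>")
total = parser.close()  # an int chosen by the target, not an Element
```

Here close() returns a number rather than a root element, simply because the 
selected target says so.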

Let's agree to disagree on your conclusions, but I still can't see any 
advantage in separating the two classes. The way I see it, making 
XMLPullParser inherit from XMLParser makes the difference very easy to 
explain: it is the read_events() method, i.e. an additional way to receive 
the parse events that the combination of parser and target generates. 
Essentially, it's the target that does all the work here, and the parser only 
collects the results and presents them to the user. Hence my intention to keep 
the parser as "stupid" as it looks from the user's side, instead of adding 
something new right next to it.
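
For reference, a small sketch of the two "fronts" using XMLPullParser as it 
is available in the standard library (Python 3.4 and later), where close() 
returns None. The chunk boundaries, the sample document and the drain() 
helper are arbitrary choices for illustration. How many events are already 
available after each individual feed() may depend on the underlying expat 
version, so the sketch only relies on the complete sequence.

```python
from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(events=("start", "end"))
seen = []


def drain():
    # Output front: collect whatever events are available so far.
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))


# Input front: feed the document in arbitrary chunks.
for chunk in ("<root><item>one</item>", "<item>two</item></root>"):
    parser.feed(chunk)
    drain()

result = parser.close()  # only signals the end of input
drain()                  # events still pending after close() remain readable
```

The input side is the same feed()/close() pair that XMLParser has; 
read_events() is the only addition the user sees.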

That being said, if ElementTree keeps them separate and decides to *never* 
return anything from XMLPullParser.close(), then that's sufficiently compatible 
with lxml.etree, so I won't object to it. lxml has a long history of extending 
what's there in order to make it easier to use.

As long as we can find a way to keep both libraries compatible for users, I 
think we should both be able to move forward.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18902>
_______________________________________