[issue17741] event-driven XML parser

Eli Bendersky Mon, 26 Aug 2013 21:02:18 -0700

Eli Bendersky added the comment:

On Mon, Aug 26, 2013 at 12:35 PM, Stefan Behnel <[email protected]>wrote:


>
> Stefan Behnel added the comment:
>
> Here is a proof-of-concept patch that integrates the functionality of the
> IncrementalParser into the XMLParser. I ended up reusing most of Antoines
> implementation and test suite. In case he'll look back into this ticket at
> some point, I'll put a "thank you" here.
>
> The patch certainly needs more cleanup, but it shows where things are
> going and passes all tests.
>

OK, I looked at the patch. So I think I understood your verbal proposal
correctly earlier. And as I said then, I don't like it. I believe Nick has
summarized nicely why heaping everything into XMLParser is a bad idea, but
since you went the extra mile and made a patch, I will try to provide a
more expanded rationale.

Putting _setevents aside for the moment, XMLParser is a clean and simple
API. Its output is only "push" (by calling callbacks on the target). It
doesn't deal with Elements at all. It decomposes the input XML and produces
push events. That's it. It doesn't even have to interact with the target to
a larger extent than pushing callback calls to it.

You propose to add a completely different mode of operation to it,
controlled by a constructor flag. In "collect_events mode", XMLParser
undergoes a transformation. It becomes a pull API. It starts interacting
with the target more deeply by using it to produce elements for events.
Coupled with "weird" targets, its semantics become totally unclear.

So it's a single class that acts completely differently based on an
argument. This is very unpythonic. But things get worse. XMLParser is not
supposed to be an isolated entity. Users should be able to implement custom
parsers that provide its API. When the API is simple - feed/close and some
callbacks, replacing/ehnancing/faking it is easy. When the API is complex
-- two different modes of operation -- it's considerably more difficult.

That's the gist of it. Sticking to the current design of separation of
concerns between the parser and the target/treebuilder is IMO the better
way looking forward. In the mean-time, a helper class for asynchronous
parsing is useful and timely. By restricting its API, changing its
underlying implementation will be easy in the future.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue17741>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue17741] event-driven XML parser

Reply via email to