Stefan Behnel added the comment:
> Also, even if the new approach is implemented in the next release,
IncrementalParser can stay as a simple synonym to
XMLParser(target=EventBuilder(...)).
No it can't. According to your signature, it accepts a parser instance as
input. So it can't just create a new one internally.
"""
This is an existing API within ET:
xml.etree.ElementTree.iterparse(source, events=None, parser=None)
The parser argument is limited: it can't use a custom TreeBuilder. It also
must provide _setevents, so either inherit from XMLParser or just provide
that method in another way [I'll try to improve the documentation to make
all of this explicit]. Whatever we say in this issue, iterparse is here to
stay.
"""
I agree. However, I can't see why this should constrain us when adding a new
API. Sure, the current implementation puts certain restrictions on the input.
But I see that as an advantage, not as a constraint. It's currently broken in
what it accepts as input, so we are free to change that aspect later on. And we
can now try to get it right for the new API that we add.
Also, AFAICT, passing a parser into iterparse() is a very rare thing to do, if
only because it doesn't really work, so the likelyhood of breaking code is very
limited (I'm not saying we should, just thinking about the impact). Note that
lxml.etree doesn't currently support this at all.
Just to throw some more ideas into the wild, what do you think about making
this an inherent feature of the XMLParser class itself? That can be done now,
without major changes to the underlying code. It could be enabled by a new
keyword argument "collect_events=None" that you could pass into the constructor
in order to define the set of events. The "events()" method would always be
available, but would only yield events if there actually are any. Otherwise, it
would just terminate and be done. So, unless you pass it a set of event names
at init time, the parser wouldn't collect events and you wouldn't get any out
of it. Sounds like a clean interface to me.
Collecting the events would work in the same way as it currently works, i.e. it
would call the target's start() method, and whatever that returns will be added
as value to the events as a tuple ("start", target_return_value). That means
that it's up to the parser target to determine what the event iteration
returns. Can be elements, can be None, can be something else.
As for backwards compatibility with iterparse(), I don't see a problem with
officially documenting iterparse() as requiring an XMLParser if a parser is
provided, and as reconfiguring that parser to its needs. It already does that
anyway and it currently doesn't work for anything but an XMLParser.
Also, on a related note, we might (!) consider the addition of the _setevents()
method a bug in CPython 3.3. It didn't exist in ElementTree before it was
merged with cElementTree. Instead, the expected interface was (apparently) that
the parser that is passed in has a "_parser" attribute that holds an Expat
parser, and that iterparse() could reconfigure according for itself. Both are
not public APIs and even differed between ET and cET, so I don't see any harm
being done, but it would be good to have one way to do it. OTOH, if that way is
_setevents(), then be it. The advantage would be that this interface is at
least independent of Expat and could potentially be implemented by other types
of parsers.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue17741>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com