Carl Banks, 01.05.2010 12:33:
On Apr 29, 10:12 pm, Stefan Behnel wrote:
dmtr, 30.04.2010 04:57:
I don't want these "{http://www.very_long_url.com}" in front of my
tags. They create a performance disaster on large files
I seriously doubt that they do.
I don't know what kind of XML files you deal with, but for me a large
XML file is gigabyte-sized (obviously I don't use Element Tree for
those).
Why not? I used cElementTree for files of that size (1-1.5GB unpacked) a
couple of times, and it was never a problem.
Even for tens-of-megabyte files, string ops to expand tags with
namespaces are going to be a pretty decent penalty--remember
ElementTree does nothing lazily.
So? Did you run a profiler on it to know that there is a penalty due to the
string concatenation? cElementTree's parser (expat) and its tree builder
are blazingly fast, especially the iterparse() implementation.
http://codespeak.net/lxml/performance.html#parsing-and-serialising
http://codespeak.net/lxml/performance.html#a-longer-example
http://effbot.org/zone/celementtree.htm#benchmarks
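For anyone who wants to measure this rather than guess, here is a minimal sketch of timing an iterparse() pass over a namespaced document. It assumes a current Python, where the C accelerator is used through xml.etree.ElementTree (the separate cElementTree module is gone). The document is synthetic, and build_sample and count_items are hypothetical helper names, not part of any library.

```python
import io
import time
import xml.etree.ElementTree as ET

NS = "http://www.very_long_url.com"  # namespace URI quoted in the thread


def build_sample(n):
    """Build a synthetic namespaced document with n <item> children (made-up data)."""
    items = "".join("<item>%d</item>" % i for i in range(n))
    return ('<root xmlns="%s">%s</root>' % (NS, items)).encode("utf-8")


def count_items(data):
    """Stream through the document and count elements by fully qualified tag."""
    wanted = "{%s}item" % NS  # built once, outside the loop
    count = 0
    for event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == wanted:
            count += 1
            elem.clear()  # release this element's content once it has been counted
    return count


if __name__ == "__main__":
    data = build_sample(100000)
    start = time.perf_counter()
    n = count_items(data)
    print("parsed %d items in %.3f s" % (n, time.perf_counter() - start))
```

Running the same loop under cProfile (python -m cProfile -s cumulative script.py) would show how much of the total is spent in the parser versus your own per-element code.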
(first cElementTree adds them, then I have to remove them in python).
I think that's your main mistake: don't remove them. Instead, use the fully
qualified names when comparing.
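A minimal sketch of what that looks like in practice: build the qualified names once as constants and compare against them directly. The feed/entry/title document here is made up for illustration, and titles is a hypothetical helper; only the namespace URI comes from the thread.

```python
import xml.etree.ElementTree as ET

NS = "http://www.very_long_url.com"  # namespace URI quoted in the thread

# Build the fully qualified tag names once, in ElementTree's "{uri}local" form.
ENTRY = "{%s}entry" % NS
TITLE = "{%s}title" % NS

# Illustrative document; the element names are assumptions, not from the thread.
DOC = (
    '<feed xmlns="%s">'
    "<entry><title>a</title></entry>"
    "<entry><title>b</title></entry>"
    "</feed>" % NS
)


def titles(xml_text):
    """Return the title text of every entry, matching on qualified names."""
    root = ET.fromstring(xml_text)
    return [entry.findtext(TITLE) for entry in root.iter(ENTRY)]


if __name__ == "__main__":
    print(titles(DOC))
```

The long URL then appears exactly once in the source, and no per-element string manipulation happens in Python code.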
Unless you have multiple namespaces or are working with defined schema
or something, it's useless boilerplate.
It'd be a nice feature if ElementTree could let users optionally
ignore a namespace; unfortunately, it doesn't have it.
I agree that that would make for a nice parser option, e.g. when dealing
with HTML and XHTML in the same code.
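Until such an option exists, a rough workaround is to compare on the local part of the tag only. This is a sketch under that assumption, not a parser feature: local_name and find_by_local_name are hypothetical helpers, and the two sample documents are made up.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading "{uri}" from an ElementTree tag, if there is one."""
    return tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Return all elements under root whose local name matches, in any namespace."""
    return [
        el
        for el in root.iter()
        # Comments and processing instructions can carry non-string tags; skip them.
        if isinstance(el.tag, str) and local_name(el.tag) == name
    ]


if __name__ == "__main__":
    xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>x</p></body></html>'
    plain = "<html><body><p>y</p></body></html>"
    for doc in (xhtml, plain):
        print([el.text for el in find_by_local_name(ET.fromstring(doc), "p")])
```

Note that this does the string split on every element visited, which is exactly the extra Python-level work that comparing against qualified names avoids.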
Stefan