Re: Ignoring XML Namespaces with cElementTree

Stefan Behnel Sat, 01 May 2010 05:37:53 -0700

Carl Banks, 01.05.2010 12:33:

On Apr 29, 10:12 pm, Stefan Behnel wrote:

dmtr, 30.04.2010 04:57:

I don't want these "{http://www.very_long_url.com}" in front of my
tags.  They create a performance disaster on large files


I seriously doubt that they do.


I don't know what kind of XML files you deal with, but for me a large
XML file is gigabyte-sized (obviously I don't use Element Tree for
those).

Why not? I have used cElementTree for files of that size (1-1.5GB unpacked) a couple of times, and it was never a problem.

Even for tens-of-megabyte files, the string ops to expand tags with
namespaces are going to be a pretty decent penalty. Remember,
ElementTree does nothing lazily.

So? Did you run a profiler on it to know that there is a penalty due to the string concatenation? cElementTree's parser (expat) and its tree builder are blazingly fast, especially the iterparse() implementation.


http://codespeak.net/lxml/performance.html#parsing-and-serialising
http://codespeak.net/lxml/performance.html#a-longer-example
http://effbot.org/zone/celementtree.htm#benchmarks
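One way to check where the time actually goes is to run the parse under cProfile. The following is only a rough sketch: it assumes a current Python 3, where xml.etree.ElementTree picks up the C accelerator on its own (the separate cElementTree module is gone), and the document built by build_sample() and the helper names are made up for illustration.

```python
import cProfile
import io
import pstats
import xml.etree.ElementTree as ET

NS = "http://www.very_long_url.com"  # namespace URI quoted in the thread


def build_sample(n):
    # Hypothetical test document: n namespaced <item> children under <root>.
    items = "".join("<item>%d</item>" % i for i in range(n))
    return ('<root xmlns="%s">%s</root>' % (NS, items)).encode("utf-8")


def parse_all(data):
    # Walk the document with iterparse() and count "end" events,
    # clearing each element so memory use stays flat.
    count = 0
    for _, elem in ET.iterparse(io.BytesIO(data)):
        count += 1
        elem.clear()
    return count


data = build_sample(1000)
profiler = cProfile.Profile()
count = profiler.runcall(parse_all, data)

# Collect the five most expensive entries (by cumulative time) as text.
stats_out = io.StringIO()
pstats.Stats(profiler, stream=stats_out).sort_stats("cumulative").print_stats(5)
report = stats_out.getvalue()
print(report)
```

On a real file, replace the generated sample with the actual input; if tag expansion were a bottleneck, it should show up in the report rather than have to be guessed at.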

(first cElementTree adds them, then I have to remove them in python).


I think that's your main mistake: don't remove them. Instead, use the fully
qualified names when comparing.
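A minimal sketch of what comparing against fully qualified names can look like: build the "{uri}tag" string once and compare it to elem.tag, instead of rewriting tags after parsing. The sample document, the ITEM constant, and item_texts() are assumptions for illustration, and the import uses xml.etree.ElementTree since current Python versions no longer ship cElementTree separately.

```python
import io
import xml.etree.ElementTree as ET

NS = "http://www.very_long_url.com"  # namespace URI quoted in the thread
ITEM = "{%s}item" % NS  # fully qualified tag, built once up front

# Hypothetical sample document using a default namespace.
SAMPLE = (
    '<root xmlns="http://www.very_long_url.com">'
    "<item>a</item><other>x</other><item>b</item>"
    "</root>"
)


def item_texts(source):
    # Collect the text of every <item> element without touching its tag.
    texts = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == ITEM:
            texts.append(elem.text)
            elem.clear()  # release the element's content once it is processed
    return texts


result = item_texts(io.BytesIO(SAMPLE.encode("utf-8")))
print(result)
```

The comparison is a plain string equality test per element, so there is no per-element stripping step left in Python code.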


Unless you have multiple namespaces or are working with a defined schema
or something, it's useless boilerplate.

It'd be a nice feature if ElementTree could let users optionally
ignore a namespace; unfortunately, it doesn't have it.

I agree that that would make for a nice parser option, e.g. when dealing with HTML and XHTML in the same code.
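Until such an option exists, one possible workaround for the mixed HTML/XHTML case is to compare on the local part of the tag only, leaving the tree itself untouched. This is just a sketch: local_name() and paragraph_texts() are hypothetical helpers, not part of ElementTree, and the two sample strings are made up.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Return the tag without its "{uri}" prefix, if it has one.
    # Non-string tags (e.g. comment nodes) have no local name.
    if not isinstance(tag, str):
        return None
    return tag.rsplit("}", 1)[-1]


# Hypothetical inputs: the same structure with and without the XHTML namespace.
XHTML = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>one</p></body></html>'
HTML = "<html><body><p>two</p></body></html>"


def paragraph_texts(markup):
    # Find <p> elements regardless of which namespace they are in.
    root = ET.fromstring(markup)
    return [el.text for el in root.iter() if local_name(el.tag) == "p"]


print(paragraph_texts(XHTML))
print(paragraph_texts(HTML))
```

Note that this deliberately throws away namespace information at comparison time, so it is only reasonable when every namespace in the input is known to mean the same vocabulary.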


Stefan

--
http://mail.python.org/mailman/listinfo/python-list
