I have a Django app that selects one of many possible XML documents,
parses it with minidom.parse(), finds all the elements of a certain tag
with getElementsByTagName(), and then sends a small subset of those
elements to a client browser.  So I typically have to parse many
thousands of elements into a DOM object just to pull out the 50 or 100
elements that actually get sent to the user as a smaller XML file,
which is then parsed on the client side with JavaScript.
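
For context, the view is roughly shaped like this (the tag and function
names are made up for illustration, but the minidom flow is what I
actually do):

    from xml.dom import minidom
    from django.http import HttpResponse

    def item_subset(request, doc_path, start, count):
        # The whole document becomes a DOM tree in memory.
        dom = minidom.parse(doc_path)
        # getElementsByTagName() builds another big NodeList.
        items = dom.getElementsByTagName("item")
        subset = items[start:start + count]   # the 50-100 I actually need
        # ... scrutinize/massage the subset here ...
        xml = "<items>%s</items>" % "".join(node.toxml() for node in subset)
        return HttpResponse(xml, content_type="application/xml")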

My memory performance with Apache and mod_wsgi isn't very good, and I
assume that's because I parse the whole file into memory, and also
because getElementsByTagName() creates another large object before I
can extract my desired elements.  I do a bit of scrutinizing and
massaging of them before they get wrapped up and sent on.

I'd like to call unlink() on the big objects, but I don't know where I
could do that, since I 'return' the small subset and have no control
after Django hands that response back to the requestor.
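
The only place I can see to do it is right before the return, after the
subset has already been serialized -- an untested sketch, with the same
made-up names as above:

    def item_subset(request, doc_path, start, count):
        dom = minidom.parse(doc_path)
        try:
            items = dom.getElementsByTagName("item")
            # Serialize the subset first so nothing refers back into the tree.
            subset_xml = "".join(node.toxml()
                                 for node in items[start:start + count])
        finally:
            dom.unlink()   # explicitly break the DOM's internal reference cycles
        return HttpResponse("<items>%s</items>" % subset_xml,
                            content_type="application/xml")

But I don't know whether that actually buys anything over letting the
garbage collector find the cycles on its own, hence question b) below.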

Any thoughts on:

a) is this a big contributor to my memory footprint?
b) wouldn't this all be garbage collected after the response is
   handled anyway?
c) would I be better off with a SAX parse of the XML file, to avoid
   building the whole DOM tree just to get my small subset of elements?
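
If c) is the answer, I imagine the streaming version would look
something like the sketch below (using cElementTree.iterparse here
rather than raw SAX, and the same made-up tag name): stream the file,
keep only the elements in the range I want, and throw everything else
away as it is parsed.

    import xml.etree.cElementTree as ET

    def extract_subset(doc_path, start, count):
        wanted, seen = [], 0
        for event, elem in ET.iterparse(doc_path):
            if elem.tag == "item":               # hypothetical tag name
                if start <= seen < start + count:
                    wanted.append(ET.tostring(elem))
                seen += 1
                elem.clear()                     # drop the element once handled
        return "<items>%s</items>" % "".join(wanted)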

Your thoughts appreciated.  Is there a better way?  I wouldn't want to
stuff the content of the XML files into a database and rebuild the
element subsets from there, because there are tons of files arriving
and disappearing, and keeping the database in sync would become a huge
background activity for another program.

Ross.
