Re: python xml DOM? pulldom? SAX?

Fredrik Lundh Mon, 29 Aug 2005 09:48:48 -0700

"jog" wrote:

> I want to get text out of some nodes of a huge xml file (1,5 GB). The
> architecture of the xml file is something like this


> I want to combine the text out of page:title and page:revision:text for
> every single page element. One by one I want to index these combined
> texts (so for each page one index)

here's one way to do it:

try:
    import cElementTree as ET
except ImportError:
    from elementtree import ElementTree as ET

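# "file" is a filename or an open file object for the XML document
for event, elem in ET.iterparse(file):
    # by default iterparse only reports "end" events, so the whole
    # page subtree has been parsed by the time we get here
    if elem.tag == "page":
        title = elem.findtext("title")
        revision = elem.findtext("revision/text")
        print title, revision
        elem.clear() # won't need this any more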
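
on Python 3 the same approach should still work, with ElementTree taken from the standard library as xml.etree.ElementTree (the C accelerator is picked up automatically, so no separate cElementTree import is needed). a minimal sketch under those assumptions follows. the iter_pages helper is a made-up name, and the code assumes that the page elements sit directly under the root element and carry no XML namespace; neither is confirmed by the structure quoted above. it also clears the root after each page, so the emptied page elements don't pile up under it:

```python
import xml.etree.ElementTree as ET

def iter_pages(source):
    """Yield (title, text) for each <page> element, one at a time.

    source is a filename or a binary file object (hypothetical helper).
    """
    # ask for "start" events too, so the very first event hands us the root
    context = ET.iterparse(source, events=("start", "end"))
    event, root = next(context)
    for event, elem in context:
        # an "end" event means the whole page subtree has been parsed
        if event == "end" and elem.tag == "page":
            title = elem.findtext("title")
            text = elem.findtext("revision/text")
            yield title, text
            # drop finished pages from the root to keep memory use flat
            root.clear()
```

looping over iter_pages(filename) then gives one (title, text) pair per page, which can be combined and handed to whatever indexer is used. either value is None if the corresponding element is missing.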

references:

    http://effbot.org/zone/element-index.htm
    http://effbot.org/zone/celementtree.htm (for best performance)
    http://effbot.org/zone/element-iterparse.htm

</F> 



-- 
http://mail.python.org/mailman/listinfo/python-list
