Re: XML -> Tab-delimited text file (using lxml)

Stefan Behnel Wed, 19 Nov 2008 08:06:10 -0800

Gibson wrote:
> I'm attempting to do the following:
> A) Read/scan/iterate/etc. through a semi-large XML file (about 135 mb)
> B) Grab specific fields and output to a tab-delimited text file
> [...]
>   out = open('output.txt','w')
>   cat = etree.parse('catalog.xml')


Use iterparse() instead of parsing the file into memory completely.

untested:

    from lxml import etree

    for _, item in etree.iterparse('catalog.xml', tag='Item'):
        # free already-processed siblings to save memory
        previous_item = item.getprevious()
        while previous_item is not None:
            previous_item.getparent().remove(previous_item)
            previous_item = item.getprevious()

        # now read the data
        item_id = item.get('ID')
        collect = {}
        for child in item:
            if child.tag != 'ItemVal':
                continue
            collect[child.get('ValueId')] = child.get('value')

        # a missing field becomes an empty column instead of raising KeyError
        print("%s\t%s\t%s\t%s" % ((item_id,) + tuple(
            collect.get(key, '') for key in ['name', 'description', 'image'])))
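
If lxml is not available, roughly the same streaming idea can be sketched with the standard library. This is not the lxml code above but a swapped-in variant using xml.etree.ElementTree.iterparse() and the csv module. The element layout (Item elements with an ID attribute and ItemVal children carrying ValueId/value attributes) is inferred from the loop above, and the function name xml_to_tsv, the FIELDS list and the sample data are made up for illustration:

```python
import csv
import io
import xml.etree.ElementTree as ET

# ValueId keys taken from the example above
FIELDS = ["name", "description", "image"]


def xml_to_tsv(source, out):
    """Stream <Item> elements from `source`, writing one tab-separated row each."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Item":
            continue
        values = {
            child.get("ValueId"): child.get("value")
            for child in elem
            if child.tag == "ItemVal"
        }
        # missing IDs or fields become empty columns
        row = [elem.get("ID", "")] + [values.get(key) or "" for key in FIELDS]
        writer.writerow(row)
        # drop the item's children and attributes once the row is written
        elem.clear()


# Tiny in-memory sample; the layout is an assumption, not a real catalog.
SAMPLE = b"""<Catalog>
  <Item ID="1">
    <ItemVal ValueId="name" value="Widget"/>
    <ItemVal ValueId="description" value="A small widget"/>
    <ItemVal ValueId="image" value="widget.png"/>
  </Item>
  <Item ID="2">
    <ItemVal ValueId="name" value="Gadget"/>
  </Item>
</Catalog>"""

buf = io.StringIO()
xml_to_tsv(io.BytesIO(SAMPLE), buf)
tsv_text = buf.getvalue()
```

For the real file you would pass 'catalog.xml' as the source and an open output file instead of the StringIO buffer. Note that elem.clear() only empties each Item; the emptied elements stay attached to the root, so for very large files the sibling removal shown in the lxml version saves more memory.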

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
