Stuart McGraw wrote:

> I have a broad (~200K nodes) but shallow XML file
> I want to parse with ElementTree. There are too many
> nodes to read into memory simultaneously, so I use
> iterparse() to process each node sequentially.
>
> Now I find I need to get and save the input file line
> number of each node. Googling turned up a way
> to do it by subclassing FancyTreeBuilder
> (http://groups.google.com/group/comp.lang.python/msg/45f5313409553b4b?hl=en&),
> but that tries to read everything at once.
>
> Is there a way to do something similar with iterparse()?
Something like this could work:

import elementtree.ElementTree as ET
import StringIO

data = """\
<doc>
  <tag>
    <subtag>text</subtag>
    <subtag>text</subtag>
  </tag>
</doc>
"""

class FileWrapper:
    # wraps a file object and counts the lines handed to the parser
    def __init__(self, source):
        self.source = source
        self.lineno = 0
    def read(self, bytes):
        # ignore the requested size; feed the parser one line at a time,
        # so lineno is the line the parser is currently working on
        s = self.source.readline()
        self.lineno += 1
        return s

# f = FileWrapper(open("source.xml"))
f = FileWrapper(StringIO.StringIO(data))

for event, elem in ET.iterparse(f, events=["start", "end"]):
    if event == "start":
        print f.lineno, event, elem

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
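For readers on Python 3, where the standalone `elementtree` package and the `StringIO` module have given way to `xml.etree.ElementTree` and `io.StringIO`, the same trick might be sketched as follows. This is an adaptation, not code from the post: `LineCountingReader` and `start_lines` are hypothetical names. It assumes, as CPython's implementation does, that `iterparse()` accepts any object with a `read()` method and yields the events already parsed before calling `read()` again, so `lineno` is the last line handed to the parser when a start event comes out.

```python
import io
import xml.etree.ElementTree as ET


class LineCountingReader:
    """Hypothetical helper: wraps a text file object and hands iterparse()
    one line per read() call, counting lines as it goes."""

    def __init__(self, source):
        self.source = source
        self.lineno = 0  # number of lines handed to the parser so far

    def read(self, size=-1):
        # Ignore the requested size and return a single line, so the parser
        # is never more than one line ahead of the counter.
        line = self.source.readline()
        if line:  # don't count the empty string returned at end of file
            self.lineno += 1
        return line


def start_lines(source):
    """Hypothetical helper: yield (line_number, tag) for each start event."""
    reader = LineCountingReader(source)
    for event, elem in ET.iterparse(reader, events=("start", "end")):
        if event == "start":
            yield reader.lineno, elem.tag
        else:
            # drop the finished element's children and text to limit memory use
            elem.clear()


data = """\
<doc>
  <tag>
    <subtag>text</subtag>
    <subtag>text</subtag>
  </tag>
</doc>
"""

for lineno, tag in start_lines(io.StringIO(data)):
    print(lineno, tag)
```

Returning one line per `read()` call means many small `feed()` calls, which is slower than the 16 KB chunks CPython's `iterparse()` normally requests. The reported number is the line holding the end of the start tag, so a tag spread over several lines is reported at the line of its closing `>`.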