[EMAIL PROTECTED] wrote:

> Well, I'm fairly new to Python and have encountered a strange error while
> reading an XML document in Python. I used the SAX parser, and my input XML is
> fairly large with 300000 records. I extract about 25 fields from each record
> and spit out a csv file. The strange thing is that after about 2000 records,
> some value (one of the 25) is missing in the csv file, so it just appears as
> ',,', as if the value was missing from the Input file. I checked the Input
> file and all values are intact.

you're aware that you can get multiple calls to the character data
handlers for each character data section? (in other words, you cannot
just use the text you get in the first call; you have to collect text
sections until you see an end tag).

alternatively, you can forget about SAX and use a better tool. tools
that let you iterate over subtrees are a lot easier to use, and can be
faster too. here's the elementtree solution:

    # on Python 2.5 and later, the same function is available as
    # xml.etree.ElementTree.iterparse
    from elementtree.ElementTree import iterparse

    for event, elem in iterparse(source):
        if elem.tag == "record":
            # ... process record elements ...
            elem.clear()

http://effbot.org/zone/element-iterparse.htm

(if speed is important, the cElementTree implementation of iterparse is
~5x faster than Python's standard SAX library)

</F>
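
For anyone who wants to stay with SAX, the "collect text sections until you see an end tag" advice could be sketched roughly as below. This is only an illustration, written for Python 3's standard xml.sax module: the "record" element name comes from the reply above, while the RecordHandler class, the parse_records helper and the field names in the sample data are assumptions made up for the example, not the original poster's code.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collects the text of each field inside <record> elements.

    Hypothetical handler: only the "record" element name is taken from
    the thread; everything else is illustrative.
    """

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per completed <record>
        self._record = None  # dict for the record being read, or None
        self._chunks = []    # text pieces seen since the last tag

    def startElement(self, name, attrs):
        if name == "record":
            self._record = {}
        # Start a fresh text buffer for every element.
        self._chunks = []

    def characters(self, content):
        # The parser may call this several times for one text section,
        # so append each piece instead of overwriting the previous one.
        self._chunks.append(content)

    def endElement(self, name):
        if name == "record":
            self.records.append(self._record)
            self._record = None
        elif self._record is not None:
            # Only at the end tag is the field's text known to be complete.
            self._record[name] = "".join(self._chunks)
        self._chunks = []


def parse_records(xml_bytes):
    """Parse an XML document given as bytes; return a list of field dicts."""
    handler = RecordHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.records


if __name__ == "__main__":
    sample = (b"<records><record><name>Alice</name><city>Oslo</city>"
              b"</record></records>")
    print(parse_records(sample))
```

Joining the pieces in endElement rather than using the text in characters is what keeps a field from coming out truncated or empty when the parser happens to split its text across buffer boundaries, which is more likely to show up in a large input file.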