Re: Question about XML Parser in Python.

Fredrik Lundh Wed, 11 May 2005 09:39:21 -0700

[EMAIL PROTECTED] wrote:

> Well, I'm fairly new to Python and have encountered a strange error while
> reading an XML document in Python. I used the SAX parser, and my input XML is
> fairly large, with 300000 records. I extract about 25 fields from each record
> and spit out a csv file. The strange thing is that after about 2000 records,
> some value (one of the 25) is missing in the csv file, so it just appears as
> ',,', as if the value was missing from the input file. I checked the input
> file and all values are intact.


you're aware that you can get multiple calls to the character data handlers
for each character data section?  (in other words, you cannot just use the
text you get in the first call; you have to collect text sections until you see
an end tag).
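
to illustrate the point about collecting text, here's a minimal sketch of a
SAX handler that buffers every characters() call and only joins the pieces
when the end tag arrives.  it uses the standard xml.sax module; the class
name RecordHandler, the helper parse_records, and the flat "record" layout
with one level of field elements are assumptions made for the example, not
details taken from the original poster's code.

```python
import io
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect field text for each <record> (hypothetical flat layout)."""

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per completed record
        self._record = None  # dict for the record being read, if any
        self._chunks = None  # text pieces of the current field, if any

    def startElement(self, name, attrs):
        if name == "record":
            self._record = {}
        elif self._record is not None:
            self._chunks = []  # start buffering a new field

    def characters(self, content):
        # the parser may call this several times for one text section,
        # so append each piece instead of overwriting the previous one
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "record":
            self.records.append(self._record)
            self._record = None
        elif self._chunks is not None:
            # only now is the field text known to be complete
            self._record[name] = "".join(self._chunks)
            self._chunks = None


def parse_records(source):
    """Parse a filename or file object and return a list of field dicts."""
    handler = RecordHandler()
    xml.sax.parse(source, handler)
    return handler.records


sample = io.BytesIO(
    b"<records><record><name>Ann &amp; Bob</name><id>7</id></record></records>"
)
print(parse_records(sample))
```

a handler that keeps only the text from the first characters() call will tend
to lose data wherever the parser happens to split a section (at a buffer
boundary or an entity reference, for example), which would fit values going
missing only now and then in a large file.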

alternatively, you can forget about SAX and use a better tool.  tools that
let you iterate over subtrees are a lot easier to use, and can be faster too.

here's the elementtree solution:

    from elementtree.ElementTree import iterparse

    # source is a filename or an open file object
    for event, elem in iterparse(source):
        if elem.tag == "record":
            # ... process record elements ...
            elem.clear()  # discard the subtree once the record is handled

    http://effbot.org/zone/element-iterparse.htm

(if speed is important, the cElementTree implementation of iterparse is ~5x
faster than Python's standard SAX library)
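
for the csv task described in the question, the loop might be filled in
roughly as follows.  this is a sketch under assumptions: it imports iterparse
from xml.etree.ElementTree (where the same API lives in current Python
versions), and the field names in FIELDS and the helper records_to_csv are
made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["id", "name"]  # hypothetical field names; the real file has ~25


def records_to_csv(source, out, fields=FIELDS):
    """Write one csv row per <record> element; return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(fields)
    count = 0
    # by default iterparse reports only "end" events, so each element is
    # complete (text included) by the time it shows up here
    for event, elem in ET.iterparse(source):
        if elem.tag == "record":
            # findtext returns the default if the field element is absent
            writer.writerow([elem.findtext(f, default="") for f in fields])
            elem.clear()  # drop the processed subtree
            count += 1
    return count


sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><name>a &amp; b</name></record>"
    b"<record><id>2</id><name>c</name></record>"
    b"</records>"
)
out = io.StringIO()
print(records_to_csv(sample, out))
print(out.getvalue())
```

since the record is only processed at its end event, there is no partial text
to worry about, and no handler state to maintain between callbacks.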

</F>



-- 
http://mail.python.org/mailman/listinfo/python-list
