My original posting has a funky line break character (it appears as an ascii square) that blows up my program, but it may or may not show up when you view my message.
I was afraid to use element tree, since my xml files can be very long, and I was concerned about using memory structures to hold all the data. It is my understanding that SAX reads the file line by line? Is there a way to account for the invalid token in the error handler? I don't mind parsing out the bad characters on a case-by-case basis. The weather data I am ingesting only seems to have this line break character that the parser doesn't like. Thanks! Scott Chris Lambacher wrote: > What exactly is invalid about the XML fragment you provided? > It seems to parse correctly with ElementTree: > >>> from xml.etree import ElementTree as ET > >>> e = ET.fromstring(""" > ... <cities> > ... <city> > ... <name>Tampa</name> > ... <description>A great city ^^ and place to live</description> > ... </city> > ... <city> > ... <name>Clearwater</name> > ... <description>Beautiful beaches</description> > ... </city> > ... </cities> > ... """) > >>> print ET.tostring(e) > <cities> > <city> > <name>Tampa</name> > <description>A great city ^^ and place to live</description> > </city> > <city> > <name>Clearwater</name> > <description>Beautiful beaches</description> > </city> > </cities> > >>> > > Do you have invalid characters? unclosed tags? The solution to each of these > problems is different. More info will solicit better solutions. > > -Chris > > On Fri, Jan 05, 2007 at 01:50:18PM -0800, [EMAIL PROTECTED] wrote: > > I've got an XML feed from a vendor that is not well-formed, and having > > them change it is not an option. I'm trying to figure out how to > > create an error-handler that will ignore the invalid token and continue > > on. > > > > The file is large, so I'd prefer not to put it all in memory or save it > > off and strip out the bad characters before I parse it. > > > > I've included one of the problematic characters in a small XML snippet > > below. > > > > I'm new to Python, and I don't know how to accomplish this. Any help is > > greatly appreciated! > > > > ----------------------------------------------------------------- > > > > Here is my code: > > > > from xml.sax import make_parser > > from xml.sax.handler import ContentHandler > > import StringIO > > > > class ErrorHandler: > > def __init__(self, parser): > > self.parser = parser > > def warning(self, msg): > > print '*** (ErrorHandler.warning) msg:', msg > > def error(self, msg): > > print '*** (ErrorHandler.error) msg:', msg > > def fatalError(self, msg): > > print msg > > > > class ContentHandler(ContentHandler): > > def __init__ (self): > > pass > > def startElement(self, name, attrs): > > pass > > def characters (self, ch): > > pass > > def endElement(self, name): > > pass > > > > xmlstr = """ > > <cities> > > <city> > > <name>Tampa</name> > > <description>A great city and place to live</description> > > </city> > > <city> > > <name>Clearwater</name> > > <description>Beautiful beaches</description> > > </city> > > </cities> > > """ > > parser = make_parser() > > curHandler = ContentHandler() > > errorHandler = ErrorHandler(parser) > > parser.setContentHandler(curHandler) > > parser.setErrorHandler(errorHandler) > > parser.parse(StringIO.StringIO(xmlstr)) > > > > -- > > http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list