(This is a repost: I posted this text via Google Groups a while ago and it never appeared on c.l.py. If it did appear after all, apologies.)
So I set out to learn handling three-letter-acronym files in Python, and SAX worked nicely until I encountered badly formed XML, e.g. files with bad characters in them (Unicode is supposed to handle it all, but apparently doesn't). I'm using http://dchublist.com/hublist.xml.bz2 as example data, with the goal of extracting the Users and Address attributes of hubs where the number of Users is greater than a given number.

So I extended my First XML Example with an error handler:

# ========= snip ===========
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.handler import ErrorHandler

class HubHandler(ContentHandler):
    def __init__(self, hublist):
        self.Address = ''
        self.Users = ''
        # keep a reference to the caller's list on the instance
        self.hl = hublist

    def startElement(self, name, attrs):
        self.Address = attrs.get('Address', "")
        self.Users = attrs.get('Users', "")

    def endElement(self, name):
        if name == "Hub" and int(self.Users) > 2000:
            #print self.Address, self.Users
            self.hl.append({self.Address: int(self.Users)})

class HubErrorHandler(ErrorHandler):
    def __init__(self):
        pass

    def error(self, exception):
        print "Error, exception: %s\n" % exception

    def fatalError(self, exception):
        print "Fatal Error, exception: %s\n" % exception

hl = []

parser = make_parser()

hHandler = HubHandler(hl)
errHandler = HubErrorHandler()

parser.setContentHandler(hHandler)
parser.setErrorHandler(errHandler)

fh = file('hublist.xml')
parser.parse(fh)

def compare(x, y):
    if x.values()[0] > y.values()[0]:
        return 1
    elif x.values()[0] < y.values()[0]:
        return -1
    return 0

hl.sort(cmp=compare, reverse=True)

for h in hl:
    print h.keys()[0], " ", h.values()[0]
# ========= snip ===========

And then BAM, Pythonwin hit me with this:

>>> execfile('ph.py')
Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid token)
Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid token)
Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid token)
Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid token)
Fatal Error, exception: hublist.xml:2247:11: not well-formed (invalid token)
>>> ================================ RESTART ================================

Just before the "RESTART" line, Windows announced that it had killed the pythonw.exe process (I suppose it was a child process).

WTF is happening here? Wasn't the fatalError method in HubErrorHandler supposed to handle the invalid tokens? And why is the message repeated many times? My method is apparently called, but something in SAX goes awry and the interpreter crashes.
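
For comparison, here is a minimal sketch of the same extraction with a fatalError that re-raises instead of only printing. It rests on an assumption rather than a confirmed diagnosis: as far as I can tell, the default ErrorHandler.fatalError raises the exception, and when an override returns normally the expat-based reader keeps feeding the remaining chunks of the file to a parser that has already failed, which would account for the repeated message. It does not explain the pythonw.exe crash. The names RaisingErrorHandler, parse_hubs, min_users and SAMPLE, and the example addresses, are made up for illustration, and the code is written to run on Python 3, unlike the Python 2 script above.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class HubHandler(ContentHandler):
    """Collect (Address, Users) pairs for hubs above a user threshold."""

    def __init__(self, hublist, min_users):
        ContentHandler.__init__(self)
        self.hublist = hublist
        self.min_users = min_users

    def startElement(self, name, attrs):
        if name != "Hub":
            return
        try:
            users = int(attrs.get("Users", ""))
        except ValueError:
            return  # skip hubs whose Users attribute is not a number
        if users > self.min_users:
            self.hublist.append((attrs.get("Address", ""), users))


class RaisingErrorHandler(ErrorHandler):
    """Hypothetical handler: record the fatal error once, then re-raise."""

    def __init__(self):
        self.messages = []

    def fatalError(self, exception):
        self.messages.append(str(exception))
        raise exception  # stops the parse instead of letting it continue


def parse_hubs(data, min_users):
    """Return (hubs sorted by user count, recorded error messages)."""
    hubs = []
    errors = RaisingErrorHandler()
    try:
        xml.sax.parseString(data, HubHandler(hubs, min_users), errors)
    except xml.sax.SAXParseException:
        pass  # already recorded by the error handler
    hubs.sort(key=lambda pair: pair[1], reverse=True)
    return hubs, errors.messages


# Made-up sample data; the \x01 byte is not a legal XML character.
SAMPLE = (b'<Hublist>'
          b'<Hub Address="a.example" Users="2500"/>'
          b'<Hub Address="b.example" Users="100"/>'
          b'<Hub Address="c.example" Users="4000"/>'
          b'<Hub Address="bad\x01" Users="9000"/>'
          b'</Hublist>')

hubs, messages = parse_hubs(SAMPLE, 2000)
for address, users in hubs:
    print(address, users)
for message in messages:
    print("Fatal Error, exception: %s" % message)
```

With this arrangement the hubs seen before the bad character are still collected, and the error is reported a single time. Anything after the bad character is lost, since the parser cannot recover from a document that is not well-formed.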