[EMAIL PROTECTED] wrote:
> (this is a repost, for it's been a while since I posted this text via
> Google Groups and it plain didn't appear on c.l.py - if it did appear
> anyway, apols)

It did, although some people have added Google Groups to their kill file.

> So I set out to learn handling three-letter-acronym files in Python,
> and SAX worked nicely until I encountered badly formed XMLs, like with
> bad characters in it (well Unicode supposed to handle it all but
> apparently doesn't),

If it's not well-formed, it's not XML. XML parsers are required to reject
non-well-formed input.

In case it actually is well-formed XML and the problem is somewhere in your
code but you can't see it through the SAX haze, try lxml. It also allows you
to pass the expected encoding to the parser to override broken document
encodings.

http://codespeak.net/lxml/

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list
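For illustration, here is a minimal sketch of the two points in the reply (parsers reject non-well-formed input; a caller-supplied encoding can override a wrong declaration), using only the standard library's xml.sax and pyexpat rather than lxml. The helper names is_well_formed and parse_with_encoding are hypothetical, and the sample documents are made up for the demo.

```python
import xml.sax
from xml.parsers import expat


def is_well_formed(data: bytes) -> bool:
    """Hypothetical helper: True if the SAX parser accepts `data`."""
    try:
        # A bare ContentHandler ignores all events; we only care about errors.
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        # The default error handler raises on fatal (well-formedness) errors.
        return False
    return True


def parse_with_encoding(data: bytes, encoding: str) -> str:
    """Hypothetical helper: collect character data, forcing `encoding`."""
    chunks = []
    # An encoding passed to ParserCreate overrides the document's own
    # (implicit or declared) encoding.
    parser = expat.ParserCreate(encoding)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    # Expat may deliver text in several pieces, so join them.
    return "".join(chunks)


if __name__ == "__main__":
    print(is_well_formed(b"<a><b/></a>"))   # True
    print(is_well_formed(b"<a><b></a>"))    # False: mismatched tags
    print(is_well_formed(b"<a>\x01</a>"))   # False: control char not allowed
    # Declared as UTF-8 but actually Latin-1 encoded: rejected as-is...
    doc = b'<?xml version="1.0" encoding="utf-8"?><a>caf\xe9</a>'
    print(is_well_formed(doc))              # False
    # ...but parses once the real encoding is forced.
    print(parse_with_encoding(doc, "ISO-8859-1"))
```

Note that expat itself only knows a handful of encodings (UTF-8, UTF-16, ISO-8859-1, US-ASCII), so for anything else the lxml route suggested above is the more flexible option.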