[EMAIL PROTECTED] wrote:
> (this is a repost, for it's been a while since I posted this text via
> Google Groups and it plain didn't appear on c.l.py - if it did appear
> anyway, apols)

It did, although some people have added google groups to their kill file.


> So I set out to learn handling three-letter-acronym files in Python,
> and SAX worked nicely until I encountered badly formed XMLs, like with
> bad characters in it (well Unicode supposed to handle it all but
> apparently doesn't),

If it's not well-formed, it's not XML. XML parsers are required to reject
input that is not well-formed.
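
To see that rejection in practice, here is a minimal sketch using only the
standard library's xml.sax. The helper name is_well_formed and the sample
byte strings are made up for illustration; they are not taken from your data.

```python
import xml.sax


def is_well_formed(data):
    """Return True if the parser accepts data (bytes), False if it rejects it."""
    try:
        # A bare ContentHandler ignores all events; we only care whether
        # the parser gets through the input without a fatal error.
        xml.sax.parseString(data, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


good = b"<doc><item>ok</item></doc>"
bad_nesting = b"<doc><item>ok</doc>"   # <item> is never closed
bad_char = b"<doc>\x01</doc>"          # control character not allowed in XML 1.0

for sample in (good, bad_nesting, bad_char):
    print(sample, is_well_formed(sample))
```

The last sample is the "bad characters" case: the bytes decode fine, but the
character itself is outside what XML 1.0 permits, so the parser must stop.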

If it actually is well-formed XML and the problem is somewhere in your code
but you can't see it through the SAX haze, try lxml. It also lets you pass
the expected encoding to the parser, overriding a wrong encoding declaration
in the document.

http://codespeak.net/lxml/
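
A rough sketch of the encoding override, assuming lxml is installed. The
sample document is invented for illustration: its bytes are Latin-1, but its
declaration claims UTF-8. The helper name parse_with_encoding is hypothetical.

```python
from lxml import etree

# Invented sample: the payload is Latin-1 (0xE9 is e-acute there), but the
# XML declaration wrongly says UTF-8.
data = b'<?xml version="1.0" encoding="UTF-8"?><name>Caf\xe9</name>'


def parse_with_encoding(raw, encoding):
    """Parse raw bytes, telling the parser which encoding to assume."""
    parser = etree.XMLParser(encoding=encoding)
    return etree.fromstring(raw, parser)


try:
    etree.fromstring(data)
except etree.XMLSyntaxError as exc:
    # Without the override, the declared encoding is trusted and the
    # invalid byte sequence is rejected.
    print("default parse failed:", exc)

root = parse_with_encoding(data, "iso-8859-1")
print(root.tag, repr(root.text))
```

This only helps when you know what the real encoding is; it does not repair
documents that are broken in other ways.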

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
