I've been using the xml.sax.handler module to do event-driven parsing of XML files in a Python application I'm working on. However, I keep running into pesky "invalid token" exceptions. Initially, I was only getting them on control characters, and a little "sed -e 's/[^[:print:]]/ /g' $1;" took care of that just fine. But recently, I've been getting these invalid token exceptions on n-tildes (like the ñ in España), smart/fancy/curly quotes, and other seemingly harmless characters. Specifying encoding="utf-8" in the XML header hasn't helped matters.
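For what it's worth, here is a minimal sketch that seems to reproduce the symptom. It rests on an assumption I have not confirmed: that my files are actually saved in a single-byte encoding such as Latin-1 while the header claims utf-8. The names TextCollector and parse_bytes are made up for this example, and it is written for Python 3.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_bytes(data):
    """Parse raw XML bytes and return the collected text."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


doc = '<?xml version="1.0" encoding="utf-8"?><country>España</country>'

# The bytes really are UTF-8, matching the header: this parses.
print(parse_bytes(doc.encode("utf-8")))

# Assumed failure case: the bytes are Latin-1 but the header still says
# utf-8, so the lone 0xF1 byte for the n-tilde is not valid UTF-8.
try:
    parse_bytes(doc.encode("latin-1"))
except xml.sax.SAXParseException as exc:
    print("parse failed:", exc.getMessage())
```

If that assumption holds, the declared encoding and the bytes on disk simply disagree, which would explain why changing only the header made no difference.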
Any ideas? As a last resort, I'd be willing to scrub the invalid characters, but it seems strange that curly quotes and n-tildes wouldn't be valid XML! Is that really the case?
TIA!
Jason