I've been parsing existing HTML with BeautifulSoup, and I occasionally hit content that contains something like "Design & Advertising", that is, a bare "&" instead of "&amp;". Is there some way to get BeautifulSoup to clean those up? There are various parsing options related to "&" handling, but none of them seem to do quite the right thing.
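One possible workaround is to escape loose ampersands before the markup ever reaches the parser. This sketch uses Python's standard "re" module, not a BeautifulSoup option. It assumes that anything of the form "&name;", "&#123;" or "&#x1F;" is already a proper reference and leaves it alone. The function name escape_bare_ampersands is hypothetical. One caveat: HTML-only named entities such as "&nbsp;" or "&copy;" pass through untouched, and those are still undefined in plain XML.

```python
import re

# Hypothetical helper. Matches an "&" that does NOT start a well-formed reference:
# a named entity (&name;), a decimal reference (&#38;) or a hex reference (&#x26;).
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(markup):
    """Replace loose '&' characters with '&amp;', keeping existing references intact."""
    return _BARE_AMP.sub("&amp;", markup)


print(escape_bare_ampersands("<p>Design & Advertising &amp; &#38; &copy;</p>"))
# prints: <p>Design &amp; Advertising &amp; &#38; &copy;</p>
```

The cleaned string can then be handed to BeautifulSoup as before. Because the substitution is purely textual, it also touches ampersands inside attribute values (for example, query strings in href), which is normally what you want for XML output.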
If I write the BeautifulSoup parse tree back out with "prettify", the loose "&" is still in there, so the output is rejected by XML parsers. That is why this is a problem: I need valid XML out, even if what went in wasn't quite valid.

John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list
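Because the real requirement is well-formed XML on output, another option is to skip prettify entirely. You re-serialize the parsed document yourself and escape every text node and attribute on the way out. This sketch swaps in the standard library's html.parser.HTMLParser in place of BeautifulSoup. The names XMLRewriter and html_to_xml are hypothetical. It assumes the input tags are already properly nested and closed: an unclosed <p>, or a void <br> written without "/", will still produce ill-formed XML. Comments and doctypes are silently dropped.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


class XMLRewriter(HTMLParser):
    """Hypothetical helper: re-emit parsed HTML with every text node and attribute escaped."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode references such as
        # "&amp;" or "&#38;" into plain characters; a loose "&" is passed
        # through unchanged as a literal character.
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Valueless HTML attributes (e.g. <input disabled>) arrive as None.
            parts.append(f"{name}={quoteattr(name if value is None else value)}")
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        # Self-closing tags like <br/> reach here via HTMLParser's default
        # handle_startendtag, producing <br></br>, which is valid XML.
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # escape() turns "&", "<" and ">" back into entity references.
        self.out.append(escape(data))


def html_to_xml(markup):
    rewriter = XMLRewriter()
    rewriter.feed(markup)
    rewriter.close()  # flush any buffered trailing text
    return "".join(rewriter.out)


xml_text = html_to_xml('<p class="ad">Design & Advertising &amp; more</p>')
ET.fromstring(xml_text)  # raises ParseError if the result is not well-formed
print(xml_text)  # prints: <p class="ad">Design &amp; Advertising &amp; more</p>
```

Decoding every reference first and re-escaping on output means "&amp;" and a bare "&" end up identical. HTML-only entities such as "&nbsp;" become literal characters instead of undefined XML entity references.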