[EMAIL PROTECTED] wrote:
> I understand that the web is full of ill-formed XHTML web pages but
> this is Microsoft:
>
> http://moneycentral.msn.com/companyreport?Symbol=BBBY
>
> I can't validate it and xml.dom.minidom.parseString won't work on it.

Interestingly, no one has mentioned lxml so far:

http://codespeak.net/lxml
http://codespeak.net/lxml/dev/parsing.html#parsers

Parse it as HTML and then use anything from XPath to XSLT to process it.

Have fun,
Stefan
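
For illustration, here is a minimal sketch of the "parse it as HTML, then use XPath" approach with lxml. The markup below is a made-up, deliberately ill-formed snippet (not the actual MSN page), and the helper names parse_html and table_rows as well as the table id "quotes" are assumptions chosen for this example only.

```python
from lxml import etree

# Hypothetical, deliberately ill-formed markup: unquoted attribute,
# unclosed <td>, <tr>, <p> and <br> elements. A strict XML parser such as
# xml.dom.minidom would reject this.
BROKEN_HTML = """
<html><body>
<table id=quotes>
<tr><td>Symbol<td>BBBY
<tr><td>Source<td>example data
</table>
<p>Unclosed paragraph<br>
</body></html>
"""


def parse_html(text):
    """Parse markup with lxml's forgiving HTML parser and return the root element."""
    parser = etree.HTMLParser()
    return etree.fromstring(text, parser)


def table_rows(root, table_id):
    """Return the stripped cell texts of each row of the table with the given id."""
    rows = []
    for tr in root.xpath("//table[@id=$tid]//tr", tid=table_id):
        rows.append([(td.text or "").strip() for td in tr.xpath("./td")])
    return rows


root = parse_html(BROKEN_HTML)
rows = table_rows(root, "quotes")
for row in rows:
    print(row)
```

For the real page you would fetch the document first and hand its content to the same parser; the XPath expressions would of course depend on that page's actual structure.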